Optimize CUDA GPU #649

jean-m-cyr · 2018-01-28T14:21:11Z

- ethminer does not support any CUDA architecture that requires the obsolete shuffle without sync.
- Copying the mix to shared memory is expensive, so do it only for actual solutions.
- Include the whole work package in the solution. This works out to about the same as including all of its components individually, and it shortens construction of the solution.
- Redefine, resize, and realign the results buffer so that CUDA can use shifts instead of multiplies, and so that the mix copy uses sequential addresses.
- Delete unused CUDA files that were previously needed for pre-shuffle architectures.
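
A minimal host-side C++ sketch of the "whole work package" idea, assuming a simplified `WorkPackage`: the field names and sizes below are hypothetical and are not ethminer's actual definition. The solution embeds one copy of the package instead of carrying several separately copied fields.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for ethminer's WorkPackage.
// Field names and types are illustrative assumptions only.
struct WorkPackage {
    std::array<uint8_t, 32> header{};
    std::array<uint8_t, 32> seed{};
    std::array<uint8_t, 32> boundary{};
    uint64_t startNonce = 0;
};

// The solution embeds the whole package rather than copies of its
// individual components, so building it is a single aggregate copy.
struct Solution {
    uint64_t nonce;
    std::array<uint8_t, 32> mixHash;
    WorkPackage work;  // whole package instead of separate fields
};

Solution makeSolution(uint64_t nonce, const std::array<uint8_t, 32>& mix,
                      const WorkPackage& wp) {
    return Solution{nonce, mix, wp};
}
```

Consumers that later need the header or boundary read them from `solution.work`, so adding a field to the package does not require touching the solution type.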

jean-m-cyr · 2018-01-28T16:09:57Z

libethash-cuda/dagger_shuffled.cuh

@@ -114,11 +92,15 @@ __device__ __forceinline__ uint64_t compute_hash(
 			}
 		}
 	}
+


This is the actual optimization. The code above is just cleanup.
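
A host-side C++ simulation of deferring the mix copy, under stated assumptions: `SearchResult`, `SearchResults`, and `checkCandidate` are hypothetical names, the target test is simplified to a single 64-bit comparison, and the atomic increment a real kernel would use to reserve a result slot is replaced by a plain `++`. In the kernel the copy went through shared memory; here it is an ordinary array assignment. The point is only the ordering: the cheap target test runs first, and the expensive copy happens on the rare path where the candidate is an actual solution.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical result slot (illustrative layout, not the PR's struct).
struct SearchResult {
    uint64_t nonce;
    std::array<uint32_t, 8> mix;
};

// Hypothetical results buffer with a small fixed capacity.
struct SearchResults {
    uint32_t count = 0;
    std::array<SearchResult, 4> result{};
};

// Simplified target test first; copy the mix only for solutions.
// A GPU kernel would reserve the slot with an atomic increment.
void checkCandidate(uint64_t nonce, uint64_t hash64, uint64_t target,
                    const std::array<uint32_t, 8>& mix, SearchResults& out) {
    if (hash64 > target) return;                  // common case: no copy
    if (out.count >= out.result.size()) return;   // buffer full, drop
    SearchResult& slot = out.result[out.count++];
    slot.nonce = nonce;
    slot.mix = mix;                               // copied only for solutions
}
```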

chfast · 2018-01-28T21:08:27Z

libethash-cuda/dagger_shuffled.cuh

@@ -86,16 +74,6 @@ __device__ __forceinline__ uint64_t compute_hash(
 			uint32_t thread_mix = fnv_reduce(mix[p]);

 			// update mix accross threads
-#if CUDA_VERSION < SHUFFLE_DEPRECATED


This is no longer needed?

Did I miss one?

jean-m-cyr · 2018-01-29T00:19:13Z

Also deleted 2 unused CUDA files

jean-m-cyr · 2018-01-29T03:43:02Z

@chfast Should I squash all of this, or would it actually be easier to inspect one commit at a time?

jean-m-cyr · 2018-01-29T04:49:44Z

I wonder if we still need to support the sm_30 and sm_35 Kepler architectures. Other than the Tesla K40 and K80, those were all GPUs with 2 GB or less. I think the K40/K80 have long since become unprofitable. Even the Maxwell architecture, a later technology, is barely profitable.

chfast · 2018-01-29T10:08:18Z

Aren't the K40 and K80 the ones some Macs have?

chfast · 2018-01-29T10:09:59Z

This is quite a lot of changes; better to keep all the commits.

chfast

I'd like to see another review.

MariusVanDerWijden · 2018-01-29T10:30:44Z

I think we should still support 3.0 and 3.5, since those are still pretty common in big datacenters.

AaronOpfer · 2018-01-29T13:26:59Z

libethcore/Farm.h

@@ -338,10 +338,10 @@ class Farm: public FarmFace
 	 * @param _wp The WorkPackage that the Solution is for.
 	 * @return true iff the solution was good (implying that mining should be .


This comment is no longer accurate.

Ah yes, thank you.

Optimize CUDA GPU

4a81752

- ethminer does not support any CUDA version that requires the obsolete shuffle without sync.
- Copying the mix to shared memory is expensive, so do it only for actual solutions.

jean-m-cyr requested review from chfast, smurfy and MariusVanDerWijden January 28, 2018 16:06

jean-m-cyr commented Jan 28, 2018

Remove un-needed assignment

1a6a24b

chfast reviewed Jan 28, 2018

Consolidate mix and size to powers of 2

e4b5d81

Allows CUDA to use shifts instead of multiplies and sequential access of the mix. Assume cuda arch >= 3 and cuda toolkit >= 9 and remove deprecated code and definitions.
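
A small C++ sketch of why power-of-two sizes help, assuming a hypothetical 64-byte `Slot` layout (not the PR's actual results struct): when the stride is a power of two, `i * sizeof(Slot)` equals `i << log2(sizeof(Slot))`, so index arithmetic can use a shift, and keeping the 8 mix words contiguous gives the mix copy sequential addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result slot padded to a power-of-two size (64 bytes).
struct alignas(64) Slot {
    uint32_t mix[8];  // 32 bytes, contiguous for sequential copies
    uint32_t gid;     // 4 bytes
    uint32_t pad[7];  // 28 bytes of padding -> 64 bytes total
};
static_assert(sizeof(Slot) == 64, "slot must have a power-of-two size");

constexpr bool isPow2(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// log2 of a power of two (C++14 constexpr loop).
constexpr unsigned log2Pow2(std::size_t n) {
    unsigned s = 0;
    while (n > 1) { n >>= 1; ++s; }
    return s;
}

// Byte offset of slot i, computed both ways: with a power-of-two
// stride the multiply and the shift give the same result.
constexpr std::size_t slotOffsetMul(std::size_t i) { return i * sizeof(Slot); }
constexpr std::size_t slotOffsetShift(std::size_t i) {
    return i << log2Pow2(sizeof(Slot));
}
static_assert(slotOffsetMul(5) == slotOffsetShift(5), "shift == multiply");
```

With a 48-byte slot, for example, no single shift would do, and the offset would need a real multiply (or a shift-and-add sequence).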

jean-m-cyr force-pushed the cuda-opt branch from 2f6d952 to e4b5d81 January 28, 2018 22:21

Pass whole work package instead of 5 individual components

d79772d

chfast approved these changes Jan 29, 2018

jean-m-cyr merged commit a2f014e into master Jan 29, 2018

jean-m-cyr deleted the cuda-opt branch January 29, 2018 13:10

AaronOpfer reviewed Jan 29, 2018

Sail86 mentioned this pull request May 10, 2018

Add changes to CHANGELOG file #1088

Closed
