
Only call cudaGetDeviceProperties once. #135

Merged · merged 1 commit into LLNL:develop on Jun 2, 2021

Conversation

@corbett5 (Member) commented Jun 1, 2021

While debugging I added some timers to `encode3launch` and got the following results for a single-precision field of size 528×520×512 on device.

Previously

Time between entering `encode3launch` and `cudaMemset`: 0.617 ms
Encode elapsed time: 0.00064 (s)
# encode3 rate: 817.93 (GB / sec)

100 calls to `zfp_compress`: 0.158 s
100 calls to `zfp_decompress`: 0.228 s

Now

Time between entering `encode3launch` and `cudaMemset`: 0.000 ms
Encode elapsed time: 0.00064 (s)
# encode3 rate: 818.09 (GB / sec)

100 calls to `zfp_compress`: 0.077 s
100 calls to `zfp_decompress`: 0.166 s

I guess this would break things if, in between calls to zfp, the user set the active device and the new active device had a different max grid size, but I doubt anyone is doing that.
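
For reference, a minimal sketch of the caching approach, assuming a static cache guarded by a first-call check (the function and variable names below are illustrative, not the actual zfp code):

```c
#include <cuda_runtime.h>

/* Illustrative only: cache the max grid size so that the expensive
 * cudaGetDeviceProperties call is paid once, not on every launch.
 * This is also where the caveat above bites: if the user later
 * switches to a device with a different max grid size, the cached
 * value goes stale. */
static int
get_max_grid_size(void)
{
  static int cached_max_grid_size = 0; /* 0 = not yet queried */
  if (!cached_max_grid_size) {
    int device;
    struct cudaDeviceProp prop;
    cudaGetDevice(&device);
    cudaGetDeviceProperties(&prop, device); /* the ~0.6 ms call */
    cached_max_grid_size = prop.maxGridSize[0];
  }
  return cached_max_grid_size;
}
```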

@lindstro (Member) commented Jun 1, 2021

Might it make sense to query this information in `zfp_stream_set_execution`? It wouldn't directly address the issue you describe, but at least the user would then have a workaround.

@codecov-commenter commented

Codecov Report

Merging #135 (2652ea1) into develop (fcae077) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff            @@
##           develop     #135   +/-   ##
========================================
  Coverage    94.84%   94.84%           
========================================
  Files          268      268           
  Lines        15216    15216           
========================================
  Hits         14431    14431           
  Misses         785      785           


@corbett5 (Member, Author) commented Jun 2, 2021

> Might it make sense to query this information in `zfp_stream_set_execution`? It wouldn't directly address the issue you describe, but at least the user would then have a workaround.

That makes sense to me, but you would have to add the max grid size to the `zfp_stream` struct. You would also have to `#include <cuda.h>` in `zfp.c`, and given the way the build is set up, I think all the CUDA stuff is meant to be isolated from mainline zfp. Definitely a better solution, but more invasive.

@lindstro (Member) commented Jun 2, 2021

True, though `zfp.c` includes `template/cudacompress.c`, which in turn includes `cuda_zfp/cuZFP.h`, so things aren't that isolated anyway. Maybe we could add some generic `init()` function for each backend that gets called by `zfp_stream_set_execution()`? Moving this info into `zfp_stream` is not great, since that struct is used for other purposes as well (e.g., for the compressed-array classes). I can't think of a better solution, though, unless we add a common pointer from `zfp_stream` to execution-specific parameters that is null for the default serial execution mode. Any other ideas would be welcome.
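
Something with roughly this shape, perhaps (purely a sketch: `cuda_init` is a hypothetical backend hook, and the actual `zfp_stream_set_execution` differs):

```c
/* Hypothetical shape of the per-backend init() idea; not zfp's actual
 * implementation. zfp_stream_set_execution() would dispatch to a
 * one-time backend hook that does setup such as the device query. */
int
zfp_stream_set_execution(zfp_stream* stream, zfp_exec_policy policy)
{
  if (policy == zfp_exec_cuda) {
#ifdef ZFP_WITH_CUDA
    if (!cuda_init(stream)) /* e.g., query cudaGetDeviceProperties here */
      return 0;
#else
    return 0; /* CUDA support not compiled in */
#endif
  }
  stream->exec.policy = policy;
  return 1;
}
```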

@corbett5 (Member, Author) commented Jun 2, 2021

You're right, we could put some init into `cuZFP.h`. But unless we put something into `zfp_stream` (it could be a pointer), a user who switches a stream between device and host compression (in whatever order) will wind up calling `cudaGetDeviceProperties` each time, which is what I'm trying to avoid. If the info were associated with the stream, they could get around this by keeping one device stream and one host stream. I think the current solution is pretty good, though; I can't imagine a use case where a user switches between GPU architectures at runtime.

@lindstro (Member) commented Jun 2, 2021

Agreed, something would have to be put into `zfp_stream`. The idea was to use a single pointer instead of adding a lot of baggage directly to the struct that often won't be used, and which would change the layout of the `zfp_stream` struct depending on CUDA availability.

I was thinking that `cudaGetDeviceProperties` would be called once per stream, not once per (de)compression call, which seems like a small penalty to pay. We can revisit this later, however.
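
In other words, something like this (purely a sketch; the type and field names are hypothetical, not zfp's actual layout):

```c
/* Hypothetical sketch of the single-pointer idea: backend-specific
 * state hangs off one opaque pointer, so the struct layout stays the
 * same whether or not CUDA support is compiled in. */
typedef struct {
  int max_grid_size; /* cached result of cudaGetDeviceProperties */
  /* ...other CUDA-specific parameters... */
} zfp_cuda_params;

struct zfp_stream_sketch {
  /* ...existing zfp_stream fields... */
  void* exec_params; /* NULL for serial execution; zfp_cuda_params* for CUDA */
};
```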

@lindstro (Member) commented Jun 2, 2021

LGTM

@lindstro merged commit 9704462 into LLNL:develop on Jun 2, 2021.
@corbett5 deleted the corbett/cuda-device-properties branch on Jun 2, 2021.