CUDA/HIP: refactor mmvq to unify the calculation of nwarps and rows per block between host and device code. #12177

IMbackK · 2025-03-04T12:31:08Z

This refactors mmvq to unify the handling of parameters between host-side and device-side code, avoiding duplication in calculating nwarps and rows_per_cuda_block. It also explicitly handles wave_size != 32, with the minor benefit of moving from shared memory to warp-level primitives one iteration earlier.
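As a rough illustration of the idea described above, the sketch below derives both values from one pair of constexpr helpers that host and device code could share. The helper names, the macro, and all thresholds are hypothetical and chosen only for this example; they are not taken from mmvq.cu. It is plain C++ that should also build under nvcc or hipcc, where the macro marks the helpers `__host__ __device__` (by default, CUDA device code cannot call host-only constexpr functions).

```cpp
// Expands to __host__ __device__ under nvcc/hipcc and to nothing in a plain C++ build,
// so the same helpers can be called from kernels and from host launch code.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define EXAMPLE_HOST_DEVICE __host__ __device__
#else
#define EXAMPLE_HOST_DEVICE
#endif

// Hypothetical tuning values for illustration; not the numbers used in mmvq.cu.
static constexpr EXAMPLE_HOST_DEVICE int example_calc_nwarps(int ncols_y) {
    return ncols_y <= 4 ? 4 : (ncols_y <= 8 ? 2 : 1);
}

static constexpr EXAMPLE_HOST_DEVICE int example_calc_rows_per_block(int ncols_y) {
    return ncols_y == 1 ? 1 : 2;
}

// Device code would use the helpers as compile-time constants, e.g. inside a
// kernel templated on ncols_y:
//     constexpr int nwarps = example_calc_nwarps(ncols_y);
// These checks confirm the helpers are usable in constant expressions.
static_assert(example_calc_nwarps(1) == 4, "helper must be a constant expression");
static_assert(example_calc_rows_per_block(8) == 2, "helper must be a constant expression");

// Host-side launch geometry derived from the same helpers.
struct example_launch_dims {
    int nblocks;   // number of blocks along x
    int threads_x; // threads per warp (32 on NVIDIA, 32 or 64 on AMD)
    int threads_y; // warps per block
};

static constexpr example_launch_dims example_launch_config(int nrows, int ncols_y, int warp_size) {
    return {
        // ceiling division: enough blocks to cover every row
        (nrows + example_calc_rows_per_block(ncols_y) - 1) / example_calc_rows_per_block(ncols_y),
        warp_size,
        example_calc_nwarps(ncols_y)
    };
}
```

Because the launch configuration and the kernel read the same functions, changing a threshold in one place cannot leave the other side with a stale copy, which is the duplication this PR sets out to remove.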

JohannesGaessler

Please use an enum instead of an int to determine the table.

ggml/src/ggml-cuda/mmvq.cu
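One way to read this suggestion is sketched below: the parameter table is selected with a named enumerator rather than a bare integer, so call sites say which table they mean. The enum, its enumerators, the macro, and every threshold are hypothetical names and values invented for this example; the identifiers actually used in mmvq.cu may differ.

```cpp
// Expands to __host__ __device__ under nvcc/hipcc and to nothing in a plain C++ build.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Hypothetical table identifiers, one per group of architectures with its own tuning.
enum sketch_table_id {
    SKETCH_TABLE_GENERIC = 0,
    SKETCH_TABLE_GCN,
    SKETCH_TABLE_RDNA2
};

// Hypothetical per-table values for illustration only.
static constexpr SKETCH_HOST_DEVICE int sketch_calc_nwarps(int ncols_y, sketch_table_id table_id) {
    return table_id == SKETCH_TABLE_GENERIC ? (ncols_y <= 4 ? 4 : (ncols_y <= 8 ? 2 : 1))
         : table_id == SKETCH_TABLE_GCN     ? (ncols_y <= 4 ? 2 : 1)
         :                                    1; // SKETCH_TABLE_RDNA2 in this sketch
}

// The enumerator documents the intent at the call site; passing a plain int
// such as 1 would not compile without an explicit cast.
static_assert(sketch_calc_nwarps(1, SKETCH_TABLE_GCN) == 2, "GCN table, 1 column");
```

Compared with an int parameter, the enum lets the compiler reject an out-of-context integer and makes a misplaced argument (for example, swapping the column count and the table) a compile error in one direction.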

Co-authored-by: Johannes Gäßler <[email protected]>

IMbackK requested a review from JohannesGaessler as a code owner March 4, 2025 12:31

github-actions bot added the "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels Mar 4, 2025

refactor mmvq to unify the calculation of nwarps and rows per block …

888ffc8

…between host and device code.

IMbackK force-pushed the refactor_mmqv branch from 50d4277 to 888ffc8 Compare March 4, 2025 12:32

make cuda happy, as it doesn't support calling host constexpr function…

a55d765

…s in device code, even though that should not be a problem.

JohannesGaessler reviewed Mar 6, 2025

IMbackK added 2 commits March 6, 2025 22:40

Fix nits

b85a723

Fix spelling of parameter

15f4dca

IMbackK requested a review from JohannesGaessler March 6, 2025 21:48

JohannesGaessler approved these changes Mar 7, 2025

ggml/src/ggml-cuda/mmvq.cu (outdated, resolved)

Update ggml/src/ggml-cuda/mmvq.cu

1b3894e

Co-authored-by: Johannes Gäßler <[email protected]>

IMbackK merged commit 10f2e81 into ggml-org:master Mar 11, 2025
47 checks passed
