
CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. #12177

Merged 5 commits on Mar 11, 2025

Conversation

IMbackK (Collaborator)

@IMbackK commented Mar 4, 2025

This refactors mmqv to unify the handling of parameters between host- and device-side code, avoiding duplication in calculating nwarps and rows_per_cuda_block. It also explicitly handles wave_size != 32, with the minor benefit of moving from shared memory to warp-level primitives one iteration earlier.
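The idea of the refactor can be sketched as a single constexpr helper that both the kernel launch code and the kernel itself call, so the two sides can never disagree. This is an illustrative sketch only: the struct and function names, and the exact formulas, are hypothetical and not the ones used in the PR; in the real CUDA/HIP code the helper would additionally be marked `__host__ __device__`.

```cpp
#include <cassert>

// Hypothetical parameter bundle shared by host and device code.
struct mmqv_params {
    int nwarps;         // warps per thread block
    int rows_per_block; // output rows computed per thread block
};

// In CUDA this would be a __host__ __device__ constexpr function so the
// launch configuration and the kernel derive identical values from one
// place instead of duplicating the arithmetic. Formulas are illustrative.
constexpr mmqv_params calc_mmqv_params(int ncols_dst, int warp_size) {
    // One row per block for the classic matrix-vector case (single
    // destination column), two rows otherwise.
    const int rows_per_block = ncols_dst == 1 ? 1 : 2;
    // Fewer, wider warps when the wavefront is 64 lanes (AMD wave64).
    const int nwarps = warp_size == 64 ? 2 : 4;
    return {nwarps, rows_per_block};
}
```

With one shared function, changing the heuristic in one place automatically keeps the host-side grid/block dimensions and the device-side indexing consistent.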

@github-actions bot added labels Mar 4, 2025: "Nvidia GPU" (issues specific to Nvidia GPUs), "ggml" (changes relating to the ggml tensor library for machine learning)
…s in device code, even though that should not be a problem.
@JohannesGaessler (Collaborator) left a comment


Please use an enum instead of an int to determine the table.
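The reviewer's suggestion can be sketched as follows. All names here are illustrative, not the actual identifiers in the PR: selecting a lookup table through a scoped enum instead of a bare int makes invalid selectors a compile error and makes call sites self-documenting.

```cpp
#include <cassert>

// Hypothetical table selector: a scoped enum instead of a magic int.
enum class table_id { nwarps, rows_per_block };

// Illustrative lookup; the formulas stand in for per-table constants.
constexpr int lookup(table_id id, int ncols_dst) {
    switch (id) {
        case table_id::nwarps:         return ncols_dst <= 4 ? 4 : 2;
        case table_id::rows_per_block: return ncols_dst == 1 ? 1 : 2;
    }
    return -1; // unreachable for valid enum values
}

// Call sites read clearly and cannot pass an out-of-range selector:
//   lookup(table_id::nwarps, n)   vs.   lookup(0, n)
```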

@IMbackK IMbackK requested a review from JohannesGaessler March 6, 2025 21:48
Co-authored-by: Johannes Gäßler <[email protected]>
@IMbackK IMbackK merged commit 10f2e81 into ggml-org:master Mar 11, 2025
47 checks passed
ishaangandhi pushed a commit to ishaangandhi/llama.cpp that referenced this pull request Mar 12, 2025
…per block between host and device code. (ggml-org#12177)

refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.

---------

Co-authored-by: Johannes Gäßler <[email protected]>