Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ggml : fix quantized cpy op #12310

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

ggml : fix quantized cpy op #12310

wants to merge 5 commits into from

Conversation

ggerganov
Copy link
Member

ref #12253

This should fix CPY(Q8_0, Q8_0)

@github-actions github-actions bot added testing Everything test related ggml changes relating to the ggml tensor library for machine learning labels Mar 10, 2025
@aviallon
Copy link
Contributor

I no longer have garbled output with quantized cache. Only repetitions when reaching context-size, depending on the batch-size and the number of slots.
Tested-by: Antoine Viallon <[email protected]>

@jukofyork
Copy link
Contributor

Is there any chance we could add the copy operations for BF16? Even just BF16 <--> F32 would be enough to test it for the KV-cache types.

@ggerganov
Copy link
Member Author

@jukofyork bc25236 should cover BF16 <-> F32 copies.

@slaren
Copy link
Member

slaren commented Mar 11, 2025

This change does not look right to me. If i00 and i10 represent blocks now, then the logic for determining when to move to the next row in if (++i10 == ne0) { i10 = 0; .. does not seem correct, since i10 is a block index, and ne0 is the number of elements. Renaming the variables so that it is clear if they are element of block indices should make the code easier to understand.

for (int64_t i01 = ir0; i01 < ir1; i01++) {
for (int64_t i00 = 0; i00 < nb; i00++) {
const char * src0_ptr = ((char *) src0->data + i00*nb00 + i01*nb01 + i02*nb02 + i03*nb03);
char * dst_ptr = ((char *) dst->data + i10*nb0 + i11*nb1 + i12*nb2 + i13*nb3);
memcpy(dst_ptr, src0_ptr, type_size);
if (++i10 == ne0) {
i10 = 0;
if (++i11 == ne1) {
i11 = 0;
if (++i12 == ne2) {
i12 = 0;
if (++i13 == ne3) {
i13 = 0;
}
}
}
}
}
}
i10 += nb * (ne01 - ir1);

@ggerganov
Copy link
Member Author

Good catch. This code wasn't exercised by the tests it is used when dst is non-contiguous. I added an option to permute the dst tensor for the test_cpy.

I used the nk00 to indicate number of blocks of src0 along dim 0 (i.e. along the row). Respectively, the counter is k00.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ggml changes relating to the ggml tensor library for machine learning testing Everything test related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants