ggml : fix quantized cpy op #12310

ggerganov · 2025-03-10T13:48:43Z

This should fix CPY(Q8_0, Q8_0)

ggml-ci

aviallon · 2025-03-10T16:56:23Z

I no longer have garbled output with quantized cache. I only see repetitions when reaching the context size, depending on the batch size and the number of slots.
Tested-by: Antoine Viallon <[email protected]>

jukofyork · 2025-03-10T18:25:55Z

Is there any chance we could add the copy operations for BF16? Even just BF16 <--> F32 would be enough to test it for the KV-cache types.

ggerganov · 2025-03-11T08:40:01Z

@jukofyork bc25236 should cover BF16 <-> F32 copies.
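
For readers unfamiliar with the format, here is a minimal sketch of what a per-element BF16 <-> F32 copy involves. The helper names `bf16_to_f32` and `f32_to_bf16` are hypothetical and the rounding choice (round to nearest even, quiet NaNs) is an assumption for illustration; the conversion routines ggml actually uses may differ.

```c
#include <stdint.h>
#include <string.h>

/* Widen BF16 to F32: the 16 stored bits become the high half of the float. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t) h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow F32 to BF16 with round-to-nearest-even (assumed rounding mode). */
static uint16_t f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u) {       /* NaN input */
        return (uint16_t) ((bits >> 16) | 0x0040u); /* keep it a quiet NaN */
    }
    bits += 0x7fffu + ((bits >> 16) & 1u);          /* round to nearest even */
    return (uint16_t) (bits >> 16);
}
```

Widening is exact, while narrowing drops 16 mantissa bits, so an F32 -> BF16 -> F32 round trip only reproduces values that already fit in BF16.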

slaren · 2025-03-11T13:58:56Z

This change does not look right to me. If i00 and i10 represent blocks now, then the logic for determining when to move to the next row in if (++i10 == ne0) { i10 = 0; .. does not seem correct, since i10 is a block index and ne0 is the number of elements. Renaming the variables so that it is clear whether they are element or block indices should make the code easier to understand.

llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c

Lines 4166 to 4187 in 938c779

    for (int64_t i01 = ir0; i01 < ir1; i01++) {
        for (int64_t i00 = 0; i00 < nb; i00++) {
            const char * src0_ptr = ((char *) src0->data + i00*nb00 + i01*nb01 + i02*nb02 + i03*nb03);
                  char * dst_ptr  = ((char *)  dst->data + i10*nb0  + i11*nb1  + i12*nb2  + i13*nb3);
            memcpy(dst_ptr, src0_ptr, type_size);
            if (++i10 == ne0) {
                i10 = 0;
                if (++i11 == ne1) {
                    i11 = 0;
                    if (++i12 == ne2) {
                        i12 = 0;
                        if (++i13 == ne3) {
                            i13 = 0;
                        }
                    }
                }
            }
        }
    }
    i10 += nb * (ne01 - ir1);

ggerganov · 2025-03-11T15:24:42Z

Good catch. This code wasn't exercised by the tests; it is used when dst is non-contiguous. I added an option to permute the dst tensor in test_cpy.

I used nk00 to indicate the number of blocks of src0 along dim 0 (i.e. along the row). Correspondingly, the counter is k00.
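
To illustrate the naming issue discussed above, here is a simplified 2D sketch of a same-type block copy in which every counter that steps over blocks is compared against a block count rather than an element count. Only `nk00` and `k00` come from this thread; the function `cpy_blocks_2d`, the names `nk0` and `k10`, and the Q8_0-like constants (32 elements and 34 bytes per block) are assumptions for illustration, and this is not the code committed in the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_ELEMS 32 /* elements per block (assumed, Q8_0-like) */
#define BLOCK_BYTES 34 /* 2-byte scale + 32 one-byte quants (assumed) */

/* Copy a 2D same-type quantized tensor block by block.
 * ne00/ne01: src elements per row and number of rows.
 * nb00/nb01: src byte strides per block and per row.
 * ne0/ne1, nb0/nb1: the same for dst, whose rows may be non-contiguous. */
static void cpy_blocks_2d(const char *src, int64_t ne00, int64_t ne01,
                          size_t nb00, size_t nb01,
                          char *dst, int64_t ne0, int64_t ne1,
                          size_t nb0, size_t nb1) {
    assert(ne00 % BLOCK_ELEMS == 0 && ne0 % BLOCK_ELEMS == 0);
    const int64_t nk00 = ne00 / BLOCK_ELEMS; /* blocks per src row */
    const int64_t nk0  = ne0  / BLOCK_ELEMS; /* blocks per dst row */
    int64_t k10 = 0; /* block index within the current dst row */
    int64_t i11 = 0; /* dst row index */
    for (int64_t i01 = 0; i01 < ne01; i01++) {
        for (int64_t k00 = 0; k00 < nk00; k00++) {
            memcpy(dst + k10*nb0 + i11*nb1, src + k00*nb00 + i01*nb01, BLOCK_BYTES);
            /* Wrap on the block count nk0, not on the element count ne0. */
            if (++k10 == nk0) {
                k10 = 0;
                if (++i11 == ne1) {
                    i11 = 0;
                }
            }
        }
    }
}
```

If the wrap test compared `k10` against `ne0` instead, a row of 64 elements (2 blocks) would never advance to the next dst row after 2 blocks, which is the kind of mismatch the review comment points out. A padded dst row stride is one simple way to exercise this path.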

ggml : fix quantized cpy op

b971d06

ggml-ci

ggerganov mentioned this pull request Mar 10, 2025

Eval bug: garbage output right after kv-cache defragmentation for CPU backend #12253

Closed

github-actions bot added the testing and ggml labels Mar 10, 2025

slaren reviewed Mar 10, 2025

tests : add cpy tests for all types

a3e78dc

ggml-ci

tests : add BF16 copy tests

bc25236

ggml-ci

slaren reviewed Mar 11, 2025

ggerganov commented Mar 11, 2025

tests : fix loop for same-type copy

938c779

ggml-ci

tests : add option to permute the dst tensor

3384f36

ggml-ci

ggerganov force-pushed the gg/cpu-fix-cpy-q branch from 5da8ae3 to 3384f36 March 11, 2025 15:22
