
llama : second attempt to refactor vision API #11292

Draft · wants to merge 29 commits into base: master

Conversation


@ngxson ngxson commented Jan 18, 2025

Fix #8010

Supersede #9687

Important

Please do NOT upload gguf produced via this PR on the internet. People don't know how to use it and they will complain, very annoying!

Then,

cmake --build build -j --target llama-vision
./build/bin/llama-vision -m ../models/llava-1.5-7b-hf/model.gguf --image ../models/bliss.png

# The image showcases a lush green field with a hill in the background. In the foreground, there is a large,
# bright, and vibrant green field with a Microsoft Windows XP desktop screen, possibly representing a
# screensaver, superimposed onto the scene. The field is expansive and covers most of

Goals of this PR:

  • Have the first version of a public API for llama_vision
  • Support llava, mobilevlm, minicpm-v 2.6, SmolVLM
  • See how the API can be adapted for encoder-decoder models like llama 3.2 vision (so we can add it soon)
  • Add an API to format the chat, equivalent to the Processor class in the HF library
  • See how quantization affects performance

Things that will be done in follow-up PRs:

  • Models with encoder-decoder arch like llama 3.2 vision
  • GPU support
  • Better image processing function: faster resize function, maybe even abstract out the image transformations and optimize it (example: if we run resize twice, better to detect that and only run it once)
  • Further clean up the mess in convert-hf-to-gguf python script

@github-actions bot added the python (python script changes) and server labels on Jan 18, 2025

ngxson commented Jan 19, 2025

Hi @ggerganov @slaren , I would like to ask for an early review from you before proceeding further.

What will be interesting to discuss here is the usage of the new API, as demonstrated in the newly added llama-vision example. The idea is:

  • Call llama_vision_encode for each image (we don't support batching for now, to simplify the implementation)
  • Then get the output embedding ggml_tensor, add it to a llama_batch, and llama_decode it (see the sketch below)
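
A rough sketch of that flow, using the function names from this PR (the exact signatures are approximate, and load_image is only a placeholder for however the user obtains pixels):

// sketch only: signatures approximate, load_image() is a placeholder for user code
llama_vision_bitmap * bmp = load_image("bliss.png");

// preprocess the image on the CPU (resize / pad / slice)
llama_vision_patches * p = llama_vision_patches_init(ctx, bmp);

// encode one image at a time (no batching yet)
if (llama_vision_encode(ctx, p) != 0) { /* handle error */ }

// fetch the resulting embeddings and splice them into the decoder stream
struct ggml_tensor * embd  = llama_vision_get_output_tensor(ctx);
struct llama_batch   batch = llama_batch_get_one_from_tensor(embd, n_past, seq_id);
llama_decode(ctx, batch);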

I'm already able to get llava and mobilevlm working with llama-vision and convert_hf_to_gguf.py (for minicpm-v, I'm still struggling because the conversion is not straightforward)

Things that are different from the initial discussion in #8010 :

  • I added a helper function llama_batch_get_one_from_tensor for creating the batch from a tensor, with the appropriate n_past (for placing these tokens in the correct position in the chat template) and seq_id for future usage in the server.
  • llama_vision_patches actually contains slices of the image, not patches, as explained in llava-uhd. The patches are actually produced in clip_image_build_graph by doing a ggml_conv_2d. I think I'll need to rename it to llama_vision_slices, but I actually prefer a more appropriate name like llama_vision_preprocessed_img since we do more than just slicing (i.e. resize, padding, etc.) - feel free to suggest if you have any ideas.

And things that are still messy and will need more works:

  1. Naming: most functions are still prefixed with clip_ and I don't know whether I should prefix everything with llama_vision_clip_ or not. Please let me know your preference.
  2. Chat template support: we may need to introduce a new API that wraps llama_chat_apply_template, much like how transformers has a Processor class that wraps around the Tokenizer.
  3. I'm not sure how this API will be adapted for encoder-decoder archs like llama 3.2 vision. In theory, llama_vision_get_output_tensor should become a no-op, but judging from this implementation, it's still needed. @danbev do you have any ideas?

I would love to hear your opinions about this. Thank you!

Comment on lines +862 to +873
if (ctx.ctx_ggml) {
ggml_free(ctx.ctx_ggml);
}
ggml_init_params params = {
/*.mem_size =*/ ggml_tensor_overhead(),
/*.mem_buffer =*/ NULL,
/*.no_alloc =*/ true,
};
ctx.ctx_ggml = ggml_init(params);
ctx.output = ggml_dup_tensor(ctx.ctx_ggml, output_node);
ggml_backend_alloc_ctx_tensors_from_buft(ctx.ctx_ggml, ctx.model->buft);
ggml_backend_tensor_copy(output_node, ctx.output);
ngxson (Collaborator, Author):

@slaren Not sure if there is a better way, but I'm using a hacky solution here.

Without a dedicated context (and ggml_backend_tensor_copy), the underlying buffer is reallocated before the next llama_decode, rendering the data unusable.

Member:

If the vision part uses the same scheduler as the llama_context, that's unavoidable. You could pre-allocate the tensor in a different buffer to avoid the copy, but that's an optimization that can be done later.

Member:

If we have a separate encoder context for the clip model, the decoder context could reference tensors from it directly. They would be interpreted as inputs for the decoder.


slaren commented Jan 20, 2025

> llama_vision_patches actually contains slices of image, not patches, as explained in llava-uhd. The patches are actually produced in clip_image_build_graph by doing a ggml_conv_2d. I think I'll need to rename it to llama_vision_slices, but I actually prefer a more appropriate name like llama_vision_preprocessed_img since we do more than just slicing it (i.e. resize, padding, etc) - feel free to suggest if you have any ideas.

I am just wondering, is there any reason to expose the patches/slices to the user at all? Can the user do anything with the patches other than just immediately call llama_vision_encode and throw them away? If not, then maybe that could be hidden entirely from the user and llama_vision_encode could take directly an image.


danbev commented Jan 20, 2025

@ngxson I'll take a closer look at this today, specifically at how this could work with a cross-attention model like Llama 3.2 Vision 👍

One thing that is related to this work is something we discussed about how these models should be provided. I initially thought that creating a single .gguf for Llama 3.2 which contained both the vision encoder and the language model would be the way to go, but as can be read in the linked discussion, having separate models is probably a better solution. It would be great to get some clarification on this and on whether vision encoders should be separate .gguf models.
I'm looking at updating the conversion for Llama 3.2 and making changes to convert_hf_to_gguf.py to produce 2 models (vision encoder and language model) instead of one. I'd like to try this out with this latest vision API proposal, but I'd prefer to know what the model(s) should look like before proceeding, so as not to waste time.


ngxson commented Jan 20, 2025

@slaren In my first proposal, I made llama_vision_encode directly accept an image. But then I decided to split it into preprocess + encode because:

  • The most important reason is that the user will be able to retrieve the number of tokens that the image occupies (this can vary depending on image size, in the case of llava-uhd). This should be done before any decode/encode so that the user can leave the appropriate positions for the image after the tokenizing step. This is also similar to the Processor class in HF transformers, which returns a preprocessed image and the tokenized prompt with the correct number of "placeholder" tokens for the image embeddings.
  • The second reason is that by making this a dedicated function, it's easier to manage error codes. This is mostly because this function works at the pixel level, not the tensor level.
  • The third reason is that this preprocessing is thread-safe, so for example llama-server can do this step in the HTTP thread, much like how llama_tokenize is currently done in the HTTP thread. (A sketch of this call order follows below.)
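
As an illustration only (the n_tokens getter here is the one proposed later in this thread; its exact name and argument are not final):

// preprocess first - thread-safe, can run in e.g. the HTTP thread of llama-server
llama_vision_patches * p = llama_vision_patches_init(ctx, bmp);

// ask how many embedding "tokens" this image will occupy (varies with image size for llava-uhd)
int32_t n_img_tokens = llama_vision_get_n_tokens(p);  // name/argument not final

// tokenize the text prompt, reserving n_img_tokens placeholder positions for the image,
// then run llama_vision_encode + llama_decode on the main thread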


ngxson commented Jan 20, 2025

Btw, I have repeatedly mentioned Processor, so I think it's better to give an example of how it works: https://gist.github.com/ngxson/ca46c72f0cc7b441c30dd85c2a24ee62

@ggerganov ggerganov left a comment


Adding some thoughts that I have so far.

Continuing along the idea of having separate models and contexts for the encoder and the decoder, I think that with a proper llama_batch abstraction we can have the following API:

// vision
patches0 = llama_vision_tokenize(ctx_enc_v, img0);
patches1 = llama_vision_tokenize(ctx_enc_v, img1);

llama_batch_add_image(batch_enc_v, patches0);
llama_batch_add_image(batch_enc_v, patches1);

llama_encode(ctx_enc_v, batch_enc_v);

embd_enc_v = llama_get_embeddings(ctx_enc_v);

// audio
mel0 = llama_audio_tokenize(ctx_enc_a, audio0);
mel1 = llama_audio_tokenize(ctx_enc_a, audio1);

llama_batch_add_audio(batch_enc_a, mel0);
llama_batch_add_audio(batch_enc_a, mel1);

llama_encode(ctx_enc_a, batch_enc_a);

embd_enc_a = llama_get_embeddings(ctx_enc_a);

// text + vision + audio
tokens0 = llama_tokenize(ctx_dec, tokens0);
tokens1 = llama_tokenize(ctx_dec, tokens1);

llama_batch_add_text      (batch_dec, tokens0);
llama_batch_add_embd_image(batch_dec, embd_enc_v);
llama_batch_add_embd_audio(batch_dec, embd_enc_a);
llama_batch_add_text      (batch_dec, tokens1);

llama_decode(ctx_dec, batch_dec);

For cross-attention models such as Llama 3.2 Vision and Whisper, the decoding context ctx_dec could be initialized with a reference to the encoder context:

llama_context_params cparams_dec;
cparams_dec.ctx_cross[0] = ctx_enc_v;
cparams_dec.ctx_cross[1] = ctx_enc_a;

Edit: extended the example with audio input as well.

Comment on lines 558 to 570
static ggml_cgraph * clip_image_build_graph(clip_context & ctx, int batch_size, clip_image_size & image_size) {
auto & model = *ctx.model;
auto & hparams = ctx.model->hparams;

const int hidden_size = hparams.hidden_size;
const int n_head = hparams.n_head;
const int d_head = hidden_size / n_head;
const int patch_size = hparams.patch_size;
const float eps = hparams.eps;
const int num_patches = ((image_size.width / patch_size) * (image_size.height / patch_size));
const int num_positions = num_patches + (model.class_embedding ? 1 : 0);

LLAMA_LOG_DEBUG("%s: num_patches = %d\n", __func__, num_patches);
Member:

The clip graph should be constructed as any other graph in src/llama.cpp, llm_build_context.

ngxson (Collaborator, Author):

I'm not sure how to do this right now, as I can't see how I can re-use the existing build_* functions to make the cgraph of vision models "blend in" with the rest of llm_build_context.

But what I did so far is to make an equivalent called llama_vision_graph_builder. This is meant to be a temporary solution, to simplify the migration in the near future.

Could you please have a look at my llama_vision_graph_builder to see how it can be merged into llm_build_context? Thanks!

delete p;
}

int32_t llama_vision_encode(struct llama_context * ctx, llama_vision_patches * p) {
Member:

Don't think we need separate function - we should be able to reuse llama_encode.

ngxson commented Jan 21, 2025

Hmm I don't think we can do this right now, as it requires llama_batch to also accept image tokens.

Do you think it's ok to keep llama_vision_encode(llama_img_tokens &) and refactor llama_batch later on?

Comment on lines 894 to 902
struct llama_vision_patches * llama_vision_patches_init(
struct llama_context * ctx,
llama_vision_bitmap * bmp) {
clip_context & vctx = ctx->vctx;
if (vctx.model->hparams.arch == VISION_ARCH_MINICPMV) {
return new llama_vision_patches(clip_image_preprocess_minicpmv(vctx, *bmp));
}
return new llama_vision_patches(clip_image_preprocess(vctx, *bmp));
}
ggerganov commented Jan 20, 2025

I agree that the analogy of "tokenization" in the context of vision models is the conversion of "images -> patches". So the patches could be considered as "image tokens" and it seems reasonable to have a separate function to create patches, since this would have to be performed on the CPU.

> I am just wondering, is there any reason to expose the patches/slices to the user at all? Can the user do anything with the patches other than just immediately call llama_vision_encode and throw them away? If not, then maybe that could be hidden entirely from the user and llama_vision_encode could take directly an image.

Even though the user cannot explicitly operate with the patches, it seems to make sense to expose this in order to be able to multi-thread the pre-processing step.

Note that we should also consider the case of Whisper in the context of this abstraction. The whisper model takes raw input audio in PCM format, which is first pre-processed into a mel spectrogram. This pre-processing step, similar to the image pre-processing for CLIP and the text tokenization in text models, is performed on the CPU and can be multi-threaded. Of course, any of the three types of pre-processing could be implemented on the GPU with enough effort, but the important aspect is that this pre-processing can be done in parallel for different inputs and, once computed, can be reused with different contexts.

In all cases, the pre-processed input is passed to the transformer graph and the first step is always to convert this input into embeddings. For text, this conversion is trivial - ggml_get_rows(w, tokens). For Whisper, this process involves a couple of convolutions of the mel spectrogram:

https://github.com/ggerganov/whisper.cpp/blob/99b011a9f5e63f71201bfa583250506453a7b995/src/whisper.cpp#L1904-L1918

For CLIP, this appears to be again a convolution operator applied to the pre-processed input (the image patches) in order to obtain the initial embeddings:

https://github.com/ngxson/llama.cpp/blob/4a7ab89d7593ccb89f80e6e118875ee0b3ede3c7/src/llama-vision.cpp#L581-L616

All these conversions of the pre-processed input (tokens, mel, patches) into the initial embeddings should be implemented in a single place: build_inp_embd().
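
Schematically, something along these lines (pseudocode only, not the actual llm_build_context code; the LLM_INPUT_* enum and the build_conv_* helpers are made up for illustration):

// pseudocode: a single build_inp_embd() dispatching on the type of pre-processed input
ggml_tensor * build_inp_embd(llm_input_type type) {
    switch (type) {
        case LLM_INPUT_TOKENS:  // text: trivial embedding lookup
            return ggml_get_rows(ctx0, model.tok_embd, inp_tokens);
        case LLM_INPUT_MEL:     // whisper-style: convolutions over the mel spectrogram
            return build_conv_mel(inp_mel);
        case LLM_INPUT_PATCHES: // clip-style: convolution over the image patches (+ class/pos embeddings)
            return build_conv_patches(inp_patches);
    }
    return nullptr;
}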

ngxson commented Jan 20, 2025

> I agree that the analogy of "tokenization" in the context of vision models is the conversion of "images -> patches". So the patches could be considered as "image tokens" and it seems reasonable to have a separate function to create patches

Makes sense then. I realized that I had always associated the notion of "token" with "text", but a quick Google search tells me that: "In LLMs, a token is a basic unit of input or output [...]"

In that sense, I would propose calling it llama_vision_img_tokens (though it can be a bit confusing, because the user may expect it to be a std::vector due to the plural "tokens"):

// Structure represents the basic input unit of vision model
// This can be a processed image or slices of images under the hood
struct llama_vision_img_tokens;

// User must reserve N number of tokens in tokenized text prompt for each image
int32_t llama_vision_get_n_tokens(const llama_vision_img_tokens * img_tokens);


danbev commented Jan 22, 2025

@ngxson Sorry about the delay. I've been able to "force" support for mllama using the latest vision API, that is, get an example working. I'm now going to iterate on this and try to figure out how cross attention will work. Just wanted to let you know that some progress is being made.

There is an issue I'm having with the vocab size which I'm not exactly sure how to handle. If anyone has some thoughts around this please let me know.


ngxson commented Jan 22, 2025

@danbev No worries, I was busy with minicpm-v too. It's still not fully working (inference works, but the llava-uhd preprocessor is missing). I will have a look at your implementation of mllama very soon.


ngxson commented Jan 22, 2025

So, the minicpm-v template is more complicated because it contains both the image and all the slices. Here is what it looks like in minicpmv-cli:

<image> (if no slice, we only have one image) </image><slice><image> (first slice) </image><image> (second slice) </image> .... (n-th slice) </slice>

To get rid of this complication, my idea is to have the embeddings of these tokens (<image>, </image>, <slice> and </slice>) appended to the output tensor returned from llama_vision_encode.

This will make the formatting transparent to the text tokenizer, but it will require the embeddings of these tokens to be stored as one-hot vectors in the vision model (of course we can use ggml_get_rows to get them, but it will be quite messy). A rough sketch of the idea is below.
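
For illustration only (names like model.mm_tok_embd, id_img_open, id_img_close and slice_embd are hypothetical, and the concat dimension assumes the usual [n_embd, n_tokens] layout):

// sketch: wrap the slice embeddings with the marker-token embeddings inside the vision graph
// model.mm_tok_embd would hold the embeddings of <image>, </image>, <slice>, </slice>
ggml_tensor * img_open  = ggml_get_rows(ctx0, model.mm_tok_embd, id_img_open);   // [n_embd, 1]
ggml_tensor * img_close = ggml_get_rows(ctx0, model.mm_tok_embd, id_img_close);  // [n_embd, 1]

// concatenate along the token dimension (dim 1): <image> ... slice embeddings ... </image>
ggml_tensor * out = ggml_concat(ctx0, img_open, slice_embd, 1);
out               = ggml_concat(ctx0, out, img_close, 1);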


ngxson commented Jan 23, 2025

Ok so I managed to get minicpm-v kinda working out of the box with the API (no changes to user-space code are required).

Upon giving it win XP wallpaper bliss, it says: I see a serene landscape featuring a vast expanse of green grass under a clear blue sky

It currently operates on a resized version of the image (like llava), so the performance will be bad for bigger images (with more details). I'll get llava-uhd to work, which breaks the image into slices and thus allows the LLM to "see" the image at different zoom levels, preserving details.


ngxson commented Feb 15, 2025

@agNihit928 I think something got buggy when I rebased onto the latest master; you could go back to c3a654c to see if it works.

@agNihit928

Sure @ngxson
Will check it out
Thanks

(Two comments from @AIWintermuteAI were marked as outdated.)


ngxson commented Mar 1, 2025

This PR is only tested with SmolVLM 500M: https://huggingface.co/HuggingFaceTB/SmolVLM-500M-Instruct

If you're using another model, I don't know.


ngxson commented Mar 1, 2025

Btw, a small reminder so I don't forget:

Important

Please do NOT upload gguf produced via this PR on the internet. People don't know how to use it and they will complain, very annoying!

@agNihit928

@AIWintermuteAI
Based on my testing, I was able to generate the GGUF files for both the 256M and the 500M models (from the original Hugging Face repos) with the mentioned branch, i.e., c3a654c

@AIWintermuteAI

Ah, interesting! I was using https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct, which is "supposed to be" the same, but has a different (broken) config.

@AIWintermuteAI

Absolutely, I'm not sharing anything, since I can't even get it to work yet xD

(Two comments from @AIWintermuteAI were marked as resolved.)


ngxson commented Mar 1, 2025

Try another image / format / resolution. I'd recommend you pinpoint the problem on your side first, to avoid spamming this thread with too much data.

And again, nothing is guaranteed to work. This is a WIP.

(I hid your comments because they take too much space and make the thread hard for me to follow.)

@AIWintermuteAI

Sure, no worries! I'll use collapsible text next time I need to post large logs, thanks for the reminder.
It's a 300x241 pixel image I found when searching for the bliss wallpaper on Google. Perhaps you can share your testing sample here?
And again, no worries, I totally understand this is a WIP - my comments are just feedback for you and other people who might be testing this, not a nudge :)

I'll try testing with some more images, and I guess see what can be done about ValueError: Can not map tensor 'model.text_model.embed_tokens.weight' on the latest commit here. Looks like the .weight part is normally removed, but for some reason it is not.

@AIWintermuteAI

AIWintermuteAI commented Mar 1, 2025

Update:
I needed to include the instructions and special tokens in the prompt, e.g.

./build/bin/llama-vision --image bliss.png -m ../SmolVLM-500M-Instruct/SmolVLM-500M-Instruct-F16.gguf -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<img_placement>\nwhat do you see?<|im_end|>\n<|im_start|>assistant\n"

Then everything works!

eval text batch (14 tokens)
eval image batch (64 embeddings)
eval text batch (26 tokens)
prompt processed, 90 tokens
The sky is a brilliant blue, dotted with fluffy white clouds that look like cotton candy. The sun is shining, casting a warm glow across the landscape. To the left, there's a small hill, covered in green grass and dotted with wildflowers. The hill is dotted with trees, and the leaves are a rich, dark


ngxson commented Mar 1, 2025

OK, so Phi-4-multimodal-instruct is a bit more messy.

Traditional vision models are simple: just 2 separate transformers, one for the vision encoder and one for the language decoder. However, on Phi-4, the embedding data from the vision/audio encoder must also be processed using a dedicated LoRA adapter applied on top of the language decoder.

Very technical details

Normal vision models:

flowchart TD
  image --> vision_transformer
  vision_transformer[[vision_transformer]] --> embd_input
  text_input --> embd_input
  embd_input --> text_transformer[[text_transformer]]
  text_transformer --> text_output

Phi-4 multimodal:

flowchart TD
  image --> vision_transformer[[vision_transformer]]
  vision_transformer --> embd_input
  audio --> audio_transformer[[audio_transformer]]
  audio_transformer --> embd_input
  text_input --> embd_input
  embd_input --> text_transformer
  subgraph text_transformer
    vision_LoRA[[vision_LoRA]]
    audio_LoRA[[audio_LoRA]]
    base_model[[base_model]]
  end
  text_transformer --> text_output

Diagram from the paper: (image omitted)

For now, I've only been able to convert the text/language part. It turns out it's just a simple Phi-4-mini-instruct under the hood, so nothing interesting.

This is also mentioned in the paper: (screenshot omitted)

Will see if it's easy to re-implement that LoRA + projectors. Otherwise, we will need to delay Phi-4-multimodal for later.

Update: the LoRA part is very complicated to implement right now, so it will be left for dedicated research / a PR in the future.

revert Phi-4-mm since we cannot support LoRA for now, too complicated

@lucasjinreal

Hello, is the qwen2.5 vl conversion script from raw safetensors into GGUF supported now? Also, I'm curious what the standard way to support a new model in convert_hf_to_gguf.py is; it looks a little tricky, since it needs to handle very specific tensor names for each model arch.

Labels: examples, python (python script changes), server
Successfully merging this pull request may close these issues.

server: Bring back multimodal support
9 participants