SwiftKV support (~2x performance boost) #1101

Open
coder543 opened this issue Jan 25, 2025 · 0 comments
Labels
new feature New feature or request

Comments

@coder543

blog post: https://www.snowflake.com/en/engineering-blog/swiftkv-llm-compute-reduction/

full paper: https://arxiv.org/abs/2410.03960

Snowflake documented a new KV-cache optimization that can yield significant performance improvements. They're already integrating this into vLLM.

Specifically, Snowflake has introduced SwiftKV, a method designed to address the computational bottleneck of processing long input prompts during inference. In many enterprise use cases, prompt tokens far outnumber generated tokens. SwiftKV tackles this by reusing the output of an earlier transformer layer to generate the KV cache for subsequent layers, a technique they refer to as "SingleInputKV". This lets prompt tokens skip redundant calculations in later layers, where outputs tend to stabilize. Additionally, "AcrossKV" provides memory compression that can be used alongside SingleInputKV.
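To make the SingleInputKV idea concrete, here is a toy sketch in plain Python. It is not Snowflake's or vLLM's implementation: `ToyLayer`, `prefill_baseline`, `prefill_single_input_kv`, and the `cut` parameter are hypothetical names, the "layer" is a single random matrix standing in for attention plus MLP, and the sketch ignores both the fine-tuning the paper relies on to preserve accuracy and the path the final token still takes through every layer. It only shows the shape of the shortcut: layers from `cut` onward project their K/V from one shared hidden state instead of running a full block per layer.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rng, d):
    return [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(d)]

class ToyLayer:
    """Hypothetical stand-in for one transformer layer."""
    def __init__(self, rng, d):
        self.w_k = rand_matrix(rng, d)      # key projection
        self.w_v = rand_matrix(rng, d)      # value projection
        self.w_block = rand_matrix(rng, d)  # placeholder for attention + MLP

    def kv(self, h):
        """Project a hidden state to this layer's (key, value) pair."""
        return matvec(self.w_k, h), matvec(self.w_v, h)

    def forward(self, h):
        """Residual update standing in for the full (expensive) block."""
        return [a + b for a, b in zip(h, matvec(self.w_block, h))]

def prefill_baseline(layers, prompt):
    """Standard prefill: every prompt token runs through every layer."""
    cache, full_blocks, hs = [], 0, prompt
    for layer in layers:
        cache.append([layer.kv(h) for h in hs])
        hs = [layer.forward(h) for h in hs]
        full_blocks += 1
    return cache, full_blocks

def prefill_single_input_kv(layers, prompt, cut):
    """SingleInputKV-style prefill (assumed simplification): run full
    blocks only for the first `cut` layers, then build the K/V cache of
    all remaining layers from that one hidden state."""
    cache, full_blocks, hs = [], 0, prompt
    for layer in layers[:cut]:
        cache.append([layer.kv(h) for h in hs])
        hs = [layer.forward(h) for h in hs]
        full_blocks += 1
    for layer in layers[cut:]:
        # No forward() here: only the cheap K/V projections are computed.
        cache.append([layer.kv(h) for h in hs])
    return cache, full_blocks

rng = random.Random(0)
layers = [ToyLayer(rng, 4) for _ in range(8)]
prompt = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]

base_cache, base_blocks = prefill_baseline(layers, prompt)
skv_cache, skv_blocks = prefill_single_input_kv(layers, prompt, cut=4)
print(base_blocks, skv_blocks)  # 8 4
```

Both functions produce a KV cache entry for every layer, but with `cut=4` the second one runs half as many full blocks over the prompt. In this toy the cache entries only start to diverge from the baseline after layer `cut`, which is the approximation the real method has to keep small.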

Importantly, Snowflake's benchmarks indicate that these optimizations cost little accuracy, around 1 point on average according to their blog post, so the performance gains do not come with a significant compromise in output quality. Their tests, using Llama 3.1 models on H100 GPUs, show throughput gains of up to 2x and lower latency, particularly for long-input scenarios.

(Just a cool new paper I thought might be of interest here.)

@coder543 coder543 added the new feature New feature or request label Jan 25, 2025