SpargeAttention: A training-free sparse attention that can accelerate any model inference.

thu-ml/SpargeAttn

SpargeAttn

This repository provides the official implementation of SpargeAttn.

SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
Paper: https://arxiv.org/abs/2502.18137
Jintao Zhang, Chendong Xiang, Haofeng Huang, Haocheng Xi, Jia Wei, Jun Zhu, Jianfei Chen

Figure: speed comparison.

Figure: overview.
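The core idea is to skip, at inference time and without any retraining, whole blocks of the attention map that contribute negligibly to the output. The following is a hypothetical, pure-Python sketch of block-level pruning for a single query, meant only to illustrate the principle: the repository's actual implementation is a fused CUDA kernel on quantized GPU tensors, and the pruning criterion here (mean block score against a tuned threshold) is a simplified stand-in.

```python
# Hypothetical sketch of training-free block-sparse attention for one
# query vector. Not the repo's algorithm or kernel; illustration only.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def block_sparse_attention(q, k, v, block_size=2, threshold=0.1):
    """q: list[float]; k, v: list[list[float]] with len(v[i]) == len(q).

    Keys/values are processed in blocks; a block is skipped entirely when
    its mean attention score falls below `threshold` (a tunable sparsity
    hyper-parameter, analogous in spirit to the tuned values this repo
    stores in .pt files).
    """
    d = len(q)
    scale = 1.0 / math.sqrt(d)
    kept_scores, kept_values = [], []
    for start in range(0, len(k), block_size):
        kb = k[start:start + block_size]
        vb = v[start:start + block_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, kv)) for kv in kb]
        # Block-level pruning: drop the whole block if its mean score is
        # below the threshold (training-free criterion).
        if sum(scores) / len(scores) < threshold:
            continue
        kept_scores.extend(scores)
        kept_values.extend(vb)
    if not kept_scores:
        # Fall back to dense attention if everything was pruned.
        kept_scores = [scale * sum(qi * ki for qi, ki in zip(q, kv)) for kv in k]
        kept_values = v
    probs = softmax(kept_scores)
    return [sum(p * vv[j] for p, vv in zip(probs, kept_values)) for j in range(d)]
```

Because entire blocks are skipped before the softmax, the pruned blocks cost neither score computation nor value accumulation, which is where the speedup comes from.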

Installation

Base environment

  • python>=3.9, torch>=2.3.0
  • CUDA:
    • >=12.8 for Blackwell
    • >=12.4 for fp8 support on Ada
    • >=12.3 for fp8 support on Hopper
    • >=12.0 for Ampere

Install Package

pip install -e .   # or: python setup.py install

Available API

Usage Examples

CogVideoX

Tuning:

python evaluate/cogvideo_example.py  --use_spas_sage_attn --model_out_path evaluate/models_dict/CogVideoX-2b_0.06_0.07.pt --tune

Inference:

python evaluate/cogvideo_example.py  --use_spas_sage_attn --model_out_path evaluate/models_dict/CogVideoX-2b_0.06_0.07.pt

Note: We provide pre-tuned hyper-parameters, CogVideoX-2b_0.06_0.07.pt, so the inference script can be run directly. However, for better speed and quality we recommend re-tuning: the provided hyper-parameters were tuned with SpargeAttn based on SageAttention, whereas the default API is now based on SageAttention2.
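In other words, the --tune pass searches sparsity hyper-parameters per attention layer and serializes them, and later inference runs simply reload that file. Below is a minimal, hypothetical stdlib sketch of the save/reload pattern only (the real script saves torch .pt checkpoints, and the actual tuning search is far more involved):

```python
# Hypothetical sketch of the tune-once / reuse-later pattern behind
# --tune and --model_out_path. The real script stores tuned
# hyper-parameters in a torch .pt file; json is used here purely to
# keep the sketch dependency-free.
import json
import os
import tempfile

def tune_thresholds(layer_names, target=0.05):
    # Stand-in for the tuning pass: the real tuner searches per-layer
    # sparsity thresholds that keep attention error within a target.
    return {name: target for name in layer_names}

def run_pipeline(ckpt_path, layer_names, tune=False):
    if tune or not os.path.exists(ckpt_path):
        thresholds = tune_thresholds(layer_names)   # "--tune" pass
        with open(ckpt_path, "w") as f:
            json.dump(thresholds, f)
    with open(ckpt_path) as f:                      # inference pass
        return json.load(f)

ckpt = os.path.join(tempfile.mkdtemp(), "thresholds.json")
thresholds = run_pipeline(ckpt, ["attn.0", "attn.1"], tune=True)
```

This is why re-tuning matters when the underlying attention backend changes: thresholds searched against one backend (SageAttention) are not necessarily optimal for another (SageAttention2).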

Llama

Tuning and inference work the same way as for CogVideoX.

Supported models

Here’s a list of the model modifications we’ve implemented so far. Our approach is universal, and we warmly welcome contributions! Feel free to submit a pull request to support more models. 🚀

model name | example script               | tuned ckpt
CogVideoX  | evaluate/cogvideo_example.py | evaluate/models_dict/CogVideoX-2b_0.06_0.07.pt
Flux       | evaluate/flux_example.py     | TBD

Performance

Table: performance results.

Note: All experiments in the Table above and in our paper used SpargeAttn based on SageAttention. An updated implementation based on SageAttention2 is now available and offers a further 30% speedup.


Figure: End-to-end video generation on Mochi.
Figure: The quality of video generation on Mochi.
Figure: End-to-end performance on NIAH.

Citation

If you use this code or find our work valuable, please cite:

@misc{zhang2025spargeattn,
      title={SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference}, 
      author={Jintao Zhang and Chendong Xiang and Haofeng Huang and Jia Wei and Haocheng Xi and Jun Zhu and Jianfei Chen},
      year={2025},
      eprint={2502.18137},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.18137}, 
}

@inproceedings{zhang2025sageattention,
      title={SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration}, 
      author={Zhang, Jintao and Wei, Jia and Zhang, Pengle and Zhu, Jun and Chen, Jianfei},
      booktitle={International Conference on Learning Representations (ICLR)},
      year={2025}
}

@misc{zhang2024sageattention2,
      title={SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization}, 
      author={Jintao Zhang and Haofeng Huang and Pengle Zhang and Jia Wei and Jun Zhu and Jianfei Chen},
      year={2024},
      eprint={2411.10958},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2411.10958}, 
}