
Commit d6ac3be

ftgreat and ldwang authored

Add benchmark of Aquila2 34B AWQ in README.md. (#126)

Signed-off-by: ldwang <[email protected]>
Co-authored-by: ldwang <[email protected]>
1 parent dc13f0b commit d6ac3be

File tree

1 file changed: +18 -0 lines changed


README.md (+18)
@@ -258,6 +258,24 @@ generation_output = model.generate(
 | 1 | 1024 | 1024 | 2256.22 | 94.0237 | 4.69 GB (19.78%) |
 | 1 | 2048 | 2048 | 1831.71 | 94.2032 | 6.83 GB (28.83%) |
 
+### Aquila2 34B
+
+- Note: Fast generation, fast context processing
+- GPU: NVIDIA A100-SXM4-40GB
+- Command: `python examples/benchmark.py --model_path casperhansen/aquilachat2-34b-awq --quant_file pytorch_model.bin.index.json`
+- Version: GEMM
+
+| Batch Size | Prefill Length | Decode Length | Prefill tokens/s | Decode tokens/s | Memory (VRAM) |
+|-------------:|-----------------:|----------------:|-------------------:|------------------:|:------------------|
+| 1 | 32 | 32 | 36.7505 | 23.423 | 18.26 GB (46.12%) |
+| 1 | 64 | 64 | 516.544 | 23.3536 | 18.26 GB (46.12%) |
+| 1 | 128 | 128 | 643.968 | 23.3803 | 18.26 GB (46.12%) |
+| 1 | 256 | 256 | 736.236 | 23.389 | 18.34 GB (46.32%) |
+| 1 | 512 | 512 | 829.405 | 23.3889 | 18.54 GB (46.84%) |
+| 1 | 1024 | 1024 | 836.023 | 23.3757 | 18.95 GB (47.87%) |
+| 1 | 2048 | 2048 | 802.632 | 23.3777 | 20.25 GB (51.15%) |
+| 1 | 4096 | 4096 | 722.49 | 23.4252 | 25.38 GB (64.12%) |
+
 ## Reference
 
 If you find AWQ useful or relevant to your research, you can cite their [paper](https://arxiv.org/abs/2306.00978):
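
The benchmark command in the added section points `--quant_file` at the same checkpoint a user would load for inference. As a point of reference, a minimal inference sketch for this model might look like the code below, assuming AutoAWQ's `from_quantized` loader as used elsewhere in the README; the prompt, `max_new_tokens`, and the exact `from_quantized` arguments are illustrative and may differ between AutoAWQ releases.

```python
# Minimal sketch: load the AWQ-quantized Aquila2 34B checkpoint benchmarked above
# and run a short generation. Assumes AutoAWQ's from_quantized loader; argument
# names and generation options may vary between AutoAWQ versions.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "casperhansen/aquilachat2-34b-awq"
quant_file = "pytorch_model.bin.index.json"  # same weights file the benchmark command uses

# Load the quantized weights; fuse_layers enables fused modules where supported.
model = AutoAWQForCausalLM.from_quantized(quant_path, quant_file, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

# Tokenize a prompt, move it to the GPU, and generate.
tokens = tokenizer("What is activation-aware weight quantization?", return_tensors="pt").input_ids.cuda()
generation_output = model.generate(tokens, max_new_tokens=128)
print(tokenizer.decode(generation_output[0], skip_special_tokens=True))
```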
