Micro-benchmark inference #1759

jainapurva · 2025-02-21T23:45:19Z

This is the first PR in the benchmarking effort. It provides the outline for setting up inference microbenchmarking for the quantization APIs in torchao.

Inputs such as quantization techniques, matrix sizes, compile, and sparsity are passed to the Python script. The options for quantization techniques are predefined in the script, and a developer can add a new quantization technique. The script generates a CSV with performance numbers, which will be used to plot charts and as input to a dashboard. The script performs the following tasks:

Takes a .yml file as input
Benchmarks the eval time of the quantize_ APIs for the given configurations
Records all config params and their respective times in a CSV file
Includes test cases
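
For readers who want a feel for the flow described above, here is a minimal sketch and not the code in this PR. It assumes the config has already been loaded from the .yml file into a dict, and the key names (`quantization`, `shapes`), the column names, and the `run_benchmarks` helper are all hypothetical. The workload is passed in as a plain callable, whereas the real script would apply `quantize_` and time model inference.

```python
import csv
import io
import time
from itertools import product


def run_benchmarks(config, workload, out):
    """Time `workload` for every (technique, shape) pair and write CSV rows to `out`.

    `config` is a dict with hypothetical keys "quantization" and "shapes";
    `workload` is any callable taking (technique, shape).
    """
    writer = csv.DictWriter(out, fieldnames=["quantization", "shape", "time_ms"])
    writer.writeheader()
    rows = []
    for technique, shape in product(config["quantization"], config["shapes"]):
        start = time.perf_counter()
        workload(technique, shape)  # placeholder for the real benchmarked call
        elapsed_ms = (time.perf_counter() - start) * 1e3
        row = {
            "quantization": technique,
            "shape": "x".join(str(dim) for dim in shape),
            "time_ms": f"{elapsed_ms:.3f}",
        }
        writer.writerow(row)
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Placeholder technique names; the shape matches the one in the test config.
    demo_config = {
        "quantization": ["technique_a", "technique_b"],
        "shapes": [[4096, 4096, 1024]],
    }
    buffer = io.StringIO()
    run_benchmarks(demo_config, lambda technique, shape: sum(range(1000)), buffer)
    print(buffer.getvalue())
```

Writing to a file-like object instead of a fixed path keeps the sketch easy to test; a real run would pass an open file handle for the results CSV.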

Future PRs will include more config options and process the generated results.

Run command:

python benchmarks/microbenchmarks/benchmark_runner.py --config benchmarks/microbenchmarks/benchmark_config.yml

Output will be stored in

benchmarks/microbenchmarks/results/results.csv

pytorch-bot · 2025-02-21T23:45:22Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1759

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 22b3ddd with merge base bc4f51d:

NEW FAILURE - The following job has failed:

Code Analysis with Ruff / build (3.9) (gh)
benchmarks/microbenchmarks/benchmark_inference.py:47:8: F632 [*] Use != to compare constant literals

This comment was automatically generated by Dr. CI and updates every 15 minutes.
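
For context on the F632 failure above: Ruff raises it when `is` or `is not` is used to compare against a constant literal, because those operators test object identity and not value. The offending line is not shown in this thread, so the function and the `"none"` literal below are hypothetical and only illustrate the pattern and its fix.

```python
def is_enabled(option: str) -> bool:
    """Return True unless the option is the literal string "none"."""
    # Flagged by F632:  return option is not "none"
    # Identity comparison against a literal depends on interpreter string
    # interning, so it can give the wrong answer for equal strings.
    # Fixed version compares by value:
    return option != "none"


if __name__ == "__main__":
    built_at_runtime = "".join(["no", "ne"])  # equal to "none" by value
    print(is_enabled(built_at_runtime))
    print(is_enabled("int8"))
```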

benchmarks/microbenchmarks/test/benchmark_config.yml

benchmarks/microbenchmarks/benchmark_inference.py

benchmarks/microbenchmarks/test/test_benchmark_inference.py

benchmarks/microbenchmarks/utils.py

HDCharles · 2025-03-12T02:42:27Z

benchmarks/microbenchmarks/test/benchmark_config.yml

+        [4096, 4096, 1024]
+      ]
+  high_precision_dtype: "torch.bfloat16"
+  compile: true


nit: I tend to prefer consolidating multiple variables into one. Having both compile and compile_mode is somewhat redundant when you could just have

compile: "max-autotune" or compile: "false"
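
One way the consolidated field could be interpreted, sketched under assumptions: the helper name `parse_compile_option` is hypothetical and is not part of this PR, and treating a bare `true` as the `"default"` mode is a design choice made here for illustration. Any string other than `"false"` or `"true"` is taken as the compile mode, such as `"max-autotune"`.

```python
from typing import Optional, Tuple


def parse_compile_option(value) -> Tuple[bool, Optional[str]]:
    """Map a single `compile` config value to (enabled, mode).

    Accepts YAML booleans as well as strings such as "false" or "max-autotune".
    """
    if isinstance(value, bool):
        # Assumption: a bare `true` means compile with the "default" mode.
        return (value, "default" if value else None)
    text = str(value).strip()
    if text.lower() in ("false", "none", ""):
        return (False, None)
    if text.lower() == "true":
        return (True, "default")
    # Any other string is treated as the mode name.
    return (True, text)


if __name__ == "__main__":
    for raw in ("max-autotune", "false", True):
        print(raw, "->", parse_compile_option(raw))
```

The caller would then compile only when `enabled` is true and pass `mode` through, so the config needs one key where it previously needed two.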

HDCharles

lgtm

Add files

7a07885

facebook-github-bot added the CLA Signed label Feb 21, 2025

jainapurva and others added 3 commits February 24, 2025 21:30

Add basic benchmarks for inference

c8ddfdb

Update new quantize_ api

a56f3ab

Updates

24bea42

jainapurva force-pushed the bench_structure branch from 0fc7212 to 24bea42 Compare February 25, 2025 20:17

jainapurva added the topic: new feature and topic: performance labels Feb 25, 2025

jainapurva requested a review from HDCharles February 25, 2025 20:32

jainapurva marked this pull request as ready for review February 25, 2025 22:02

jainapurva and others added 3 commits February 25, 2025 15:31

New test folder

97cea12

Added test cases

35b2840

Lint fixes

8b7291c

jainapurva force-pushed the bench_structure branch from e9c3a10 to a750f7c Compare February 26, 2025 05:23

jainapurva requested review from drisspg, vkuzo and jerryzh168 February 26, 2025 17:27

Merge remote-tracking branch 'origin/main' into bench_structure

a828f5b

jainapurva force-pushed the bench_structure branch from a750f7c to a828f5b Compare February 26, 2025 17:28