Commit
Significantly optimize cost calculation model
Comparing profile data for the original Zopfli implementation in C with this Rust port on a realistic test file showed that the Rust code spent a significantly larger share of its time in `zopfli::squeeze::get_cost_stat`, the translation of the C `GetCostStat` function. This is one of the hottest functions, responsible for more than 5% of the total samples collected by the sampling profiler I used. When inspecting the generated x64 assembly for this function, I noticed that the Rust version took ~200 lines of assembly, while the C version compiled to ~50 lines.

Upon closer inspection, I noticed Rust's codegen for this function was particularly suboptimal for two main reasons:

- Safe Rust performs array index bounds checks at runtime unless the compiler can prove that the index is always in bounds.
- Functions such as `get_dist_extra_bits`, which are implemented via `match` expressions with lots of patterns, got translated into many inefficient test and jump instructions, while the related C code leveraged intrinsics to compute the same result with better readability and far fewer, faster instructions.

This change improves performance on both fronts: first, it tactically pads lookup tables and inserts assertions so that the overall number of bounds checks is reduced, giving the compiler more optimization opportunities; second, `match` expressions are replaced by more readable, efficient, and correct alternatives.

With this change, the Rust function now takes ~90 lines of assembly, a 55% reduction, and total file compression performance improves by ~10% for several test files.

Related to #12.
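The two techniques described above can be illustrated with a minimal sketch. This is not the code from this commit: every function and table name below is hypothetical, and the only facts assumed are the DEFLATE distance ranges and the extra-bit counts per distance symbol. The first pair of functions contrasts a range `match` with a count-leading-zeros formulation; the last two show a lookup table padded to a power of two and an up-front assertion, both of which may let the compiler drop per-access bounds checks.

```rust
/// Reference version: one `match` arm per DEFLATE distance range.
/// This shape tends to compile to a chain of compare-and-jump instructions.
pub fn dist_extra_bits_match(dist: u32) -> u32 {
    match dist {
        0..=4 => 0,
        5..=8 => 1,
        9..=16 => 2,
        17..=32 => 3,
        33..=64 => 4,
        65..=128 => 5,
        129..=256 => 6,
        257..=512 => 7,
        513..=1024 => 8,
        1025..=2048 => 9,
        2049..=4096 => 10,
        4097..=8192 => 11,
        8193..=16384 => 12,
        _ => 13,
    }
}

/// Closed form: floor(log2(dist - 1)) - 1, computed with `leading_zeros`,
/// which usually maps to a single count-leading-zeros style instruction.
pub fn dist_extra_bits(dist: u32) -> u32 {
    if dist < 5 {
        0
    } else {
        (31 ^ (dist - 1).leading_zeros()) - 1
    }
}

/// Extra-bit counts for the 30 DEFLATE distance symbols, padded to 32 entries
/// (hypothetical layout) so that any index masked with 31 is in bounds.
const DIST_SYMBOL_EXTRA_BITS: [u32; 32] = [
    0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
    7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13,
    0, 0, // padding, never reached by valid symbols
];

/// The mask proves the index is below 32, so no runtime bounds check is needed.
pub fn extra_bits_for_symbol(symbol: usize) -> u32 {
    DIST_SYMBOL_EXTRA_BITS[symbol & 31]
}

/// A single assertion up front establishes that every `u8` index is valid,
/// which can allow the per-iteration bounds checks to be optimized away.
pub fn sum_costs(costs: &[f64], symbols: &[u8]) -> f64 {
    assert!(costs.len() >= 256);
    symbols.iter().map(|&s| costs[s as usize]).sum()
}
```

Whether the checks are actually elided depends on the compiler version and optimization level, so inspecting the generated assembly, as was done for this change, remains the way to confirm the effect.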
1 parent f2b32cc, commit 718fb62
Showing 4 changed files with 38 additions and 77 deletions.