Data-oriented design DocumentScope #1517
@Techatrix as we discussed, the following is completely legal code:

```zig
const S = struct {
    const z = 10;
    z: u8,
};
```

The map would only contain the second decl unless the map is typed (i.e. the key also records what kind of declaration it is). I've removed the types locally, but I think we might need to add them back later for correctness reasons (plus the lookup map would be broken, afaik; it's a super rare edge case, but it's still valid Zig code). :) (Wait, that's why the PR was merged in the first place, oops)
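To make the edge case concrete, here is a minimal sketch (assuming Zig 0.11 or newer) of why a map keyed only by name keeps just one `z`, while a key that also records the declaration kind keeps both. `DeclKind`, `Key`, `KeyContext` and `TypedDeclMap` are hypothetical names for illustration, not the types used in this PR.

```zig
const std = @import("std");

// Hypothetical declaration kinds; the real DocumentScope types may differ.
const DeclKind = enum(u8) { constant, field };

// Hypothetical "typed" key: the identifier plus the kind of declaration it names.
const Key = struct {
    name: []const u8,
    kind: DeclKind,
};

// Hash/equality context that compares the *contents* of `name` as well as `kind`.
const KeyContext = struct {
    pub fn hash(_: KeyContext, key: Key) u64 {
        var hasher = std.hash.Wyhash.init(0);
        hasher.update(key.name);
        hasher.update(&[_]u8{@intFromEnum(key.kind)});
        return hasher.final();
    }

    pub fn eql(_: KeyContext, a: Key, b: Key) bool {
        return a.kind == b.kind and std.mem.eql(u8, a.name, b.name);
    }
};

// Maps a typed key to a (hypothetical) declaration index.
const TypedDeclMap = std.HashMapUnmanaged(Key, u32, KeyContext, std.hash_map.default_max_load_percentage);

test "name-only keys collapse `const z` and field `z`; typed keys keep both" {
    const gpa = std.testing.allocator;

    var by_name: std.StringHashMapUnmanaged(u32) = .{};
    defer by_name.deinit(gpa);
    try by_name.put(gpa, "z", 0); // `const z = 10;`
    try by_name.put(gpa, "z", 1); // `z: u8,` overwrites the entry above
    try std.testing.expectEqual(@as(u32, 1), by_name.count());

    var typed: TypedDeclMap = .{};
    defer typed.deinit(gpa);
    try typed.put(gpa, .{ .name = "z", .kind = .constant }, 0);
    try typed.put(gpa, .{ .name = "z", .kind = .field }, 1);
    try std.testing.expectEqual(@as(u32, 2), typed.count());
}
```

With the name-only map, a lookup for `z` can only ever return one of the two declarations; with the typed key, the caller has to say which kind it wants, which is the extra information the lookup needs to stay correct.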
Here are some performance metrics:

```
Benchmark 1 (33 runs): ./document_scope_master
  measurement         mean ± σ        min … max       outliers  delta
  wall_time          3.69s ± 43.0ms   3.64s … 3.87s    1 ( 3%)  0%
  peak_rss          11.2MB ± 52.6KB  11.1MB … 11.3MB   0 ( 0%)  0%
  cpu_cycles         12.9G ± 160M     12.7G … 13.6G    3 ( 9%)  0%
  instructions       24.3G ± 485      24.3G … 24.3G    0 ( 0%)  0%
  cache_references   46.5M ± 984K     45.2M … 49.3M    1 ( 3%)  0%
  cache_misses       6.63M ± 263K     6.23M … 7.29M    0 ( 0%)  0%
  branch_misses       112M ± 1.15M     110M … 116M     1 ( 3%)  0%
Benchmark 2 (37 runs): ./document_scope_44ba04327e72acffcd0d30d17e5e47fbe9370fac # all tests pass!!!!
  measurement         mean ± σ        min … max       outliers  delta
  wall_time          3.25s ± 30.4ms   3.17s … 3.30s    1 ( 3%)  ⚡- 11.9% ± 0.5%
  peak_rss          8.19MB ± 54.1KB  8.08MB … 8.31MB   0 ( 0%)  ⚡- 26.7% ± 0.2%
  cpu_cycles         12.6G ± 126M     12.3G … 12.8G    0 ( 0%)  ⚡- 2.2% ± 0.5%
  instructions       23.6G ± 1.89K    23.6G … 23.6G    0 ( 0%)  ⚡- 3.0% ± 0.0%
  cache_references   63.9M ± 1.90M    61.2M … 70.8M    2 ( 5%)  💩+ 37.3% ± 1.6%
  cache_misses       1.35M ± 291K      934K … 2.32M    1 ( 3%)  ⚡- 79.7% ± 2.0%
  branch_misses       126M ± 3.85M     117M … 132M     0 ( 0%)  💩+ 12.2% ± 1.2%
Benchmark 3 (38 runs): ./document_scope_0b130f8f961e7995e2582511c4cfae15f945fad7 # initialize ChildDeclarations based on small_size
  measurement         mean ± σ        min … max       outliers  delta
  wall_time          3.23s ± 142ms    3.16s … 4.06s    3 ( 8%)  ⚡- 12.5% ± 1.4%
  peak_rss          8.19MB ± 60.8KB  8.08MB … 8.32MB   0 ( 0%)  ⚡- 26.7% ± 0.2%
  cpu_cycles         12.5G ± 553M     12.2G … 15.7G    3 ( 8%)  ⚡- 2.7% ± 1.6%
  instructions       23.6G ± 1.90K    23.6G … 23.6G    0 ( 0%)  ⚡- 3.1% ± 0.0%
  cache_references   63.8M ± 2.55M    61.1M … 73.5M    4 (11%)  💩+ 37.1% ± 2.0%
  cache_misses       1.16M ± 391K      652K … 2.34M    4 (11%)  ⚡- 82.5% ± 2.4%
  branch_misses       120M ± 3.82M     115M … 134M     1 ( 3%)  💩+ 7.4% ± 1.2%
Benchmark 4 (38 runs): ./document_scope_4ac62836fc4ba750fd482657d72bcacbdb3c7041 # add a small size optimization to child scopes and pack scope fields
  measurement         mean ± σ        min … max       outliers  delta
  wall_time          3.21s ± 38.1ms   3.15s … 3.34s    1 ( 3%)  ⚡- 12.9% ± 0.5%
  peak_rss          8.19MB ± 45.9KB  8.08MB … 8.25MB   0 ( 0%)  ⚡- 26.7% ± 0.2%
  cpu_cycles         12.4G ± 148M     12.2G … 13.0G    1 ( 3%)  ⚡- 3.2% ± 0.6%
  instructions       23.4G ± 2.05K    23.4G … 23.4G    0 ( 0%)  ⚡- 3.6% ± 0.0%
  cache_references   63.7M ± 1.34M    61.1M … 66.6M    0 ( 0%)  💩+ 36.8% ± 1.2%
  cache_misses       1.43M ± 379K      789K … 2.07M    0 ( 0%)  ⚡- 78.4% ± 2.4%
  branch_misses       123M ± 3.43M     118M … 133M     1 ( 3%)  💩+ 9.7% ± 1.1%
Benchmark 5 (40 runs): ./document_scope_c0c9ca0fde913ed244969f0aad2b72c07b939ac8 # move switch cases of makeScopeAt into separate functions
  measurement         mean ± σ        min … max       outliers  delta
  wall_time          3.03s ± 79.4ms   2.96s … 3.48s    1 ( 3%)  ⚡- 18.0% ± 0.8%
  peak_rss          8.20MB ± 56.6KB  8.08MB … 8.32MB   0 ( 0%)  ⚡- 26.6% ± 0.2%
  cpu_cycles         11.7G ± 313M     11.5G … 13.5G    1 ( 3%)  ⚡- 9.0% ± 0.9%
  instructions       21.8G ± 1.45K    21.8G … 21.8G    6 (15%)  ⚡- 10.5% ± 0.0%
  cache_references   47.0M ± 461K     45.9M … 48.2M    1 ( 3%)  + 1.0% ± 0.7%
  cache_misses       1.47M ± 424K      930K … 3.12M    2 ( 5%)  ⚡- 77.9% ± 2.6%
  branch_misses       119M ± 3.15M     114M … 125M     0 ( 0%)  💩+ 5.9% ± 1.0%
Benchmark 6 (46 runs): ./document_scope_c769e596319341ff285fd4f82b0be3fa50e6eb54 # remove the start_token parameter from makeScopeAt
  measurement         mean ± σ        min … max       outliers  delta
  wall_time          2.63s ± 28.1ms   2.60s … 2.72s    4 ( 9%)  ⚡- 28.8% ± 0.4%
  peak_rss          8.20MB ± 59.8KB  8.08MB … 8.31MB   0 ( 0%)  ⚡- 26.6% ± 0.2%
  cpu_cycles         10.2G ± 108M     10.0G … 10.5G    3 ( 7%)  ⚡- 21.0% ± 0.5%
  instructions       19.1G ± 2.05K    19.1G … 19.1G    0 ( 0%)  ⚡- 21.6% ± 0.0%
  cache_references   46.2M ± 415K     45.4M … 47.1M    0 ( 0%)  - 0.7% ± 0.7%
  cache_misses       1.58M ± 602K      805K … 3.42M    1 ( 2%)  ⚡- 76.1% ± 3.4%
  branch_misses       108M ± 2.43M     104M … 114M     0 ( 0%)  ⚡- 3.4% ± 0.8%
```

The script I used reads Sema.zig, parses it, then constructs and deinits the document scope 100 times. I should do more targeted benchmarking that measures only the document-scope work, but this should serve as a starting point.
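The commit titles in the benchmark above mention a small size optimization for child scopes (and `ChildDeclarations` initialized based on `small_size`). Here is a minimal sketch of one common way to do that, assuming scopes live in one flat array and refer to each other by `u32` index, and assuming Zig 0.11 or newer. `ScopeIndex`, `ChildScopes`, `makeChildScopes`, `childSlice`, the `extra` array and the threshold of 2 are all hypothetical; this is not the PR's actual implementation, and the "pack scope fields" part is not shown.

```zig
const std = @import("std");

// Hypothetical: scopes live in one flat array and refer to each other by index.
const ScopeIndex = u32;

// Hypothetical threshold: lists with at most this many children are stored inline.
const small_size = 2;

const ChildScopes = union(enum) {
    // Few children: stored directly inside the scope, no separate allocation.
    small: struct { len: u8, items: [small_size]ScopeIndex },
    // Many children: a [start, end) range into a shared `extra` array.
    other: struct { start: u32, end: u32 },
};

fn makeChildScopes(
    gpa: std.mem.Allocator,
    extra: *std.ArrayListUnmanaged(ScopeIndex),
    children: []const ScopeIndex,
) !ChildScopes {
    if (children.len <= small_size) {
        var items: [small_size]ScopeIndex = undefined;
        @memcpy(items[0..children.len], children);
        const len: u8 = @intCast(children.len);
        return ChildScopes{ .small = .{ .len = len, .items = items } };
    }
    // Spill into the shared array and remember only the range.
    const start: u32 = @intCast(extra.items.len);
    try extra.appendSlice(gpa, children);
    const end: u32 = @intCast(extra.items.len);
    return ChildScopes{ .other = .{ .start = start, .end = end } };
}

// Returns the children as a slice. The `small` case is captured by pointer so the
// slice points into `cs` itself rather than into a temporary copy.
fn childSlice(cs: *const ChildScopes, extra: []const ScopeIndex) []const ScopeIndex {
    return switch (cs.*) {
        .small => |*s| s.items[0..s.len],
        .other => |o| extra[o.start..o.end],
    };
}

test "two children stay inline, three spill into extra" {
    const gpa = std.testing.allocator;
    var extra: std.ArrayListUnmanaged(ScopeIndex) = .{};
    defer extra.deinit(gpa);

    const few = try makeChildScopes(gpa, &extra, &[_]ScopeIndex{ 7, 8 });
    const many = try makeChildScopes(gpa, &extra, &[_]ScopeIndex{ 1, 2, 3 });
    try std.testing.expect(few == .small);
    try std.testing.expect(many == .other);
    try std.testing.expectEqualSlices(ScopeIndex, &[_]ScopeIndex{ 1, 2, 3 }, childSlice(&many, extra.items));
}
```

The inline variant avoids a separate allocation and an extra indirection for scopes with only a handful of children, which is presumably the common case; the threshold of 2 here is arbitrary.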
Thanks for the bug fixes! Wow, that perf is pretty damn incredible 🌟
I've also removed the return type and the `keep_block_open` parameter. This diff is much easier to digest when hiding whitespace changes.
Here is the diff when reverting the changes that 1635a21 made to
Now I just have to find a more optimized solution. 🤔
New improvements:
I ran the following hyperfine benchmark on a different Linux distro where I also had no desktop environment running, which may explain why it's faster.
I'm done optimizing for now. That
Great stuff @Techatrix! Could you approve and merge this? It definitely seems ready :)
yolo
@Techatrix and I talked about this one night and now it's becoming a reality :P
Still doesn't compile, but once it does I'll add tests for stuff I've spotted, then benchmark and compare :)

it works woohoo!!!