Data-oriented design DocumentScope #1517

Merged: 24 commits merged into master from doc-scope-dod on Oct 22, 2023

SuperAuguste
Member

@SuperAuguste SuperAuguste commented Oct 14, 2023

@Techatrix and I talked about this one night and now it's becoming a reality :P

Still doesn't compile, but once it does I'll add tests for stuff I've spotted, then benchmark and compare :)
it works woohoo!!!

@SuperAuguste
Member Author

SuperAuguste commented Oct 14, 2023

@Techatrix we discussed the Kind enum with field and variable for lookups not being desirable; interesting find:

const S = struct {
    const z = 10;
    z: u8,
};

is completely legal code, but the map would only contain the second decl unless the map is typed. I've removed the types locally but I think we might need to add them back later for correctness reasons (+ the lookup map would be broken afaik; super rare edge case but it's still valid Zig code). :)

(Wait that's why the PR was merged in the first place, oops)
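
To illustrate the collision described above, here is a minimal sketch of a name-only map next to a map keyed by kind and name. The names `Kind`, `Key`, `KeyContext` and `TypedMap` are hypothetical and are not taken from this PR; the sketch only assumes the standard library's `std.StringHashMap` and `std.HashMap`.

```zig
const std = @import("std");

// Hypothetical declaration kinds; the PR's actual enum may differ.
const Kind = enum(u8) { field, variable };

// Hypothetical lookup key: the same name can appear once per kind.
const Key = struct {
    kind: Kind,
    name: []const u8,
};

// Hash and compare keys by kind and by the bytes of the name.
const KeyContext = struct {
    pub fn hash(_: KeyContext, key: Key) u64 {
        var hasher = std.hash.Wyhash.init(0);
        hasher.update(&[_]u8{@intFromEnum(key.kind)});
        hasher.update(key.name);
        return hasher.final();
    }

    pub fn eql(_: KeyContext, a: Key, b: Key) bool {
        return a.kind == b.kind and std.mem.eql(u8, a.name, b.name);
    }
};

const TypedMap = std.HashMap(Key, u32, KeyContext, std.hash_map.default_max_load_percentage);

test "name-only keys collide, (kind, name) keys do not" {
    const allocator = std.testing.allocator;

    // Keyed by name only: the second put replaces the first.
    var by_name = std.StringHashMap(u32).init(allocator);
    defer by_name.deinit();
    try by_name.put("z", 0); // stands for `const z = 10;`
    try by_name.put("z", 1); // stands for `z: u8,`
    try std.testing.expectEqual(@as(u32, 1), by_name.count());

    // Keyed by (kind, name): both entries survive.
    var typed = TypedMap.init(allocator);
    defer typed.deinit();
    try typed.put(.{ .kind = .variable, .name = "z" }, 0);
    try typed.put(.{ .kind = .field, .name = "z" }, 1);
    try std.testing.expectEqual(@as(u32, 2), typed.count());
}
```

Running this with `zig test` exercises both maps; the values stored (`0` and `1`) are placeholder declaration indices.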

@SuperAuguste SuperAuguste marked this pull request as ready for review October 15, 2023 06:14
@Techatrix
Member

Techatrix commented Oct 15, 2023

here are some performance metrics:

Benchmark 1 (33 runs): ./document_scope_master
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          3.69s  ± 43.0ms    3.64s  … 3.87s           1 ( 3%)        0%
  peak_rss           11.2MB ± 52.6KB    11.1MB … 11.3MB          0 ( 0%)        0%
  cpu_cycles         12.9G  ±  160M     12.7G  … 13.6G           3 ( 9%)        0%
  instructions       24.3G  ±  485      24.3G  … 24.3G           0 ( 0%)        0%
  cache_references   46.5M  ±  984K     45.2M  … 49.3M           1 ( 3%)        0%
  cache_misses       6.63M  ±  263K     6.23M  … 7.29M           0 ( 0%)        0%
  branch_misses       112M  ± 1.15M      110M  …  116M           1 ( 3%)        0%
Benchmark 2 (37 runs): ./document_scope_44ba04327e72acffcd0d30d17e5e47fbe9370fac # all tests pass!!!!
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          3.25s  ± 30.4ms    3.17s  … 3.30s           1 ( 3%)        ⚡- 11.9% ±  0.5%
  peak_rss           8.19MB ± 54.1KB    8.08MB … 8.31MB          0 ( 0%)        ⚡- 26.7% ±  0.2%
  cpu_cycles         12.6G  ±  126M     12.3G  … 12.8G           0 ( 0%)        ⚡-  2.2% ±  0.5%
  instructions       23.6G  ± 1.89K     23.6G  … 23.6G           0 ( 0%)        ⚡-  3.0% ±  0.0%
  cache_references   63.9M  ± 1.90M     61.2M  … 70.8M           2 ( 5%)        💩+ 37.3% ±  1.6%
  cache_misses       1.35M  ±  291K      934K  … 2.32M           1 ( 3%)        ⚡- 79.7% ±  2.0%
  branch_misses       126M  ± 3.85M      117M  …  132M           0 ( 0%)        💩+ 12.2% ±  1.2%
Benchmark 3 (38 runs): ./document_scope_0b130f8f961e7995e2582511c4cfae15f945fad7 # initialize ChildDeclarations based on small_size 
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          3.23s  ±  142ms    3.16s  … 4.06s           3 ( 8%)        ⚡- 12.5% ±  1.4%
  peak_rss           8.19MB ± 60.8KB    8.08MB … 8.32MB          0 ( 0%)        ⚡- 26.7% ±  0.2%
  cpu_cycles         12.5G  ±  553M     12.2G  … 15.7G           3 ( 8%)        ⚡-  2.7% ±  1.6%
  instructions       23.6G  ± 1.90K     23.6G  … 23.6G           0 ( 0%)        ⚡-  3.1% ±  0.0%
  cache_references   63.8M  ± 2.55M     61.1M  … 73.5M           4 (11%)        💩+ 37.1% ±  2.0%
  cache_misses       1.16M  ±  391K      652K  … 2.34M           4 (11%)        ⚡- 82.5% ±  2.4%
  branch_misses       120M  ± 3.82M      115M  …  134M           1 ( 3%)        💩+  7.4% ±  1.2%
Benchmark 4 (38 runs): ./document_scope_4ac62836fc4ba750fd482657d72bcacbdb3c7041 # add a small size optimization to child scopes and pack scope fields 
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          3.21s  ± 38.1ms    3.15s  … 3.34s           1 ( 3%)        ⚡- 12.9% ±  0.5%
  peak_rss           8.19MB ± 45.9KB    8.08MB … 8.25MB          0 ( 0%)        ⚡- 26.7% ±  0.2%
  cpu_cycles         12.4G  ±  148M     12.2G  … 13.0G           1 ( 3%)        ⚡-  3.2% ±  0.6%
  instructions       23.4G  ± 2.05K     23.4G  … 23.4G           0 ( 0%)        ⚡-  3.6% ±  0.0%
  cache_references   63.7M  ± 1.34M     61.1M  … 66.6M           0 ( 0%)        💩+ 36.8% ±  1.2%
  cache_misses       1.43M  ±  379K      789K  … 2.07M           0 ( 0%)        ⚡- 78.4% ±  2.4%
  branch_misses       123M  ± 3.43M      118M  …  133M           1 ( 3%)        💩+  9.7% ±  1.1%
Benchmark 5 (40 runs): ./document_scope_c0c9ca0fde913ed244969f0aad2b72c07b939ac8 # move switch cases of makeScopeAt into separate functions 
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          3.03s  ± 79.4ms    2.96s  … 3.48s           1 ( 3%)        ⚡- 18.0% ±  0.8%
  peak_rss           8.20MB ± 56.6KB    8.08MB … 8.32MB          0 ( 0%)        ⚡- 26.6% ±  0.2%
  cpu_cycles         11.7G  ±  313M     11.5G  … 13.5G           1 ( 3%)        ⚡-  9.0% ±  0.9%
  instructions       21.8G  ± 1.45K     21.8G  … 21.8G           6 (15%)        ⚡- 10.5% ±  0.0%
  cache_references   47.0M  ±  461K     45.9M  … 48.2M           1 ( 3%)          +  1.0% ±  0.7%
  cache_misses       1.47M  ±  424K      930K  … 3.12M           2 ( 5%)        ⚡- 77.9% ±  2.6%
  branch_misses       119M  ± 3.15M      114M  …  125M           0 ( 0%)        💩+  5.9% ±  1.0%
Benchmark 6 (46 runs): ./document_scope_c769e596319341ff285fd4f82b0be3fa50e6eb54 # remove the start_token parameter from makeScopeAt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.63s  ± 28.1ms    2.60s  … 2.72s           4 ( 9%)        ⚡- 28.8% ±  0.4%
  peak_rss           8.20MB ± 59.8KB    8.08MB … 8.31MB          0 ( 0%)        ⚡- 26.6% ±  0.2%
  cpu_cycles         10.2G  ±  108M     10.0G  … 10.5G           3 ( 7%)        ⚡- 21.0% ±  0.5%
  instructions       19.1G  ± 2.05K     19.1G  … 19.1G           0 ( 0%)        ⚡- 21.6% ±  0.0%
  cache_references   46.2M  ±  415K     45.4M  … 47.1M           0 ( 0%)          -  0.7% ±  0.7%
  cache_misses       1.58M  ±  602K      805K  … 3.42M           1 ( 2%)        ⚡- 76.1% ±  3.4%
  branch_misses       108M  ± 2.43M      104M  …  114M           0 ( 0%)        ⚡-  3.4% ±  0.8%

The script I used reads Sema.zig, parses it, then constructs and deinits the document scope 100 times. I should do some more sophisticated benchmarking that measures only the document scope work, but this should serve as a starting point.
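
A rough sketch of the shape of such a script, under stated assumptions: the real script is not shown in this thread, so the embedded `source` stands in for Sema.zig and `buildAndDiscard` is a hypothetical placeholder for constructing and deinitializing the document scope. Only `std.zig.Ast.parse` and `Ast.deinit` are real standard library calls here. The binary does no timing itself; it is meant to be measured from the outside.

```zig
const std = @import("std");

// Stand-in input; the benchmark in this thread parsed Sema.zig instead.
const source: [:0]const u8 =
    \\const S = struct {
    \\    x: u8,
    \\    fn get(self: S) u8 {
    \\        return self.x;
    \\    }
    \\};
;

// Hypothetical placeholder for "construct and deinit the document scope".
// It only reports the node count so the loop has a result to accumulate.
fn buildAndDiscard(tree: *const std.zig.Ast) usize {
    return tree.nodes.len;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Parse once, outside the loop.
    var tree = try std.zig.Ast.parse(allocator, source, .zig);
    defer tree.deinit(allocator);

    // Repeat the work under test 100 times.
    var total: usize = 0;
    var i: usize = 0;
    while (i < 100) : (i += 1) {
        total += buildAndDiscard(&tree);
    }

    std.debug.print("visited {d} nodes in total\n", .{total});
}
```

Because parsing happens inside the same process, an external measurement of this binary includes parse time, which is the limitation the comment above points out.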

@SuperAuguste
Member Author

Thanks for the bug fixes! Wow, that perf is pretty damn incredible 🌟

I've also removed the return type and the `keep_block_open` parameter.

this diff is much easier to digest when hiding whitespace changes
@Techatrix
Member

Here is the diff when reverting the changes that 1635a21 made to ast.lastToken:

Benchmark 1 (46 runs): ./document_scope_c769e596319341ff285fd4f82b0be3fa50e6eb54
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.63s  ± 40.9ms    2.58s  … 2.79s           4 ( 9%)        0%
  peak_rss           8.17MB ± 59.3KB    8.08MB … 8.31MB          0 ( 0%)        0%
  cpu_cycles         10.1G  ±  156M     9.95G  … 10.8G           4 ( 9%)        0%
  instructions       19.1G  ± 1.95K     19.1G  … 19.1G           0 ( 0%)        0%
  cache_references   46.3M  ±  421K     45.4M  … 47.7M           1 ( 2%)        0%
  cache_misses       2.26M  ±  362K     1.66M  … 3.38M           1 ( 2%)        0%
  branch_misses       104M  ±  615K      103M  …  106M           0 ( 0%)        0%
Benchmark 2 (54 runs): ./document_scope_new
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.23s  ± 42.1ms    2.19s  … 2.41s           4 ( 7%)        ⚡- 15.1% ±  0.6%
  peak_rss           8.19MB ± 49.0KB    8.08MB … 8.32MB          0 ( 0%)          +  0.2% ±  0.3%
  cpu_cycles         8.57G  ±  160M     8.44G  … 9.26G           3 ( 6%)        ⚡- 15.1% ±  0.6%
  instructions       14.3G  ± 2.10K     14.3G  … 14.3G           0 ( 0%)        ⚡- 25.0% ±  0.0%
  cache_references   47.2M  ±  531K     46.3M  … 49.2M           2 ( 4%)        💩+  1.9% ±  0.4%
  cache_misses       2.16M  ±  330K     1.52M  … 2.94M           0 ( 0%)          -  4.3% ±  6.1%
  branch_misses      80.9M  ±  650K     79.7M  … 83.2M           1 ( 2%)        ⚡- 22.6% ±  0.2%

Now I just have to find a more optimized solution. 🤔

@Techatrix
Member

New improvements:

Benchmark 1 (33 runs): ./document_scope_master
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          3.72s  ±  160ms    3.65s  … 4.60s           1 ( 3%)        0%
  peak_rss           11.2MB ± 60.3KB    11.1MB … 11.3MB          0 ( 0%)        0%
  cpu_cycles         12.9G  ±  613M     12.7G  … 16.3G           1 ( 3%)        0%
  instructions       24.3G  ±  773      24.3G  … 24.3G           0 ( 0%)        0%
  cache_references   47.0M  ± 1.08M     45.2M  … 49.1M           0 ( 0%)        0%
  cache_misses       6.84M  ±  336K     6.33M  … 7.50M           0 ( 0%)        0%
  branch_misses       112M  ± 1.06M      110M  …  113M           0 ( 0%)        0%
Benchmark 2 (46 runs): ./document_scope_c769e596319341ff285fd4f82b0be3fa50e6eb54 # remove the start_token parameter from makeScopeAt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.62s  ± 33.6ms    2.57s  … 2.76s           3 ( 7%)        ⚡- 29.6% ±  1.3%
  peak_rss           8.19MB ± 55.9KB    8.08MB … 8.31MB          0 ( 0%)        ⚡- 26.7% ±  0.2%
  cpu_cycles         10.1G  ±  127M     9.97G  … 10.6G           3 ( 7%)        ⚡- 21.7% ±  1.4%
  instructions       19.1G  ± 2.25K     19.1G  … 19.1G           0 ( 0%)        ⚡- 21.6% ±  0.0%
  cache_references   46.2M  ±  430K     45.1M  … 47.0M           0 ( 0%)        ⚡-  1.8% ±  0.7%
  cache_misses       1.25M  ±  421K      661K  … 2.29M           0 ( 0%)        ⚡- 81.7% ±  2.6%
  branch_misses       108M  ± 2.27M      104M  …  112M           0 ( 0%)        ⚡-  3.4% ±  0.8%
Benchmark 3 (55 runs): ./document_scope_27562655530923b178f37664c05ab15a11c26645 # optimize `ast.lastToken` on blocks, containers and switch expressions
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.22s  ± 18.9ms    2.19s  … 2.27s           0 ( 0%)        ⚡- 40.4% ±  1.2%
  peak_rss           8.19MB ± 44.5KB    8.14MB … 8.32MB          0 ( 0%)        ⚡- 26.8% ±  0.2%
  cpu_cycles         8.57G  ± 72.6M     8.44G  … 8.75G           0 ( 0%)        ⚡- 33.6% ±  1.3%
  instructions       14.4G  ± 1.65K     14.4G  … 14.4G           0 ( 0%)        ⚡- 40.9% ±  0.0%
  cache_references   48.3M  ±  683K     47.1M  … 50.4M           2 ( 4%)        💩+  2.7% ±  0.8%
  cache_misses       1.24M  ±  359K      785K  … 2.34M           4 ( 7%)        ⚡- 81.9% ±  2.2%
  branch_misses      85.4M  ± 1.97M     81.8M  … 91.0M           0 ( 0%)        ⚡- 23.4% ±  0.7%
Benchmark 4 (58 runs): ./document_scope_4ff74787335ca68f4618efe4e4084ca43e136f85 # optimize `offsets.tokenToLoc` on identifiers
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.08s  ± 28.3ms    2.04s  … 2.21s           3 ( 5%)        ⚡- 44.0% ±  1.1%
  peak_rss           8.19MB ± 60.8KB    8.08MB … 8.32MB          0 ( 0%)        ⚡- 26.7% ±  0.2%
  cpu_cycles         8.04G  ±  110M     7.89G  … 8.53G           3 ( 5%)        ⚡- 37.7% ±  1.3%
  instructions       13.2G  ± 1.52K     13.2G  … 13.2G           0 ( 0%)        ⚡- 45.8% ±  0.0%
  cache_references   47.0M  ±  515K     46.1M  … 48.0M           0 ( 0%)          -  0.1% ±  0.7%
  cache_misses       1.39M  ±  361K      799K  … 2.23M           0 ( 0%)        ⚡- 79.7% ±  2.2%
  branch_misses      81.5M  ± 1.41M     77.8M  … 84.5M           0 ( 0%)        ⚡- 26.9% ±  0.5%

@Techatrix
Member

Benchmark 1 (33 runs): ./document_scope_master
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          3.67s  ± 23.5ms    3.64s  … 3.74s           2 ( 6%)        0%
  peak_rss           11.1MB ± 64.1KB    10.9MB … 11.2MB          0 ( 0%)        0%
  cpu_cycles         12.8G  ± 99.1M     12.6G  … 13.0G           1 ( 3%)        0%
  instructions       24.3G  ±  689      24.3G  … 24.3G           2 ( 6%)        0%
  cache_references   45.7M  ±  848K     44.5M  … 47.9M           1 ( 3%)        0%
  cache_misses       6.36M  ±  284K     5.89M  … 6.92M           0 ( 0%)        0%
  branch_misses       111M  ± 1.20M      110M  …  114M           0 ( 0%)        0%
Benchmark 2 (58 runs): ./document_scope_4ff74787335ca68f4618efe4e4084ca43e136f85
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.09s  ± 28.5ms    2.05s  … 2.17s           2 ( 3%)        ⚡- 43.2% ±  0.3%
  peak_rss           8.20MB ± 53.8KB    8.09MB … 8.32MB          0 ( 0%)        ⚡- 25.9% ±  0.2%
  cpu_cycles         8.07G  ±  110M     7.91G  … 8.38G           2 ( 3%)        ⚡- 36.9% ±  0.4%
  instructions       13.2G  ± 1.70K     13.2G  … 13.2G           0 ( 0%)        ⚡- 45.9% ±  0.0%
  cache_references   47.2M  ±  552K     46.4M  … 49.3M           2 ( 3%)        💩+  3.3% ±  0.6%
  cache_misses       1.53M  ±  800K      628K  … 3.37M           0 ( 0%)        ⚡- 75.9% ±  4.5%
  branch_misses      82.2M  ± 1.39M     78.8M  … 84.7M           0 ( 0%)        ⚡- 26.2% ±  0.5%
Benchmark 3 (66 runs): ./document_scope_c75bc2c29608f2beba44cab5b88a40ddfe00fa9c
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.82s  ± 13.4ms    1.79s  … 1.88s           3 ( 5%)        ⚡- 50.4% ±  0.2%
  peak_rss           8.19MB ± 57.8KB    8.08MB … 8.32MB          6 ( 9%)        ⚡- 26.0% ±  0.2%
  cpu_cycles         7.04G  ± 51.3M     6.92G  … 7.25G           3 ( 5%)        ⚡- 44.9% ±  0.2%
  instructions       9.86G  ± 2.01K     9.86G  … 9.86G           0 ( 0%)        ⚡- 59.5% ±  0.0%
  cache_references   45.4M  ±  327K     44.7M  … 46.3M           2 ( 3%)          -  0.7% ±  0.5%
  cache_misses       1.22M  ±  462K      651K  … 2.53M           8 (12%)        ⚡- 80.9% ±  2.7%
  branch_misses      79.4M  ± 1.27M     75.9M  … 81.0M           3 ( 5%)        ⚡- 28.7% ±  0.5%

I ran the following hyperfine benchmark on a different Linux distro where I also had no desktop environment running, which may explain why it's faster.

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./document_scope_master` | 2.962 ± 0.055 | 2.922 | 3.461 | 1.69 ± 0.04 |
| `./document_scope_4ff74787335ca68f4618efe4e4084ca43e136f85` | 1.998 ± 0.028 | 1.937 | 2.098 | 1.14 ± 0.02 |
| `./document_scope_c75bc2c29608f2beba44cab5b88a40ddfe00fa9c` | 1.750 ± 0.024 | 1.716 | 1.869 | 1.00 |
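
As a quick check on the Relative column: hyperfine reports each command's mean divided by the fastest mean, so the values can be reproduced from the Mean column (rounded to two decimals):

$$
\frac{2.962}{1.750} \approx 1.69, \qquad \frac{1.998}{1.750} \approx 1.14
$$

In other words, master takes roughly 1.7 times as long as c75bc2c in this run.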

@Techatrix
Member

I'm done optimizing for now. That ⚡- 50.4% is luking prity gud.
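
For reference, that figure is consistent with the relative change in mean wall time between master (3.67s) and c75bc2c (1.82s) in the last benchmark run above, computed here from the rounded means:

$$
\frac{1.82 - 3.67}{3.67} \approx -0.504, \qquad \frac{3.67}{1.82} \approx 2.02
$$

So the document scope benchmark runs in about half the time, or roughly twice as fast.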

@SuperAuguste
Member Author

Great stuff @Techatrix! Could you approve and merge this? It definitely seems ready :)

@SuperAuguste
Member Author

yolo

@SuperAuguste SuperAuguste merged commit 645f728 into master Oct 22, 2023
@SuperAuguste SuperAuguste deleted the doc-scope-dod branch October 22, 2023 23:56