-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise C# in a ridiculous way #1924
Conversation
Clickbait title, but there is no other way to say this. It must be the weirdest way I optimised anything yet. I was suspicious that a lot of benchmarks, in particular those that deal with iteration - which we have for C# as of recently - take about the same amount of time. After digging in for a bit, I narrowed it down to merely presence of this destructor. Looks like the way destructors are implemented for the Wasm target somehow deoptimises the execution to the point where simply removing it speeds up benchmarks by 15x or more. We don't really need it anyway, it was more of a nice-to-have in case user does something weird, since regular iterator usage like `foreach` always calls `Dispose()` correctly. Further investigation and an upstream issue to .NET coming up, but meanwhile here are benchmark differences before/after: | benchmark | improvement | csharp_time | csharp_rate | csharp_no_destructor_time | csharp_no_destructor_rate | | ----------------------------------------------------------------------------- | ----------- | ----------- | ------------ | ------------------------- | ------------------------- | | special/db_game/circles/load=10 | 14.58 | 3.3±0.24ms | ? ?/sec | 224.1±21.57µs | ? ?/sec | | special/db_game/circles/load=100 | 14.93 | 3.3±0.20ms | ? ?/sec | 223.4±16.61µs | ? ?/sec | | stdb_module/mem/filter/string/index/load=2048/count=256 | 15.79 | 3.1±0.11ms | 321 Elem/sec | 197.0±2.97µs | 5.0 KElem/sec | | stdb_module/mem/filter/u64/index/load=2048/count=256 | 19.15 | 3.2±0.11ms | 315 Elem/sec | 165.3±2.86µs | 5.9 KElem/sec | | stdb_module/mem/insert_bulk/u32_u64_u64/btree_each_column/load=2048/count=256 | 2.00 | 3.2±0.11ms | 314 Elem/sec | 1589.3±45.07µs | 629 Elem/sec | | stdb_module/mem/insert_bulk/u32_u64_u64/unique_0/load=2048/count=256 | 16.71 | 3.2±0.10ms | 312 Elem/sec | 191.7±2.98µs | 5.1 KElem/sec | | stdb_module/mem/iterate/u32_u64_str/unique_0/count=256 | 31.72 | 3.1±0.11ms | 325 Elem/sec | 97.0±2.47µs | 10.1 KElem/sec | | stdb_module/mem/iterate/u32_u64_u64/unique_0/count=256 | 34.63 | 3.4±0.11ms | 296 Elem/sec | 97.5±2.40µs | 10.0 KElem/sec | | stdb_module/mem/update_bulk/u32_u64_str/unique_0/load=2048/count=256 | 14.31 | 3.1±0.12ms | 318 Elem/sec | 219.2±7.24µs | 4.5 KElem/sec | | stdb_module/mem/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 | 15.02 | 3.2±0.06ms | 315 Elem/sec | 210.9±2.46µs | 4.6 KElem/sec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you even find this? And to be clear, is it removing the finalizer, or adding the sealed
which has this effect?
It started from digging into why iterators specifically are so slow, making some optimisations, being weirded out by not noticing any improvement, and eventually ended with a "scientific guess" that if it all takes roughly constant amount of time, GC must be involved somehow. And voila, indeed it was.
It's removing the finalizer.
|
Description of Changes
Clickbait title, but there is no other way to say this. It must be the weirdest way I optimised anything yet.
I was suspicious that a lot of benchmarks, in particular those that deal with iteration - which we have for C# as of recently - take about the same amount of time. After digging in for a bit, I narrowed it down to merely presence of this destructor. Looks like the way destructors are implemented for the Wasm target somehow deoptimises the execution to the point where simply removing it speeds up benchmarks by 15x or more.
We don't really need it anyway, it was more of a nice-to-have in case user does something weird, since regular iterator usage like
foreach
always callsDispose()
correctly.Further investigation and an upstream issue to .NET coming up, but meanwhile here are benchmark differences before/after:
API and ABI breaking changes
If this is an API or ABI breaking change, please apply the
corresponding GitHub label.
Expected complexity level and risk
How complicated do you think these changes are? Grade on a scale from 1 to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex change.
This complexity rating applies not only to the complexity apparent in the diff,
but also to its interactions with existing and future code.
If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning ways.
Testing
Describe any testing you've done, and any testing you'd like your reviewers to do,
so that you're confident that all the changes work as expected!