Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add direct indices to datastore #2221

Merged
merged 10 commits into from
Feb 7, 2025
Merged

Conversation

Centril
Copy link
Contributor

@Centril Centril commented Feb 6, 2025

Description of Changes

The PR is best reviewed commit by commit, as they build upon the previous one and do one thing.

  1. Adds a note re. renaming BTreeIndex -- later fixed in 6-7.

  2. Adds direct indices to the datastore.

  3. Adds micro-benchmarks for some known index implementations, including our unique btrees and direct indices.

  4. Renames BTreeIndex::seek to BTreeIndex::seek_range and consequence changes.

  5. Adds BTreeIndex::seek_point, which makes point scans about 12% faster in micro-benchmarks. This is then employed in the various places where we currently just use &AlgebraicValue as a bound. More can be done here however in follow ups.

6-7. Rename BTreeIndex to TableIndex to reflect that it isn't limited to btrees anymore.

  1. Fixes a small typo bug in bindings-macro, necessary to actually use #[index(direct)].

With #[index(direct)] applied to tables Position and Velocity in the update benchmark, we get an average 357ms, which is in line with what we found when we hacked in the index implementation.

2025-02-07T00:06:41.784526Z  INFO: : Timing span "update_positions_by_collect": 338.077526ms
2025-02-07T00:06:42.482635Z  INFO: : Timing span "update_positions_by_collect": 360.918827ms
2025-02-07T00:06:43.102741Z  INFO: : Timing span "update_positions_by_collect": 362.031175ms
2025-02-07T00:06:43.698571Z  INFO: : Timing span "update_positions_by_collect": 360.897486ms
2025-02-07T00:06:44.342312Z  INFO: : Timing span "update_positions_by_collect": 361.958901ms

This also improved subscription benchmarks as compared to current master (footprint-semijoin: -22%):

footprint-semijoin      time:   [112.12 µs 112.91 µs 114.08 µs]
                        change: [-22.769% -22.229% -21.549%] (p = 0.00 < 0.05)
                        Performance has improved.

API and ABI breaking changes

None.

Expected complexity level and risk

3, the PR is large, but broken into individual commits that aren't that complicated.
There's also some unsafe in the index implementation.

Testing

Existing tests are amended and some new unit tests are added in the commit that adds direct indices.
The benchmark has also been run, exercising #[index(direct)] in a module.

This implementation uses `Option<Box<[RowPointer; KEYS_PER_INNER]>>` to represent inner indices.
This reduces the space used by the outer index from 24 to 8 bytes,
making randomized inserts -73% faster, but still slow.
Monotonic inserts also get -20% faster.
@Centril Centril force-pushed the centril/direct-index-datastore branch from ce9f6a6 to b95f936 Compare February 6, 2025 23:55
@Centril Centril marked this pull request as ready for review February 7, 2025 00:12
@Centril Centril requested a review from gefjon February 7, 2025 00:13
Copy link
Contributor

@gefjon gefjon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have completely ignored your benchmarks during this review; I honestly don't care about them. Also, given our current time constraints, I haven't given this as careful of a review as I would like. But it all looks reasonable, and the concept here is so dirt-simple that it seems hard to do wrong, honestly.

@Centril Centril enabled auto-merge February 7, 2025 00:47
@Centril Centril added this pull request to the merge queue Feb 7, 2025
Merged via the queue into master with commit b50b684 Feb 7, 2025
13 checks passed
@Centril Centril deleted the centril/direct-index-datastore branch February 7, 2025 02:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants