Match datastore semantics more closely: `btree_scan` => `index_scan_range` #2203

Centril · 2025-02-04T00:05:51Z

Description of Changes

In preparation for adding an index algo/kind that we'll call direct (backed by Vec<Vec<RowPointer>>, ...),
this PR moves all the module bindings APIs (Rust + C#) to use datastore_[delete_by_]index_scan_range_bsatn
and deprecates datastore_[delete_by_]btree_scan_bsatn.
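
As a rough illustration of the deprecation pattern described above, the old name can be kept as a thin forwarding wrapper around the new one. This is only a sketch: `index_scan_range` and `btree_scan` here are simplified stand-ins for the `_bsatn` ABI functions, and the slice-plus-range signature is an assumption made for the example, not the real ABI.

```rust
use std::ops::Range;

/// Stand-in for the new, general entry point (hypothetical signature:
/// the real ABI works on BSATN-encoded bounds, not on a key slice).
pub fn index_scan_range(sorted_keys: &[u32], range: Range<u32>) -> Vec<u32> {
    sorted_keys
        .iter()
        .copied()
        .filter(|k| range.contains(k))
        .collect()
}

/// Stand-in for the old name: kept only so existing callers still work,
/// and forwarding to the new function without any extra logic.
#[deprecated(note = "use `index_scan_range` instead")]
pub fn btree_scan(sorted_keys: &[u32], range: Range<u32>) -> Vec<u32> {
    index_scan_range(sorted_keys, range)
}
```

Callers of the old name keep compiling, but get a deprecation warning pointing them at the new one.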

This more closely reflects what the datastore actually does today.
Nowhere (mostly because there's only a single index algo) does the datastore actually check that a btree index is used.
What the ABI actually is about is fast range queries.
The new direct index can also support that, so the more general name makes sense,
while it is also beneficial for efficiency, as we can avoid a check.
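
To make the claim about the direct index concrete, here is a minimal sketch of how a `Vec<Vec<RowPointer>>`-backed index could serve range scans. It assumes keys are small unsigned integers used directly as positions in the outer vector; that assumption, the `DirectIndex` type, and the `u64` row handle are all hypothetical and not taken from the PR.

```rust
use std::ops::Range;

/// Hypothetical row handle; the real datastore has its own `RowPointer` type.
type RowPointer = u64;

/// Hypothetical direct index: key `k` lives at position `k` of the outer vector.
struct DirectIndex {
    slots: Vec<Vec<RowPointer>>,
}

impl DirectIndex {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }

    /// Grow the outer vector as needed, then record the row under its key.
    fn insert(&mut self, key: usize, row: RowPointer) {
        if key >= self.slots.len() {
            self.slots.resize_with(key + 1, Vec::new);
        }
        self.slots[key].push(row);
    }

    /// Slots are laid out in key order, so a key range is just a sub-slice.
    /// Bounds are clamped so out-of-range requests yield nothing.
    fn scan_range(&self, keys: Range<usize>) -> impl Iterator<Item = RowPointer> + '_ {
        let end = keys.end.min(self.slots.len());
        let start = keys.start.min(end);
        self.slots[start..end].iter().flatten().copied()
    }
}
```

Because the layout is already ordered by key, the range scan needs no tree traversal, which is why a range-oriented name fits both index kinds.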

After this change, there are no users of the deprecated ABIs, so we could elect to remove them just before 1.0,
but this PR leaves them be.

Some datastore types and methods are also renamed to reflect the new reality.

API and ABI breaking changes

None, for now.

Expected complexity level and risk

2, mostly simple renaming and code motion.

Testing

Existing tests should cover this.

joshua-spacetime

What the ABI actually is about is fast range queries.
The new direct index can also support that,

Will there be logic for choosing which index to use within the current syscall?

crates/core/src/host/instance_env.rs

Centril · 2025-02-04T00:25:14Z

Will there be logic for choosing which index to use within the current syscall?

The existing syscall datastore_btree_scan_bsatn will become a deprecated name for datastore_index_scan_range_bsatn,
which won't have any logic beyond generically dispatching index_id to the known index with that id, as is currently done for the different variations of TypedIndexes that we have.
The MutTxId::index_scan_range implementation should not change at all in upcoming PRs.
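
A rough sketch of what "generically dispatching index_id to the known index" might look like is below. The `TypedIndex` enum here is a heavily simplified, hypothetical version with two key types, and `KeyRange`, `ScanError`, and the `HashMap` registry are invented for the example; the real implementation decodes BSATN bounds instead.

```rust
use std::collections::{BTreeMap, HashMap};
use std::ops::Range;

/// Hypothetical row handle.
type RowPointer = u64;

/// Hypothetical, simplified typed index: one variant per key type.
enum TypedIndex {
    U32(BTreeMap<u32, Vec<RowPointer>>),
    U64(BTreeMap<u64, Vec<RowPointer>>),
}

/// Key bounds typed to match the index variants (assumption for this sketch).
enum KeyRange {
    U32(Range<u32>),
    U64(Range<u64>),
}

#[derive(Debug, PartialEq)]
enum ScanError {
    NoSuchIndex,
    KeyTypeMismatch,
}

/// Look the index up by id, then dispatch on its variant.
/// There is no index selection logic here: the caller picked the index.
fn index_scan_range(
    indexes: &HashMap<u32, TypedIndex>,
    index_id: u32,
    range: KeyRange,
) -> Result<Vec<RowPointer>, ScanError> {
    let index = indexes.get(&index_id).ok_or(ScanError::NoSuchIndex)?;
    match (index, range) {
        (TypedIndex::U32(map), KeyRange::U32(r)) => {
            Ok(map.range(r).flat_map(|(_, rows)| rows.iter().copied()).collect())
        }
        (TypedIndex::U64(map), KeyRange::U64(r)) => {
            Ok(map.range(r).flat_map(|(_, rows)| rows.iter().copied()).collect())
        }
        _ => Err(ScanError::KeyTypeMismatch),
    }
}
```

The only decisions made inside the function are the lookup and the per-variant dispatch, which matches the point that the choice of index stays with the caller.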

cloutiertyler · 2025-02-04T02:03:10Z

What the ABI actually is about is fast range queries.

I would dispute this point, the ABI is actually about using an index to perform a range query. The host must validate that the particular type of index specified by index_id can support the index operation being made.

I think this is an important point. Not all indexes will be able to support all index operations.

Also NIT, let's not call them queries, they're datastore operations.

while it is also beneficial for efficiency, as we can avoid a check.

I don't think you can because if we add a hash index we need to validate that you did not pass in the index_id of a hash index to this function.

cloutiertyler · 2025-02-04T02:10:12Z

TBH I'm wavering about this renaming scheme since those points were muddled in the short period between our conversation and merging this PR. I don't think we should revert it necessarily, but I think it is very very important that we're clear that these are physical datastore operations, and the query engine should always be able to choose the specific physical operation it wants to do.

joshua-spacetime · 2025-02-04T02:27:43Z

I think it is very very important that we're clear that these are physical datastore operations, and the query engine should always be able to choose the specific physical operation it wants to do.

This was my main concern, and I approved based on the confirmation from @Centril that these are indeed remaining purely physical operations.

Centril · 2025-02-04T09:53:48Z

I would dispute this point, the ABI is actually about using an index to perform a range query. The host must validate that the particular type of index specified by index_id can support the index operation being made.

I'm not sure what the distinction is; I think we're saying the same thing, that this ABI affords range scans via an index, specifically for those indices where we think it is reasonable to expose range scan functionality. For example, you can do a range scan via a hash index, but it's a bad idea to expose such a facility, so we won't.

I think this is an important point. Not all indexes will be able to support all index operations.

We're aligned. :) A datastore_index_scan_point_bsatn, which I think we should add for efficiency, would be supported by hash, btree, direct, whereas _range_ would only be supported by `btree, direct`.
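
The support matrix described here could be written down as a small capability check. The `IndexKind` and `IndexOp` enums and the `supports` function are hypothetical names for this sketch; neither a hash index nor a point-scan ABI exists as of this PR.

```rust
/// Hypothetical index kinds from the discussion.
#[derive(Clone, Copy, Debug, PartialEq)]
enum IndexKind {
    Hash,
    BTree,
    Direct,
}

/// Hypothetical physical operations exposed through the ABI.
#[derive(Clone, Copy, Debug, PartialEq)]
enum IndexOp {
    ScanPoint,
    ScanRange,
}

/// Which physical operation each index kind would expose.
fn supports(kind: IndexKind, op: IndexOp) -> bool {
    match (kind, op) {
        // Every kind can look up a single key.
        (_, IndexOp::ScanPoint) => true,
        // Only ordered layouts expose range scans.
        (IndexKind::BTree | IndexKind::Direct, IndexOp::ScanRange) => true,
        (IndexKind::Hash, IndexOp::ScanRange) => false,
    }
}
```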

I don't think you can because if we add a hash index we need to validate that you did not pass in the index_id of a hash index to this function.

What I meant was that we're avoiding a check until such time as we add a hash index, which might not be necessary, based on the surprising perf outcomes.

[...], but I think it is very very important that we're clear that these are physical datastore operations, and the query engine should always be able to choose the specific physical operation it wants to do.

Agreed. The query engine will retain full control over what index/operation it wants to use with this PR merged.

gefjon · 2025-02-04T14:30:30Z

Two cents here: the host function is going to have to dispatch on the index type no matter what. If we have separate btree_scan vs dense_seq_vector_scan methods, then each of them checks equality on the index type, and returns an error otherwise. If we have a unified index_scan_range, then it does a switch/case on the index type, and the cases for unordered index types return errors. There's no version of this, given our system table schemas, where you can have a host function btree_scan which immediately does a low-level physical operation without first checking the index type.
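
The two designs compared above can be put side by side in a small sketch. Everything here is hypothetical: `kind` stands for whatever the host found after looking up the index by id, and the string results stand in for the actual scan. The point is only that both shapes inspect the index kind before doing any work.

```rust
/// Hypothetical index kinds; `Hash` stands in for any unordered index.
#[derive(Clone, Copy, Debug, PartialEq)]
enum IndexKind {
    BTree,
    Direct,
    Hash,
}

#[derive(Debug, PartialEq)]
struct WrongIndexKind(IndexKind);

/// Design A: one host function per kind. It still has to compare the
/// looked-up kind against the one it expects and error out otherwise.
fn btree_scan(kind: IndexKind) -> Result<&'static str, WrongIndexKind> {
    if kind == IndexKind::BTree {
        Ok("btree range scan")
    } else {
        Err(WrongIndexKind(kind))
    }
}

/// Design B: one unified host function. It matches on the kind and
/// returns an error for kinds that cannot serve range scans.
fn index_scan_range(kind: IndexKind) -> Result<&'static str, WrongIndexKind> {
    match kind {
        IndexKind::BTree => Ok("btree range scan"),
        IndexKind::Direct => Ok("direct range scan"),
        IndexKind::Hash => Err(WrongIndexKind(kind)),
    }
}
```

Either way there is one branch on the index kind per call, so the unified name does not add a check that the per-kind functions would have avoided.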

match datastore semantics more closely: btree_scan => `index_scan_range` (468f4dc)

Centril requested review from gefjon and coolreader18 February 4, 2025 00:07

joshua-spacetime reviewed Feb 4, 2025

joshua-spacetime approved these changes Feb 4, 2025

Centril added this pull request to the merge queue Feb 4, 2025

Merged via the queue into master with commit 2fc0361 Feb 4, 2025
14 checks passed

Centril deleted the centril/datastore_index_scan_range branch February 4, 2025 09:53
