Don't create indexes during bootstrapping; wait until after replay #2161

gefjon · 2025-01-22T21:17:12Z

Description of Changes

This PR fixes a bug where replaying databases without a snapshot of TX 0, i.e. using the entire commitlog,
would fail when replaying deletes on st_client.

This is because st_client was in the unique (*) position of:

Being a system table,
with a unique index,
which is mutated after bootstrapping.

(*) Other system tables meet all three criteria in edge cases with schema-changing publishes, but only st_client meets them in everyday testing.

In this case, when replaying the entire commitlog, we create the indexes on the system tables during bootstrapping before replay. (I theorize, but have not checked,
that when restoring from a snapshot we do not create these indexes so eagerly, instead waiting for after replaying the commitlog suffix to create them.)

However, we cannot update or use indexes during bootstrapping, at least currently, because doing so causes unique constraint violations. Prior to this commit, we had a bunch of complicated code which avoided updating indexes during replay.

PR #2092, aka commit f667f65, introduced a case where we would use an index during replay: when replaying a delete for a table which had an index, and therefore had no pointer map.
Prior to that commit, we would never use an index during replay, even if it existed.

This all came together to mean that st_client had an empty index, and no pointer map, and so calls to replay_delete_by_rel on that table would always fail. I theorize that no one noticed this sooner
because it did not manifest when restoring from a snapshot, and there is always a snapshot of TX 0 unless you manually delete it, as Kim did during his testing.

The fix is simple: we do not construct indexes on the system tables during bootstrapping before replaying the commitlog, instead waiting until after replay like we do for user-defined indexes. This allows us to rip out all the aforementioned fiddly code for dodging indexes during replay,
since now there will not be any indexes and therefore we will never update them.

This PR also includes some new testing infra in relational_db.rs, and exports some test helpers from the commitlog crate under its test feature.

API and ABI breaking changes

N/a

Expected complexity level and risk

4: I don't feel like I understand our bootstrapping process terribly well, it is quite fiddly, and I cannot explain why we were previously constructing system table indexes before replay. It may have been important in some way I do not realize.

Testing

Replayed a commitlog containing a conn/disconn pair after deleting snapshot 0, and saw that it successfully reconstructed the (or at least an) in-memory state.
- I did not do any more work to verify that the reconstructed state was correct.
Did the same thing with a commitlog with snapshot 0 still intact, to make sure I didn't somehow break that.
Wrote a cool new regression test, replay_delete_from_st_client, which fails on master but passes on this branch.

This PR fixes a bug where replaying databases without a snapshot of TX 0, i.e. using the entire commitlog, would fail when replaying deletes on `st_client`. This is because `st_client` was in the unique (*) position of: - Being a system table, - with a unique index, - which is mutated after bootstrapping. (*) Other system tables meet all three criteria in edge cases with schema-changing publishes, but only `st_client` meets them in everyday testing. In this case, when replaying the entire commitlog, we create the indexes on the system tables during bootstrapping before replay. (I theorize, but have not checked, that when restoring from a snapshot we do not create these indexes so eagerly, instead waiting for after replaying the commitlog suffix to create them.) However, we cannot update or use indexes during bootstrapping, at least currently, because doing so causes unique constraint violations. Prior to this commit, we had a bunch of complicated code which avoided updating indexes during replay. PR #2092, aka commit f667f65, introduced a case where we would use an index during replay: when replaying a delete for a table which had an index, and therefore had no pointer map. Prior to that commit, we would never use an index during replay, even if it existed. This all came together to mean that `st_client` had an empty index, and no pointer map, and so calls to `replay_delete_by_rel` on that table would always fail. I theorize that no one noticed this sooner because it did not manifest when restoring from a snapshot, and there is always a snapshot of TX 0 unless you manually delete it, as Kim did during his testing. The fix is simple: we do not construct indexes on the system tables during bootstrapping before replaying the commitlog, instead waiting until after replay like we do for user-defined indexes. This allows us to rip out all the aforementioned fiddly code for dodging indexes during replay, since now there will not be any indexes and therefore we will never update them.

cloutiertyler · 2025-01-22T21:30:48Z

However, we cannot update or use indexes during bootstrapping, at least currently, because doing so causes unique constraint violations.
...
This all came together to mean that st_client had an empty index, and no pointer map, and so calls to replay_delete_by_rel on that table would always fail.

I don't understand why that follows from this, could you explain? In particular how is the double delete thing we talked about relevant?

Centril

Looks good; I wonder if we could add a regression unit test using something like TestDB::in_memory()

Centril · 2025-01-22T21:34:18Z

I don't understand why that follows from this, could you explain? In particular how is the double delete thing we talked about relevant?

If you look at the commitlog that @gefjon posted in #bugs, there is actually no double delete. There's just the one delete for the inserted row in st_clients, but as the index is not populated, the datastore cannot find the row.

gefjon · 2025-01-22T22:35:53Z

Mazdak is correct; there is only one delete, for a row which is present, but is impossible to find because its table has no pointer map.

gefjon · 2025-01-22T22:40:08Z

More specifically, as of #2092 , delete_equal_row (called by replay_delete_by_rel) has two branches: one for tables with a unique constraint, and the other for tables with a pointer map. (Prior to that PR, it always used the pointer map, as all tables had pointer maps.) The bug is that delete_equal_row during replay on st_client would see a unique index and use it, but that unique index was unpopulated, so the delete_equal_row call would fail. This appears to the datastore like deleting of a not-present row, even though the row is present in the table. Doing a full scan followed by a delete-by-pointer would succeed, but we don't do that because replay is slow enough already even without doing a bunch of full scans.

kim · 2025-01-23T08:47:25Z

why we were previously constructing system table indexes before replay

As far as I can tell, this has been there since inception of the locking datastore and could have been removed with the introduction of rebuild_state_after_replay in d8576bf

kim · 2025-01-23T08:51:23Z

As a follow-up, we could also consider to remove the code which lets empty connect/disconnect transactions be committed to the commitlog, because there are no actually empty transactions?

crates/commitlog/src/tests/helpers.rs

…ing; wait until after replay (#2161)

gefjon added the bugfix Fixes something that was expected to work differently label Jan 22, 2025

gefjon requested review from kim, Centril and cloutiertyler January 22, 2025 21:17

gefjon self-assigned this Jan 22, 2025

Centril approved these changes Jan 22, 2025

View reviewed changes

kim approved these changes Jan 23, 2025

View reviewed changes

gefjon mentioned this pull request Jan 23, 2025

Update violates set semantics #2134

Closed

Add test with hand-written commitlog

11444c2

kim reviewed Jan 23, 2025

View reviewed changes

crates/commitlog/src/tests/helpers.rs Show resolved Hide resolved

gefjon added 2 commits January 23, 2025 14:08

Add doc comment to fill_log_with

3de700c

Rework fill_log_with to commit after each TX

4aaa315

Centril approved these changes Jan 23, 2025

View reviewed changes

gefjon enabled auto-merge January 23, 2025 19:24

gefjon added this pull request to the merge queue Jan 23, 2025

Merged via the queue into master with commit d171b44 Jan 23, 2025
13 checks passed

bfops pushed a commit that referenced this pull request Feb 4, 2025

[release/v1.0.0-rc3-hotfixes]: Don't create indexes during bootstrapp…

9135d46

…ing; wait until after replay (#2161)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't create indexes during bootstrapping; wait until after replay #2161

Don't create indexes during bootstrapping; wait until after replay #2161

gefjon commented Jan 22, 2025 •

edited

Loading

cloutiertyler commented Jan 22, 2025

Centril left a comment

Centril commented Jan 22, 2025

gefjon commented Jan 22, 2025

gefjon commented Jan 22, 2025

kim commented Jan 23, 2025

kim commented Jan 23, 2025

Don't create indexes during bootstrapping; wait until after replay #2161

Don't create indexes during bootstrapping; wait until after replay #2161

Conversation

gefjon commented Jan 22, 2025 • edited Loading

Description of Changes

API and ABI breaking changes

Expected complexity level and risk

Testing

cloutiertyler commented Jan 22, 2025

Centril left a comment

Choose a reason for hiding this comment

Centril commented Jan 22, 2025

gefjon commented Jan 22, 2025

gefjon commented Jan 22, 2025

kim commented Jan 23, 2025

kim commented Jan 23, 2025

gefjon commented Jan 22, 2025 •

edited

Loading