Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't create indexes during bootstrapping; wait until after replay #2161

Merged
merged 4 commits into from
Jan 23, 2025

Conversation

gefjon
Copy link
Contributor

@gefjon gefjon commented Jan 22, 2025

Description of Changes

This PR fixes a bug where replaying databases without a snapshot of TX 0, i.e. using the entire commitlog,
would fail when replaying deletes on st_client.

This is because st_client was in the unique (*) position of:

  • Being a system table,
  • with a unique index,
  • which is mutated after bootstrapping.

(*) Other system tables meet all three criteria in edge cases with schema-changing publishes, but only st_client meets them in everyday testing.

In this case, when replaying the entire commitlog, we create the indexes on the system tables during bootstrapping before replay. (I theorize, but have not checked,
that when restoring from a snapshot we do not create these indexes so eagerly, instead waiting for after replaying the commitlog suffix to create them.)

However, we cannot update or use indexes during bootstrapping, at least currently, because doing so causes unique constraint violations. Prior to this commit, we had a bunch of complicated code which avoided updating indexes during replay.

PR #2092, aka commit f667f65, introduced a case where we would use an index during replay: when replaying a delete for a table which had an index, and therefore had no pointer map.
Prior to that commit, we would never use an index during replay, even if it existed.

This all came together to mean that st_client had an empty index, and no pointer map, and so calls to replay_delete_by_rel on that table would always fail. I theorize that no one noticed this sooner
because it did not manifest when restoring from a snapshot, and there is always a snapshot of TX 0 unless you manually delete it, as Kim did during his testing.

The fix is simple: we do not construct indexes on the system tables during bootstrapping before replaying the commitlog, instead waiting until after replay like we do for user-defined indexes. This allows us to rip out all the aforementioned fiddly code for dodging indexes during replay,
since now there will not be any indexes and therefore we will never update them.

This PR also includes some new testing infra in relational_db.rs, and exports some test helpers from the commitlog crate under its test feature.

API and ABI breaking changes

N/a

Expected complexity level and risk

4: I don't feel like I understand our bootstrapping process terribly well, it is quite fiddly, and I cannot explain why we were previously constructing system table indexes before replay. It may have been important in some way I do not realize.

Testing

  • Replayed a commitlog containing a conn/disconn pair after deleting snapshot 0, and saw that it successfully reconstructed the (or at least an) in-memory state.
    • I did not do any more work to verify that the reconstructed state was correct.
  • Did the same thing with a commitlog with snapshot 0 still intact, to make sure I didn't somehow break that.
  • Wrote a cool new regression test, replay_delete_from_st_client, which fails on master but passes on this branch.

This PR fixes a bug where replaying databases without a snapshot of TX 0,
i.e. using the entire commitlog,
would fail when replaying deletes on `st_client`.

This is because `st_client` was in the unique (*) position of:
- Being a system table,
- with a unique index,
- which is mutated after bootstrapping.

(*) Other system tables meet all three criteria in edge cases
with schema-changing publishes, but only `st_client` meets them in everyday testing.

In this case, when replaying the entire commitlog,
we create the indexes on the system tables during bootstrapping before replay.
(I theorize, but have not checked,
that when restoring from a snapshot we do not create these indexes so eagerly,
instead waiting for after replaying the commitlog suffix to create them.)

However, we cannot update or use indexes during bootstrapping,
at least currently, because doing so causes unique constraint violations.
Prior to this commit, we had a bunch of complicated code
which avoided updating indexes during replay.

PR #2092, aka commit f667f65,
introduced a case where we would use an index during replay:
when replaying a delete for a table which had an index,
and therefore had no pointer map.
Prior to that commit, we would never use an index during replay, even if it existed.

This all came together to mean that `st_client` had an empty index, and no pointer map,
and so calls to `replay_delete_by_rel` on that table would always fail.
I theorize that no one noticed this sooner
because it did not manifest when restoring from a snapshot,
and there is always a snapshot of TX 0 unless you manually delete it,
as Kim did during his testing.

The fix is simple: we do not construct indexes on the system tables
during bootstrapping before replaying the commitlog,
instead waiting until after replay like we do for user-defined indexes.
This allows us to rip out all the aforementioned fiddly code
for dodging indexes during replay,
since now there will not be any indexes and therefore we will never update them.
@gefjon gefjon added the bugfix Fixes something that was expected to work differently label Jan 22, 2025
@gefjon gefjon self-assigned this Jan 22, 2025
@cloutiertyler
Copy link
Contributor

However, we cannot update or use indexes during bootstrapping, at least currently, because doing so causes unique constraint violations.
...
This all came together to mean that st_client had an empty index, and no pointer map, and so calls to replay_delete_by_rel on that table would always fail.

I don't understand why that follows from this, could you explain? In particular how is the double delete thing we talked about relevant?

Copy link
Contributor

@Centril Centril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good; I wonder if we could add a regression unit test using something like TestDB::in_memory()

@Centril
Copy link
Contributor

Centril commented Jan 22, 2025

I don't understand why that follows from this, could you explain? In particular how is the double delete thing we talked about relevant?

If you look at the commitlog that @gefjon posted in #bugs, there is actually no double delete. There's just the one delete for the inserted row in st_clients, but as the index is not populated, the datastore cannot find the row.

@gefjon
Copy link
Contributor Author

gefjon commented Jan 22, 2025

Mazdak is correct; there is only one delete, for a row which is present, but is impossible to find because its table has no pointer map.

@gefjon
Copy link
Contributor Author

gefjon commented Jan 22, 2025

More specifically, as of #2092 , delete_equal_row (called by replay_delete_by_rel) has two branches: one for tables with a unique constraint, and the other for tables with a pointer map. (Prior to that PR, it always used the pointer map, as all tables had pointer maps.) The bug is that delete_equal_row during replay on st_client would see a unique index and use it, but that unique index was unpopulated, so the delete_equal_row call would fail. This appears to the datastore like deleting of a not-present row, even though the row is present in the table. Doing a full scan followed by a delete-by-pointer would succeed, but we don't do that because replay is slow enough already even without doing a bunch of full scans.

@kim
Copy link
Contributor

kim commented Jan 23, 2025

why we were previously constructing system table indexes before replay

As far as I can tell, this has been there since inception of the locking datastore and could have been removed with the introduction of rebuild_state_after_replay in d8576bf

@kim
Copy link
Contributor

kim commented Jan 23, 2025

As a follow-up, we could also consider to remove the code which lets empty connect/disconnect transactions be committed to the commitlog, because there are no actually empty transactions?

@gefjon gefjon enabled auto-merge January 23, 2025 19:24
@gefjon gefjon added this pull request to the merge queue Jan 23, 2025
Merged via the queue into master with commit d171b44 Jan 23, 2025
13 checks passed
bfops pushed a commit that referenced this pull request Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Fixes something that was expected to work differently
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants