Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate all DB tables away from MessagePack #308

Merged
merged 3 commits into from
Aug 1, 2024
Merged

Conversation

LucasPickering
Copy link
Owner

@LucasPickering LucasPickering commented Aug 1, 2024

Description

Describe the change. If there is an associated issue, please include the issue link (e.g. "Closes #xxx"). For UI changes, please also include screenshots.

Previously, MessagePack was used extensively in the internal Slumber DB to store complex values (requests, responses, UI state, etc.). This seemed like a good idea at the time because it allowed me to jam complex data into single DB columns without having to write migrations when adding fields. I chose MessagePack over JSON because it was more compact. Over time though I've realized storing the data as big binary blobs makes it hard to debug, and also adds an extra package (rmp_serde) to the dep tree.

This MR makes 3 major changes:

  • Add a new table requests_v2, which stores all request and response fields as columns. This will make it easier to browse data now, and migrate data in the future. I'm sure there's also a performance benefit to skipping the extra serde layer
  • Add a new table ui_state_v2 that stores data in JSON instead of MessagePack. JSON will be a bit less compact, but is can be easily read in the CLI for debugging, and the size difference will be minimal.
  • Convert the column collections.path from MessagePack to UTF-8. This could potential cause errors, but in practice should be fine because who the hell is writing paths that aren't UTF-8?? Mitigated with tests that contain UTF-8 paths, which pass on all platforms.

I wrote migrations for all 3 so there should be no discernible impact to the user. The plan is to remove these migrations at some point in the future so we can drop the rmp_serde dependency (#306). It'll also allow us to remove some serde derive impls, which should improve compile time marginally. Right now I'm thinking ~6 months from now. That will necessitate a major version update.

Known Risks

What issues could potentially go wrong with this change? Is it a breaking change? What have you done to mitigate any potential risks?

There's potential for data loss here. Mitigated that by leaving the requests table around. This means we'll have duplicate data which wastes disk space. We can drop that table in the future once the migration has been tested more.

QA

How did you test this?

Manually tested migrations locally, added unit tests for each one.

Checklist

  • Have you read CONTRIBUTING.md already?
  • Did you update CHANGELOG.md?
    • Only user-facing changes belong in the changelog. Internal changes such as refactors should only be included if they'll impact users, e.g. via performance improvement.
  • Did you remove all TODOs?
    • If there are unresolved issues, please open a follow-on issue and link to it in a comment so future work can be tracked

This usurps the existing requests and ui_state tables. For requests_v2, iIt flattens all fields into DB columns. This makes it easier to browse and query data, and it also makes it easier to migrate in the future. Previously if we wanted to add a required field to RequestRecord or ResponseRecord, we would've had to keep an old version of the struct around to migrate. Now we can just add a DB column like a normal migration.

For ui_state_v2, it flattens the type name into its own column, and replaces MessagePack with JSON, since we have to pull in serde_json anyway. The JSON will also be easy to browse.

We have to keep the old schema, including rmp_serde and the Deserialize impls, around to support migrating. At some point in the future we can abandon those, but that should be after a reasonable amount of time once users have mostly migrated to the new version. I also left the old requests table around in case something goes wrong during a migration, so data isn't lost.
Previously the collection path was serialized as messagepack. This means it was stored as platform-specific bytes, with a header. This changes it to be stored just as UTF-8, which in practice should be fine for all cases. It eliminates rmp_serde as a dependency (except for migrations), and makes the logic much simpler.
@LucasPickering LucasPickering merged commit a34d16b into master Aug 1, 2024
9 checks passed
@LucasPickering LucasPickering deleted the great-migration branch August 1, 2024 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant