Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DB Versioning & Migration Tools (GSI-1462) #159

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

TheByronHimes
Copy link
Member

@TheByronHimes TheByronHimes commented Mar 6, 2025

This is the official implementation of the DB migration prototype implemented for the IFRS/DCS in the file services monorepo.
Most of the functionality is the same, but some small improvements/tweaks have been added.

PR Info

Changes from prototype

Batch Processing:
migrate_docs_in_collection() now takes a batch_size parameter to control the batch size used by the cursor. When inserting the migrated documents, we now call .insert_many() instead of making sequential calls to .insert(), and the size of the inserts is also controlled by batch_size. The batch_size parameter defaults to 1000. Note that MongoDB will only return batches of up to 16 MB, so the batch size only limits the number of documents retrieved within that limit (the default for MongoDB is 101).

Index Copying:
copy_indexes() is now functional, and will copy the indexes between two collections directly. There's an automatic/auto-tracking version called auto_copy_indexes() that accepts the normal collection names and handles prefixing and remembers which collections have had the indexes copied (similar to the staging process). Under the hood, auto_copy_indexes() calls copy_indexes().

Configurable Timeout for Waiting Instances:
In the case where multiple service instances exist and all but one must wait for the migration process to finish, the prototype provided no way for those services to timeout. This PR adds a config option, migration_max_wait_sec. If set, waiting services will raise a MigrationTimeoutError. The value is optional, though, and setting it to None will allow for infinite waiting.

Timestamp Format:
Timestamps for the DB Version Records and the lock document are now UTC Datetimes, while the prototype used strings.
There's a small hiccup with this since the IFRS and DCS already inserted a few documents with the string format, but we will prevail.

Tests for the Migration Tools

Currently, the following have been added:

  • Test for initialization (completing the v1 migration)
  • Test for behavior in the case that a nonexistent/empty collection is dropped or renamed
  • Test copying indexes to make sure names/keys/options are copied over
  • Test to ensure existing indexes are not copied unless instructed
  • Test behavior of staging and subsequently unstaging a collection
  • Test error handling when unapply is not defined on a migration definition while migrating down
  • Test migration unapply success
  • Test batch processing when collection size is larger than batch_size to ensure all docs are migrated
  • Migration unapply failure due to model mismatch
  • Migration idempotence (should be able to run migration check a bunch with no effect if already up to date)
  • Functionality of the auto_finalize() context manager in successful and error cases.

I did some minor organization of the integration test modules, but did not change the unit tests since it had a more heterogeneous mix of provider/protocol tests rather than provider-specific tests.

Items implemented from prototype wishlist:
✅ Batch processing
✅ Index copying (need to understand this better)
✅ Better documentation
✅ Limits for retries?
✅ Testing to cover most the edge cases along with unit testing to better catch problems with helper functions

Delayed or Scratched:
⏲️ Better logging/error handling (did some small changes, but mostly left it as is)
⏲️ Better metrics (e.g. # documents affected)
Reversible subclass thing alternate implementation
automatically pass config to instances of MigrationDefinition - only an inconvenience for testing though


Usage (Bonus Info)

Step 1: The logic for every migration should subclass MigrationDefinition and, usually, Reversible as well:

from hexkit.providers.mongodb.migrations import MigrationDefinition, Reversible

class V2Migration(MigrationDefinition, Reversible):
    """Adds `object_size` and migrates the database to version 2"""

    version = 2

    async def apply(self):
        # migrate collection 1
        # migrate collection 2, etc.

    async def unapply(self):
        # logic to undo the migration, only if subclassing Reversible

The naming convention is V<version>Migration. Migrations occur at the database level, meaning there should only be one migration class for every version of the database -- each migration class should migrate all pertinent collections within the database:

✅ DO ❌ DON'T
V4Migration V4UsersMigration, MySuperRadMigrationClass

Step 2: Define the current database version and the migration map.

Version 1 is automatically applied by hexkit when initializing the database versioning. As a result, actual migrations begin with version 2.

from my_migrations import V2Migration, V3Migration, V4Migration  # etc.

MIGRATION_MAP = {2: V2Migration, 3: V3Migration, 4: V4Migration}
DB_VERSION = 4

Step 3: Instantiate the MigrationManager via async with, then call .migrate_or_wait():

MIGRATION_MAP = {2: V2Migration, 3: V3Migration, 4: V4Migration}
DB_VERSION = 4

async def run_db_migrations(*, config: Config, target_version: int = DB_VERSION):
    """Check if the database is up to date and attempt to migrate the data if not."""
    async with MigrationManager(
        config=config,
        target_version=target_version,
        migration_map=MIGRATION_MAP,
    ) as mm:
        await mm.migrate_or_wait()

Step 4: Perform the DB migrations/version check sometime before firing up the actual service code:

async def run_rest():
    """Run an event consumer listening to the specified topics."""
    config = Config()
    configure_logging(config=config)
    await run_db_migrations(config=config)
    
    async with prepare_rest_app(config=config) as app:
        await run_server(app=app, config=config)

Notes

The MigrationDefinition class defines some useful logic for common operations, such as "staging" migrated data (where we rename the collections to swap the new data with the old), dropping new collections, generating temp table prefixing, applying an update function to each doc in a collection with validation, and a context manager to automate staging/dropping/error case cleanup.

The MigrationManager handles higher-level responsibilities and is ignorant of migration implementation details. It cares about whether or not the database is set up for migrations, which migrations it needs to run to get from version X to version Y, and running said migrations. It handles the database "lock" document to ensure only one service instance ever performs migrations. It also maintains a record of completed migrations, which is stored in the configured db_version_collection.

Solving the catch-22 of making sure only one instance initializes the lock/versioning collection is straightforward: the code inserts a lock document with a fixed ID. If the result is a DuplicateKeyError error, stop and enter the waiting cycle (another instance was faster), otherwise continue with the init & migration logic. The service instance that performs the initialization will already have the lock acquired after finishing setup and can roll right into executing any migrations.

@coveralls
Copy link

coveralls commented Mar 7, 2025

Pull Request Test Coverage Report for Build 13817926961

Details

  • 300 of 339 (88.5%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.5%) to 91.752%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/hexkit/providers/mongodb/migrations/_utils.py 140 155 90.32%
src/hexkit/providers/mongodb/migrations/_manager.py 157 181 86.74%
Totals Coverage Status
Change from base Build 13632953372: -0.5%
Covered Lines: 2325
Relevant Lines: 2534

💛 - Coveralls

@TheByronHimes TheByronHimes requested a review from Cito March 10, 2025 12:30
@TheByronHimes TheByronHimes marked this pull request as ready for review March 10, 2025 12:33
Cito
Cito previously approved these changes Mar 10, 2025
Copy link
Member

@Cito Cito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Made some suggestions.

Co-authored-by: Christoph Zwerschke <[email protected]>
@TheByronHimes TheByronHimes requested review from Cito and mephenor March 11, 2025 13:33
Cito
Cito previously approved these changes Mar 11, 2025
Comment on lines +336 to +339
error = MigrationStepError(
current_ver=version - 1,
target_ver=self.target_ver,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does current_ver have the correct value for both backward and forward here? Shouldn't it simply be version for unapplying?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be right. I'll add a small test for it

Comment on lines +298 to +299
index.pop("v", "")
index.pop("ns", "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

Co-authored-by: Thomas Zajac <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants