cosmosdb-migrationchoices.md

title: Cosmos DB Migration options
description: This doc describes the various options to migrate your on-premises or cloud data to Azure Cosmos DB
author: seesharprun
ms.author: sidandrews
ms.reviewer: mjbrown
ms.service: cosmos-db
ms.topic: how-to
ms.date: 04/02/2022

Options to migrate your on-premises or cloud data to Azure Cosmos DB

[!INCLUDEappliesto-all-apis]

You can load data from various data sources to Azure Cosmos DB. Since Azure Cosmos DB supports multiple APIs, the targets can be any of the existing APIs. The following are some scenarios where you migrate data to Azure Cosmos DB:

  • Move data from one Azure Cosmos container to another container in the same database or a different database.
  • Move data from dedicated containers to shared database containers.
  • Move data from an Azure Cosmos account in one region to another Azure Cosmos account in the same or a different region.
  • Move data from a source such as Azure Blob Storage, a JSON file, an Oracle database, Couchbase, or DynamoDB to Azure Cosmos DB.

In order to support migration paths from the various sources to the different Azure Cosmos DB APIs, there are multiple solutions that provide specialized handling for each migration path. This document lists the available solutions and describes their advantages and limitations.

Factors affecting the choice of migration tool

The following factors determine the choice of the migration tool:

  • Online vs offline migration: Many migration tools provide a path to do a one-time migration only. This means that the applications accessing the database might experience a period of downtime. Some migration solutions provide a way to do a live migration where there is a replication pipeline set up between the source and the target.

  • Data source: The existing data can be in various data sources such as Oracle, DB2, DataStax Cassandra, Azure SQL Database, or PostgreSQL. The data can also be in an existing Azure Cosmos DB account, where the intent of the migration is to change the data model or to repartition the data in a container with a different partition key.

  • Azure Cosmos DB API: For the SQL API in Azure Cosmos DB, there are a variety of tools developed by the Azure Cosmos DB team which aid in the different migration scenarios. All of the other APIs have their own specialized set of tools developed and maintained by the community. Since Azure Cosmos DB supports these APIs at a wire protocol level, these tools should work as-is while migrating data into Azure Cosmos DB too. However, they might require custom handling for throttles as this concept is specific to Azure Cosmos DB.

  • Size of data: Most migration tools work very well for smaller datasets. When the data set exceeds a few hundred gigabytes, the choices of migration tools are limited.

  • Expected migration duration: Migrations can be configured to take place at a slow, incremental pace that consumes less throughput or can consume the entire throughput provisioned on the target Azure Cosmos DB container and complete the migration in less time.
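
The trade-off between migration duration and throughput can be estimated with simple arithmetic. The following is a minimal sketch, not an official sizing formula: the function name, the per-write RU cost, and the `throughput_share` parameter are all assumptions made for illustration. The actual RU cost of a write depends on document size and indexing policy, and the estimate ignores retries, network latency, and partition skew, so treat the result as a lower bound.

```python
def estimate_migration_hours(doc_count, ru_per_write, provisioned_ru_per_sec,
                             throughput_share=1.0):
    """Rough lower bound, in hours, for writing doc_count documents.

    ru_per_write is an assumed average cost per write; measure it on a
    sample of your own data before relying on the estimate.
    throughput_share is the fraction of provisioned RU/s given to the
    migration (1.0 = consume everything, 0.25 = slow, incremental pace).
    """
    if not 0 < throughput_share <= 1:
        raise ValueError("throughput_share must be in (0, 1]")
    usable_ru_per_sec = provisioned_ru_per_sec * throughput_share
    total_ru = doc_count * ru_per_write
    return total_ru / usable_ru_per_sec / 3600


# Hypothetical workload: 100 million documents at an assumed 10 RU per write
# against a container provisioned with 50,000 RU/s.
full_speed = estimate_migration_hours(100_000_000, 10, 50_000)
slow_pace = estimate_migration_hours(100_000_000, 10, 50_000, throughput_share=0.25)
print(f"full throughput: {full_speed:.1f} h, quarter throughput: {slow_pace:.1f} h")
```

With these assumed numbers, giving the migration a quarter of the throughput makes it take four times as long, which is the choice the last factor describes.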

Azure Cosmos DB SQL API

If you need help with capacity planning, consider reading our guide to estimating RU/s using Azure Cosmos DB capacity planner.

Important

The Custom Migration Service using ChangeFeed is an open-source tool for live container migrations that implements change feed and bulk support. However, please note that the user interface application code for this tool is not supported or actively maintained by Microsoft. For Azure Cosmos DB SQL API live container migrations, we recommend using the Spark Connector + Change Feed as illustrated in the sample below. The Spark Connector for Azure Cosmos DB is fully supported by Microsoft.

| Migration type | Solution | Supported sources | Supported targets | Considerations |
|---|---|---|---|---|
| Offline | Data Migration Tool | JSON/CSV files, Azure Cosmos DB SQL API, MongoDB, SQL Server, Table Storage, AWS DynamoDB, Azure Blob Storage | Azure Cosmos DB SQL API, Azure Cosmos DB Tables API, JSON files | Easy to set up and supports multiple sources. Not suitable for large datasets. |
| Offline | Azure Data Factory | JSON/CSV files, Azure Cosmos DB SQL API, Azure Cosmos DB API for MongoDB, MongoDB, SQL Server, Table Storage, Azure Blob Storage. See the Azure Data Factory article for other supported sources. | Azure Cosmos DB SQL API, Azure Cosmos DB API for MongoDB, JSON files. See the Azure Data Factory article for other supported targets. | Easy to set up and supports multiple sources. Makes use of the Azure Cosmos DB bulk executor library. Suitable for large datasets. Lack of checkpointing - It means that if an issue occurs during the course of migration, you need to restart the whole migration process. Lack of a dead letter queue - It means that a few erroneous files can stop the entire migration process. |
| Offline | Azure Cosmos DB Spark connector | Azure Cosmos DB SQL API. You can use other sources with additional connectors from the Spark ecosystem. | Azure Cosmos DB SQL API. You can use other targets with additional connectors from the Spark ecosystem. | Makes use of the Azure Cosmos DB bulk executor library. Suitable for large datasets. Needs a custom Spark setup. Spark is sensitive to schema inconsistencies and this can be a problem during migration. |
| Online | Azure Cosmos DB Spark connector + Change Feed | Azure Cosmos DB SQL API. Uses Azure Cosmos DB Change Feed to stream all historic data as well as live updates. | Azure Cosmos DB SQL API. You can use other targets with additional connectors from the Spark ecosystem. | Makes use of the Azure Cosmos DB bulk executor library. Suitable for large datasets. Needs a custom Spark setup. Spark is sensitive to schema inconsistencies and this can be a problem during migration. |
| Offline | Custom tool with Cosmos DB bulk executor library | The source depends on your custom code | Azure Cosmos DB SQL API | Provides checkpointing, dead-lettering capabilities which increases migration resiliency. Suitable for very large datasets (10 TB+). Requires custom setup of this tool running as an App Service. |
| Online | Cosmos DB Functions + ChangeFeed API | Azure Cosmos DB SQL API | Azure Cosmos DB SQL API | Easy to set up. Works only if the source is an Azure Cosmos DB container. Not suitable for large datasets. Does not capture deletes from the source container. |
| Online | Custom Migration Service using ChangeFeed | Azure Cosmos DB SQL API | Azure Cosmos DB SQL API | Provides progress tracking. Works only if the source is an Azure Cosmos DB container. Works for larger datasets as well. Requires the user to set up an App Service to host the Change feed processor. Does not capture deletes from the source container. |
| Online | Striim | Oracle, Apache Cassandra. See the Striim website for other supported sources. | Azure Cosmos DB SQL API, Azure Cosmos DB Cassandra API. See the Striim website for other supported targets. | Works with a large variety of sources like Oracle, DB2, SQL Server. Easy to build ETL pipelines and provides a dashboard for monitoring. Supports larger datasets. Since this is a third-party tool, it needs to be purchased from the marketplace and installed in the user's environment. |
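
Several rows in this table turn on whether a solution offers checkpointing and a dead letter queue. The sketch below illustrates what those two capabilities mean in a custom tool. It is an assumption-laden illustration, not the bulk executor library or any Microsoft tool: `migrate_with_checkpoint`, `write_one`, and the JSON checkpoint file are hypothetical names, and the writer is any callable that raises on failure.

```python
import json
from pathlib import Path


def migrate_with_checkpoint(records, write_one, checkpoint_path, batch_size=100):
    """Copy records in order, resuming from the last saved checkpoint.

    records: an indexable sequence of documents (hypothetical source).
    write_one: callable that writes one document and raises on failure.
    Returns the dead-lettered records collected during this run.
    """
    checkpoint = Path(checkpoint_path)
    # Checkpointing: start where the previous run stopped, not from zero.
    start = 0
    if checkpoint.exists():
        start = json.loads(checkpoint.read_text())["next_index"]

    dead_letters = []
    for batch_start in range(start, len(records), batch_size):
        batch_end = min(batch_start + batch_size, len(records))
        for i in range(batch_start, batch_end):
            try:
                write_one(records[i])
            except Exception as exc:
                # Dead-lettering: set the bad record aside and keep going,
                # so one erroneous record cannot stop the whole migration.
                dead_letters.append(
                    {"index": i, "record": records[i], "error": str(exc)}
                )
        # Persist progress only after the whole batch has been attempted.
        checkpoint.write_text(json.dumps({"next_index": batch_end}))
    return dead_letters
```

If the process dies in the middle of a batch, that batch is replayed on the next run, so the writer should be idempotent (for example, an upsert). A real tool would also persist the dead letters rather than hold them in memory.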

Azure Cosmos DB Mongo API

Follow the pre-migration guide to plan your migration.

When you are ready to migrate, you can find detailed guidance on migration tools below.

Then, follow our post-migration guide to optimize your Azure Cosmos DB data estate once you have migrated.

A summary of migration pathways from your current solution to Azure Cosmos DB API for MongoDB is provided below:

| Migration type | Solution | Supported sources | Supported targets | Considerations |
|---|---|---|---|---|
| Online | Azure Database Migration Service | MongoDB | Azure Cosmos DB API for MongoDB | Makes use of the Azure Cosmos DB bulk executor library. Suitable for large datasets and takes care of replicating live changes. Works only with other MongoDB sources. |
| Offline | Azure Database Migration Service | MongoDB | Azure Cosmos DB API for MongoDB | Makes use of the Azure Cosmos DB bulk executor library. Suitable for large datasets and takes care of replicating live changes. Works only with other MongoDB sources. |
| Offline | Azure Data Factory | JSON/CSV files, Azure Cosmos DB SQL API, Azure Cosmos DB API for MongoDB, MongoDB, SQL Server, Table Storage, Azure Blob Storage. See the Azure Data Factory article for other supported sources. | Azure Cosmos DB SQL API, Azure Cosmos DB API for MongoDB, JSON files. See the Azure Data Factory article for other supported targets. | Easy to set up and supports multiple sources. Makes use of the Azure Cosmos DB bulk executor library. Suitable for large datasets. Lack of checkpointing means that any issue during the course of migration would require a restart of the whole migration process. Lack of a dead letter queue would mean that a few erroneous files could stop the entire migration process. Needs custom code to increase read throughput for certain data sources. |
| Offline | Existing Mongo Tools (mongodump, mongorestore, Studio3T) | MongoDB | Azure Cosmos DB API for MongoDB | Easy to set up and integrate. Needs custom handling for throttles. |
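
"Custom handling for throttles" usually means retrying a rate-limited write after a pause instead of failing the migration. The following is a minimal sketch of that idea using exponential backoff with jitter. It does not use any real driver API: `ThrottledError` is a hypothetical stand-in for whatever error your client raises when a request is rate limited, and `write` is any callable you supply.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a driver's rate-limiting error."""

    def __init__(self, retry_after_s=None):
        super().__init__("request was throttled")
        self.retry_after_s = retry_after_s


def write_with_backoff(write, doc, max_attempts=5, base_delay_s=0.1,
                       max_delay_s=5.0, sleep=time.sleep):
    """Call write(doc), retrying on ThrottledError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return write(doc)
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            if err.retry_after_s is not None:
                # Honor the server's retry hint when one is available.
                delay = err.retry_after_s
            else:
                # Exponential backoff with full jitter, capped at max_delay_s.
                cap = min(max_delay_s, base_delay_s * 2 ** attempt)
                delay = random.uniform(0, cap)
            sleep(delay)
```

The `sleep` parameter exists only so the wait can be replaced in tests. In a real migration you would wrap each insert or batch insert issued by your tool in a helper like this and tune the attempt count to the throughput provisioned on the target.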

Azure Cosmos DB Cassandra API

If you need help with capacity planning, consider reading our guide to estimating RU/s using Azure Cosmos DB capacity planner.

| Migration type | Solution | Supported sources | Supported targets | Considerations |
|---|---|---|---|---|
| Offline | cqlsh COPY command | CSV files | Azure Cosmos DB Cassandra API | Easy to set up. Not suitable for large datasets. Works only when the source is a Cassandra table. |
| Offline | Copy table with Spark | Apache Cassandra | Azure Cosmos DB Cassandra API | Can make use of Spark capabilities to parallelize transformation and ingestion. Needs configuration with a custom retry policy to handle throttles. |
| Online | Dual-write proxy + Spark | Apache Cassandra | Azure Cosmos DB Cassandra API | Supports larger datasets, but careful attention required for setup and validation. Open-source tools, no purchase required. |
| Online | Striim (from Oracle DB/Apache Cassandra) | Oracle, Apache Cassandra. See the Striim website for other supported sources. | Azure Cosmos DB SQL API, Azure Cosmos DB Cassandra API. See the Striim website for other supported targets. | Works with a large variety of sources like Oracle, DB2, SQL Server. Easy to build ETL pipelines and provides a dashboard for monitoring. Supports larger datasets. Since this is a third-party tool, it needs to be purchased from the marketplace and installed in the user's environment. |
| Online | Arcion (from Oracle DB/Apache Cassandra) | Oracle, Apache Cassandra. See the Arcion website for other supported sources. | Azure Cosmos DB Cassandra API. See the Arcion website for other supported targets. | Supports larger datasets. Since this is a third-party tool, it needs to be purchased from the marketplace and installed in the user's environment. |
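
The dual-write approach in the table combines two activities: live writes are mirrored to both the source and the target, while a separate job copies the historic data. The sketch below illustrates only the concept, using plain dictionaries in place of Cassandra tables. It is not the actual proxy or Spark job: `DualWriter` and `backfill` are hypothetical names, and the "skip keys that already exist" rule is a simplifying assumption standing in for whatever conflict handling your setup uses.

```python
class DualWriter:
    """Send each live write to the source first, then mirror it to the target."""

    def __init__(self, write_source, write_target):
        self.write_source = write_source
        self.write_target = write_target
        self.failed_target_writes = []

    def write(self, key, row):
        # The source stays the system of record: its failures propagate.
        self.write_source(key, row)
        try:
            self.write_target(key, row)
        except Exception as exc:
            # Record the miss for later repair instead of failing the app.
            self.failed_target_writes.append((key, row, str(exc)))


def backfill(source_rows, target):
    """Copy historic rows without overwriting rows mirrored by live writes."""
    copied = 0
    for key, row in source_rows.items():
        if key not in target:
            target[key] = row
            copied += 1
    return copied


source, target = {"a": 1, "b": 2}, {}
writer = DualWriter(source.__setitem__, target.__setitem__)
writer.write("b", 20)      # live update arrives during the migration
backfill(source, target)   # historic copy fills in everything else
print(target == source)    # a simple validation check once both have run
```

The list of failed target writes and the final comparison hint at why the table calls for careful setup and validation: mirrored writes can fail independently of the source, so the two sides must be reconciled before cutover.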

Other APIs

For APIs other than the SQL API, Mongo API, and Cassandra API, various tools are supported by each API's existing ecosystem.

Table API

Gremlin API

Next steps

  • Trying to do capacity planning for a migration to Azure Cosmos DB?
  • Learn more by trying out the sample applications consuming the bulk executor library in .NET and Java.
  • The bulk executor library is integrated into the Cosmos DB Spark connector. To learn more, see the Azure Cosmos DB Spark connector article.
  • Contact the Azure Cosmos DB product team by opening a support ticket under the "General Advisory" problem type and "Large (TB+) migrations" problem subtype for additional help with large scale migrations.