---
title: Cosmos DB Migration options
description: This doc describes the various options to migrate your on-premises or cloud data to Azure Cosmos DB.
author: seesharprun
ms.author: sidandrews
ms.reviewer: mjbrown
ms.service: cosmos-db
ms.topic: how-to
ms.date: 04/02/2022
---
[!INCLUDE[appliesto-all-apis](includes/appliesto-all-apis.md)]
You can load data from various data sources to Azure Cosmos DB. Since Azure Cosmos DB supports multiple APIs, the targets can be any of the existing APIs. The following are some scenarios where you migrate data to Azure Cosmos DB:

- Move data from one Azure Cosmos container to another container in the same database or a different database.
- Move data from dedicated containers to shared database containers.
- Move data from an Azure Cosmos account in one region to another Azure Cosmos account in the same or a different region.
- Move data from a source such as Azure Blob storage, a JSON file, an Oracle database, Couchbase, or DynamoDB to Azure Cosmos DB.
To support migration from the various sources to the different Azure Cosmos DB APIs, there are multiple solutions that provide specialized handling for each migration path. This document lists the available solutions and describes their advantages and limitations.
The following factors determine the choice of the migration tool:

- **Online vs. offline migration**: Many migration tools provide a path to do a one-time migration only. This means that the applications accessing the database might experience a period of downtime. Some migration solutions provide a way to do a live migration, where a replication pipeline is set up between the source and the target.
- **Data source**: The existing data can be in various data sources like Oracle, DB2, DataStax Cassandra, Azure SQL Database, PostgreSQL, etc. The data can also be in an existing Azure Cosmos DB account, where the intent of the migration is to change the data model or repartition the data in a container with a different partition key.
- **Azure Cosmos DB API**: For the SQL API in Azure Cosmos DB, there are a variety of tools developed by the Azure Cosmos DB team that aid in the different migration scenarios. All of the other APIs have their own specialized set of tools developed and maintained by the community. Since Azure Cosmos DB supports these APIs at a wire protocol level, these tools should work as-is when migrating data into Azure Cosmos DB as well. However, they might require custom handling for throttles, as this concept is specific to Azure Cosmos DB.
- **Size of data**: Most migration tools work well for smaller datasets. When the dataset exceeds a few hundred gigabytes, the choice of migration tools is limited.
- **Expected migration duration**: Migrations can be configured to take place at a slow, incremental pace that consumes less throughput, or they can consume the entire throughput provisioned on the target Azure Cosmos DB container and complete the migration in less time. A back-of-the-envelope duration estimate is sketched after this list.

- If you need help with capacity planning, consider reading our guide to estimating RU/s using Azure Cosmos DB capacity planner.
- If you are migrating from a vCores- or server-based platform and you need guidance on estimating request units, consider reading our guide to estimating RU/s based on vCores.
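To make the throughput-versus-duration trade-off concrete, here is a minimal back-of-the-envelope estimate in Python. The numbers are illustrative assumptions only: roughly 5 RU per ~1-KB insert is a common rule of thumb, but you should measure the actual RU charge of your writes (for example, from response headers) before planning.

```python
# Rough migration-duration estimate under illustrative assumptions.
def estimate_migration_hours(
    item_count: int,
    ru_per_insert: float = 5.0,     # assumed avg RU charge per ~1-KB insert; measure yours
    provisioned_rus: int = 10_000,  # RU/s provisioned on the target container
    throughput_share: float = 0.8,  # fraction of throughput the migration may consume
) -> float:
    effective_rus = provisioned_rus * throughput_share
    items_per_second = effective_rus / ru_per_insert
    return item_count / items_per_second / 3600

# Example: 100 million ~1-KB items into a 10,000 RU/s container, using 80% of throughput.
print(f"{estimate_migration_hours(100_000_000):.1f} hours")  # ~17.4 hours
```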
> [!IMPORTANT]
> The Custom Migration Service using ChangeFeed is an open-source tool for live container migrations that implements change feed and bulk support. However, note that the user interface application code for this tool is not supported or actively maintained by Microsoft. For Azure Cosmos DB SQL API live container migrations, we recommend using the Spark Connector + Change Feed as illustrated in the sample below. The Spark Connector for Azure Cosmos DB is fully supported by Microsoft.
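As a rough illustration of that recommended pattern, the following PySpark sketch streams a source container's change feed into a target container using the Azure Cosmos DB Spark 3 OLTP connector (`cosmos.oltp` / `cosmos.oltp.changeFeed`). The endpoint, key, database, container, and checkpoint values are placeholders; validate the configuration keys against the connector version you deploy.

```python
# Live container-to-container migration: stream the source change feed
# (historic data + live updates) and bulk-upsert into the target container.
# Assumes a Spark session `spark` with the Cosmos DB Spark 3 OLTP connector installed.
change_feed_config = {
    "spark.cosmos.accountEndpoint": "https://<source-account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<source-key>",
    "spark.cosmos.database": "<source-database>",
    "spark.cosmos.container": "<source-container>",
    "spark.cosmos.read.inferSchema.enabled": "false",  # keep documents as raw JSON
    "spark.cosmos.changeFeed.startFrom": "Beginning",  # replay all historic data first
    "spark.cosmos.changeFeed.mode": "Incremental",
}
write_config = {
    "spark.cosmos.accountEndpoint": "https://<target-account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<target-key>",
    "spark.cosmos.database": "<target-database>",
    "spark.cosmos.container": "<target-container>",
    "spark.cosmos.write.strategy": "ItemOverwrite",    # upsert semantics
    "spark.cosmos.write.bulk.enabled": "true",
}

change_feed_df = (
    spark.readStream.format("cosmos.oltp.changeFeed")
    .options(**change_feed_config)
    .load()
)
(
    change_feed_df.writeStream.format("cosmos.oltp")
    .options(**write_config)
    .option("checkpointLocation", "/tmp/migration-checkpoint")  # enables restartability
    .outputMode("append")
    .start()
)
```

Note that, as with the other change feed based options in the table below, deletes on the source container are not captured, so plan your cut-over window accordingly.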
The following table summarizes the migration options for the Azure Cosmos DB SQL API:

Migration type | Solution | Supported sources | Supported targets | Considerations |
---|---|---|---|---|
Offline | Data Migration Tool | •JSON/CSV files<br/>•Azure Cosmos DB SQL API<br/>•MongoDB<br/>•SQL Server<br/>•Table Storage<br/>•AWS DynamoDB<br/>•Azure Blob Storage | •Azure Cosmos DB SQL API<br/>•Azure Cosmos DB Tables API<br/>•JSON files | •Easy to set up and supports multiple sources.<br/>•Not suitable for large datasets. |
Offline | Azure Data Factory | •JSON/CSV files<br/>•Azure Cosmos DB SQL API<br/>•Azure Cosmos DB API for MongoDB<br/>•MongoDB<br/>•SQL Server<br/>•Table Storage<br/>•Azure Blob Storage<br/>See the Azure Data Factory article for other supported sources. | •Azure Cosmos DB SQL API<br/>•Azure Cosmos DB API for MongoDB<br/>•JSON files<br/>See the Azure Data Factory article for other supported targets. | •Easy to set up and supports multiple sources.<br/>•Makes use of the Azure Cosmos DB bulk executor library.<br/>•Suitable for large datasets.<br/>•Lack of checkpointing means that if an issue occurs during migration, you must restart the whole migration process.<br/>•Lack of a dead letter queue means that a few erroneous files can stop the entire migration process. |
Offline | Azure Cosmos DB Spark connector | Azure Cosmos DB SQL API. You can use other sources with additional connectors from the Spark ecosystem. | Azure Cosmos DB SQL API. You can use other targets with additional connectors from the Spark ecosystem. | •Makes use of the Azure Cosmos DB bulk executor library.<br/>•Suitable for large datasets.<br/>•Needs a custom Spark setup.<br/>•Spark is sensitive to schema inconsistencies, which can be a problem during migration.<br/>•See the offline copy sketch after this table. |
Online | Azure Cosmos DB Spark connector + Change Feed | Azure Cosmos DB SQL API. Uses Azure Cosmos DB Change Feed to stream all historic data as well as live updates. | Azure Cosmos DB SQL API. You can use other targets with additional connectors from the Spark ecosystem. | •Makes use of the Azure Cosmos DB bulk executor library.<br/>•Suitable for large datasets.<br/>•Needs a custom Spark setup.<br/>•Spark is sensitive to schema inconsistencies, which can be a problem during migration. |
Offline | Custom tool with Cosmos DB bulk executor library | The source depends on your custom code | Azure Cosmos DB SQL API | •Provides checkpointing and dead-lettering capabilities, which increases migration resiliency.<br/>•Suitable for very large datasets (10 TB+).<br/>•Requires a custom setup of this tool running as an App Service. |
Online | Cosmos DB Functions + ChangeFeed API | Azure Cosmos DB SQL API | Azure Cosmos DB SQL API | •Easy to set up.<br/>•Works only if the source is an Azure Cosmos DB container.<br/>•Not suitable for large datasets.<br/>•Does not capture deletes from the source container. |
Online | Custom Migration Service using ChangeFeed | Azure Cosmos DB SQL API | Azure Cosmos DB SQL API | •Provides progress tracking.<br/>•Works only if the source is an Azure Cosmos DB container.<br/>•Works for larger datasets as well.<br/>•Requires the user to set up an App Service to host the Change feed processor.<br/>•Does not capture deletes from the source container. |
Online | Striim | •Oracle<br/>•Apache Cassandra<br/>See the Striim website for other supported sources. | •Azure Cosmos DB SQL API<br/>•Azure Cosmos DB Cassandra API<br/>See the Striim website for other supported targets. | •Works with a large variety of sources like Oracle, DB2, and SQL Server.<br/>•Easy to build ETL pipelines and provides a dashboard for monitoring.<br/>•Supports larger datasets.<br/>•Since this is a third-party tool, it needs to be purchased from the marketplace and installed in the user's environment. |
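For the offline Spark connector option in the table above, a one-time copy is a plain batch read followed by a bulk write. Here is a minimal sketch under the same assumptions as the streaming example (placeholder account values, a Spark session `spark` with the Cosmos DB Spark 3 OLTP connector installed):

```python
# One-time (offline) container copy with the Cosmos DB Spark 3 OLTP connector.
read_config = {
    "spark.cosmos.accountEndpoint": "https://<source-account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<source-key>",
    "spark.cosmos.database": "<source-database>",
    "spark.cosmos.container": "<source-container>",
    "spark.cosmos.read.inferSchema.enabled": "false",  # avoid schema-inference surprises
}
write_config = {
    "spark.cosmos.accountEndpoint": "https://<target-account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<target-key>",
    "spark.cosmos.database": "<target-database>",
    "spark.cosmos.container": "<target-container>",
    "spark.cosmos.write.strategy": "ItemOverwrite",    # upsert semantics
    "spark.cosmos.write.bulk.enabled": "true",
}

df = spark.read.format("cosmos.oltp").options(**read_config).load()
df.write.format("cosmos.oltp").options(**write_config).mode("append").save()
```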
Follow the pre-migration guide to plan your migration.
- If you need help with capacity planning, consider reading our guide to estimating RU/s using Azure Cosmos DB capacity planner.
- If you are migrating from a vCores- or server-based platform and you need guidance on estimating request units, consider reading our guide to estimating RU/s based on vCores.
When you are ready to migrate, you can find detailed guidance on the migration tools below:

- Offline migration using MongoDB native tools
- Offline migration using Azure Database Migration Service (DMS)
- Online migration using Azure Database Migration Service (DMS)
- Offline/online migration using Azure Databricks and Spark
Then, follow our post-migration guide to optimize your Azure Cosmos DB data estate once you have migrated.
A summary of migration pathways from your current solution to Azure Cosmos DB API for MongoDB is provided below:
Migration type | Solution | Supported sources | Supported targets | Considerations |
---|---|---|---|---|
Online | Azure Database Migration Service | MongoDB | Azure Cosmos DB API for MongoDB | •Makes use of the Azure Cosmos DB bulk executor library.<br/>•Suitable for large datasets and takes care of replicating live changes.<br/>•Works only with other MongoDB sources. |
Offline | Azure Database Migration Service | MongoDB | Azure Cosmos DB API for MongoDB | •Makes use of the Azure Cosmos DB bulk executor library.<br/>•Suitable for large datasets.<br/>•Works only with other MongoDB sources. |
Offline | Azure Data Factory | •JSON/CSV files<br/>•Azure Cosmos DB SQL API<br/>•Azure Cosmos DB API for MongoDB<br/>•MongoDB<br/>•SQL Server<br/>•Table Storage<br/>•Azure Blob Storage<br/>See the Azure Data Factory article for other supported sources. | •Azure Cosmos DB SQL API<br/>•Azure Cosmos DB API for MongoDB<br/>•JSON files<br/>See the Azure Data Factory article for other supported targets. | •Easy to set up and supports multiple sources.<br/>•Makes use of the Azure Cosmos DB bulk executor library.<br/>•Suitable for large datasets.<br/>•Lack of checkpointing means that any issue during migration requires a restart of the whole migration process.<br/>•Lack of a dead letter queue means that a few erroneous files can stop the entire migration process.<br/>•Needs custom code to increase read throughput for certain data sources. |
Offline | Existing MongoDB tools (mongodump, mongorestore, Studio3T) | MongoDB | Azure Cosmos DB API for MongoDB | •Easy to set up and integrate.<br/>•Needs custom handling for throttles (see the sketch after this table). |
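The "custom handling for throttles" consideration above is worth illustrating. When the API for MongoDB rate-limits a request, the driver surfaces error code 16500 (TooManyRequests), which the native tools do not retry on their own. Here is a minimal retry-with-backoff sketch in Python using pymongo; the connection string, database, and collection names are placeholders:

```python
import time

from pymongo import MongoClient
from pymongo.errors import OperationFailure

client = MongoClient("<cosmos-db-mongo-connection-string>")
collection = client["<database>"]["<collection>"]

def insert_with_retry(doc: dict, max_retries: int = 10):
    """Insert a document, backing off when Cosmos DB throttles (error code 16500)."""
    for attempt in range(max_retries):
        try:
            return collection.insert_one(doc)
        except OperationFailure as err:
            if err.code != 16500:  # 16500 = TooManyRequests on the API for MongoDB
                raise
            time.sleep(min(0.1 * 2 ** attempt, 5))  # exponential backoff, capped at 5 s
    raise RuntimeError("Insert still throttled after retries")
```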
If you need help with capacity planning, consider reading our guide to estimating RU/s using Azure Cosmos DB capacity planner.

The following table summarizes the migration options for the Azure Cosmos DB Cassandra API:
Migration type | Solution | Supported sources | Supported targets | Considerations |
---|---|---|---|---|
Offline | cqlsh COPY command | CSV files | Azure Cosmos DB Cassandra API | •Easy to set up.<br/>•Not suitable for large datasets.<br/>•Works only when the source is a Cassandra table. |
Offline | Copy table with Spark | •Apache Cassandra | Azure Cosmos DB Cassandra API | •Can make use of Spark capabilities to parallelize transformation and ingestion.<br/>•Needs configuration with a custom retry policy to handle throttles (see the sketch after this table). |
Online | Dual-write proxy + Spark | •Apache Cassandra | •Azure Cosmos DB Cassandra API | •Supports larger datasets, but careful attention is required for setup and validation.<br/>•Open-source tools, no purchase required. |
Online | Striim (from Oracle DB/Apache Cassandra) | •Oracle<br/>•Apache Cassandra<br/>See the Striim website for other supported sources. | •Azure Cosmos DB SQL API<br/>•Azure Cosmos DB Cassandra API<br/>See the Striim website for other supported targets. | •Works with a large variety of sources like Oracle, DB2, and SQL Server.<br/>•Easy to build ETL pipelines and provides a dashboard for monitoring.<br/>•Supports larger datasets.<br/>•Since this is a third-party tool, it needs to be purchased from the marketplace and installed in the user's environment. |
Online | Arcion (from Oracle DB/Apache Cassandra) | •Oracle<br/>•Apache Cassandra<br/>See the Arcion website for other supported sources. | Azure Cosmos DB Cassandra API. See the Arcion website for other supported targets. | •Supports larger datasets.<br/>•Since this is a third-party tool, it needs to be purchased from the marketplace and installed in the user's environment. |
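The "custom retry policy" consideration above can be illustrated with the Python driver: subclass `cassandra.policies.RetryPolicy` so that request errors and write timeouts (how Cosmos DB rate limiting typically surfaces to the driver) are retried instead of aborting the migration. A minimal sketch with placeholder account values; production code should also add backoff between retries:

```python
import ssl

from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import RetryPolicy

class CosmosRetryPolicy(RetryPolicy):
    """Retry throttled or timed-out requests a bounded number of times."""
    MAX_RETRIES = 5

    def on_request_error(self, query, consistency, error, retry_num):
        # Cosmos DB rate limiting surfaces as a request error (e.g., overloaded).
        if retry_num < self.MAX_RETRIES:
            return self.RETRY, consistency
        return self.RETHROW, None

    def on_write_timeout(self, query, consistency, write_type,
                         required_responses, received_responses, retry_num):
        if retry_num < self.MAX_RETRIES:
            return self.RETRY, consistency
        return self.RETHROW, None

# Cosmos DB Cassandra API endpoints require TLS on port 10350.
cluster = Cluster(
    ["<account-name>.cassandra.cosmos.azure.com"],
    port=10350,
    auth_provider=PlainTextAuthProvider("<account-name>", "<account-key>"),
    ssl_context=ssl.create_default_context(),
    execution_profiles={
        EXEC_PROFILE_DEFAULT: ExecutionProfile(retry_policy=CosmosRetryPolicy())
    },
)
session = cluster.connect()
```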
For APIs other than the SQL API, the MongoDB API, and the Cassandra API, there are various tools supported by each API's existing ecosystem.
**Table API**

**Gremlin API**
- Trying to do capacity planning for a migration to Azure Cosmos DB?
  - If all you know is the number of vCores and servers in your existing database cluster, read about estimating request units using vCores or vCPUs.
  - If you know typical request rates for your current database workload, read about estimating request units using Azure Cosmos DB capacity planner.
- Learn more by trying out the sample applications consuming the bulk executor library in .NET and Java.
- The bulk executor library is integrated into the Cosmos DB Spark connector; to learn more, see the Azure Cosmos DB Spark connector article.
- Contact the Azure Cosmos DB product team by opening a support ticket under the "General Advisory" problem type and "Large (TB+) migrations" problem subtype for additional help with large-scale migrations.