title: Configure and use Azure Synapse Link for Azure Cosmos DB
description: Learn how to enable Synapse link for Azure Cosmos DB accounts, create a container with analytical store enabled, connect the Azure Cosmos database to Synapse workspace, and run queries.
author: Rodrigossz
ms.service: cosmos-db
ms.topic: how-to
ms.date: 11/02/2021
ms.author: rosouz
ms.custom: references_regions, synapse-cosmos-db, devx-track-azurepowershell

Configure and use Azure Synapse Link for Azure Cosmos DB

[!INCLUDEappliesto-sql-mongodb-api]

Azure Synapse Link for Azure Cosmos DB is a cloud-native hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data in Azure Cosmos DB. Synapse Link creates a tight, seamless integration between Azure Cosmos DB and Azure Synapse Analytics.

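Azure Synapse Link is available for Azure Cosmos DB SQL API and Azure Cosmos DB API for MongoDB accounts. Use the following steps, each covered in a section of this article, to run analytical queries with Azure Synapse Link for Azure Cosmos DB:

  • Enable Azure Synapse Link for Azure Cosmos DB accounts

  • Create an analytical store enabled container

  • Enable analytical store in an existing container

  • Optional - Update the analytical store time to live

  • Optional - Disable analytical store in a container

  • Connect to a Synapse workspace

  • Query analytical store using Apache Spark for Azure Synapse Analytics

  • Query the analytical store using serverless SQL pool in Azure Synapse Analytics

  • Use serverless SQL pool to analyze and visualize data in Power BI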

You can also check out the Learn module on how to configure Azure Synapse Link for Azure Cosmos DB.

Enable Azure Synapse Link for Azure Cosmos DB accounts

Note

If you want to use customer-managed keys with Azure Synapse Link, you must configure your account's managed identity in your Azure Key Vault access policy before enabling Synapse Link on your account. To learn more, see the Configure customer-managed keys using Azure Cosmos DB accounts' managed identities article.

Note

If you want to use Full Fidelity Schema for SQL (Core) API accounts, you can't use the Azure portal to enable Synapse Link. You must set this option with Azure CLI or PowerShell, and it can't be changed after Synapse Link is enabled in your account. For more information, see the analytical store schema representation documentation.

Azure portal

  1. Sign in to the Azure portal.

  2. Create a new Azure Cosmos DB account, or select an existing Azure Cosmos DB account.

  3. Navigate to your Azure Cosmos DB account and open the Features pane.

  4. Select Synapse Link from the features list.

    :::image type="content" source="./media/configure-synapse-link/find-synapse-link-feature.png" alt-text="Find Synapse Link feature":::

  5. Next, the portal prompts you to enable Synapse Link on your account. Select Enable. This process can take 1 to 5 minutes to complete.

    :::image type="content" source="./media/configure-synapse-link/enable-synapse-link-feature.png" alt-text="Enable Synapse Link feature":::

  6. Your account is now enabled to use Synapse Link. Next, see how to create analytical store enabled containers to automatically start replicating your operational data from the transactional store to the analytical store.

Note

Turning on Synapse Link does not turn on the analytical store automatically. Once you enable Synapse Link on the Cosmos DB account, enable analytical store on containers to start using Synapse Link.

Command-Line Tools

Enable Synapse Link in your Cosmos DB SQL API or MongoDB API account using Azure CLI or PowerShell.

Azure CLI

Use --enable-analytical-storage true for both create and update operations. You also need to choose the representation schema type. For SQL API accounts, use --analytical-storage-schema-type with the values FullFidelity or WellDefined. For MongoDB API accounts, always use --analytical-storage-schema-type FullFidelity.
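As a rough illustration of how the flags above fit together, the following Python sketch assembles (but does not run) an `az cosmosdb update` argument list. The helper name `build_enable_synapse_link_command`, the `api` parameter, and the account and resource group names are hypothetical; only the `--enable-analytical-storage` and `--analytical-storage-schema-type` flags and their values come from this article.

```python
import shlex

VALID_SCHEMA_TYPES = ("WellDefined", "FullFidelity")


def build_enable_synapse_link_command(account_name, resource_group,
                                      api="sql", schema_type="WellDefined"):
    """Return an argument list that enables Synapse Link on an existing account.

    Hypothetical helper: it only builds the command, it never calls Azure.
    """
    if schema_type not in VALID_SCHEMA_TYPES:
        raise ValueError("schema_type must be WellDefined or FullFidelity")
    # Per this article, MongoDB API accounts always use FullFidelity.
    if api == "mongodb" and schema_type != "FullFidelity":
        raise ValueError("MongoDB API accounts must use FullFidelity")
    return [
        "az", "cosmosdb", "update",
        "--name", account_name,
        "--resource-group", resource_group,
        "--enable-analytical-storage", "true",
        "--analytical-storage-schema-type", schema_type,
    ]


# Placeholder names; replace them with your own account and resource group.
command = build_enable_synapse_link_command(
    "my-cosmos-account", "my-resource-group", schema_type="FullFidelity"
)
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` if the Azure CLI is installed and signed in. Validating the schema type up front is a deliberate choice, because the schema type can't be changed after Synapse Link is enabled.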

PowerShell

Use -EnableAnalyticalStorage $true for both create and update operations. You also need to choose the representation schema type. For SQL API accounts, use -AnalyticalStorageSchemaType with the values FullFidelity or WellDefined. For MongoDB API accounts, always use -AnalyticalStorageSchemaType FullFidelity.

Create an analytical store enabled container

You can turn on analytical store when creating an Azure Cosmos DB container by using one of the following options.

Azure portal

  1. Sign in to the Azure portal or the Azure Cosmos DB Explorer.

  2. Navigate to your Azure Cosmos DB account and open the Data Explorer tab.

  3. Select New Container and enter a name for your database, container, partition key and throughput details. Turn on the Analytical store option. After you enable the analytical store, it creates a container with the analytical TTL property set to the default value of -1 (infinite retention). With this value, the analytical store retains all the historical versions of records. You can change the value later.

    :::image type="content" source="./media/configure-synapse-link/create-container-analytical-store.png" alt-text="Turn on analytical store for Azure Cosmos DB container":::

  4. If you haven't previously enabled Synapse Link on this account, the portal prompts you to do so, because it's a prerequisite for creating an analytical store enabled container. If prompted, select Enable Synapse Link. This process can take 1 to 5 minutes to complete.

  5. Select OK to create an analytical store enabled Azure Cosmos DB container.

  6. After the container is created, verify that analytical store has been enabled by clicking Settings, right below Documents in Data Explorer, and checking that the Analytical Store Time to Live option is turned on.

Azure Cosmos DB SDKs

Set the analytical TTL property to the required value to create an analytical store enabled container. For the list of allowed values, see the analytical TTL supported values article.

.NET SDK

The following code creates a container with analytical store by using the .NET SDK. Set the AnalyticalStoreTimeToLiveInSeconds property to the required value in seconds or use -1 for infinite retention. This setting can be changed later.

// Create a container with a partition key, and analytical TTL configured to -1 (infinite retention)
ContainerProperties properties = new ContainerProperties()
{
    Id = "myContainerId",
    PartitionKeyPath = "/id",
    AnalyticalStoreTimeToLiveInSeconds = -1,
};
CosmosClient cosmosClient = new CosmosClient("myConnectionString");
await cosmosClient.GetDatabase("myDatabase").CreateContainerAsync(properties);

Java V4 SDK

The following code creates a container with analytical store by using the Java V4 SDK. Set the AnalyticalStoreTimeToLiveInSeconds property to the required value in seconds or use -1 for infinite retention. This setting can be changed later.

// Create a container with a partition key and analytical TTL configured to -1 (infinite retention)
CosmosContainerProperties containerProperties = new CosmosContainerProperties("myContainer", "/myPartitionKey");

containerProperties.setAnalyticalStoreTimeToLiveInSeconds(-1);

container = database.createContainerIfNotExists(containerProperties, 400).block().getContainer();

Python V4 SDK

The following code creates a container with analytical store by using the Python V4 SDK. Set the analytical_storage_ttl property to the required value in seconds or use -1 for infinite retention. This setting can be changed later.

# Azure Cosmos DB Python SDK, for SQL API only.
# Creating an analytical store enabled container.

import azure.cosmos.cosmos_client as cosmos_client
import azure.cosmos.exceptions as exceptions
from azure.cosmos.partition_key import PartitionKey

HOST = 'your-cosmos-db-account-URI'
KEY = 'your-cosmos-db-account-key'
DATABASE = 'your-cosmos-db-database-name'
CONTAINER = 'your-cosmos-db-container-name'

# Client
client = cosmos_client.CosmosClient(HOST, KEY)

# Database client
try:
    db = client.create_database(DATABASE)

except exceptions.CosmosResourceExistsError:
    db = client.get_database_client(DATABASE)

# Creating the container with analytical store enabled
try:
    container = db.create_container(
        id=CONTAINER,
        partition_key=PartitionKey(path='/id', kind='Hash'),
        analytical_storage_ttl=-1  # -1 means infinite retention
    )
    properties = container.read()
    print('Container with id \'{0}\' created'.format(container.id))
    print('Partition Key - \'{0}\''.format(properties['partitionKey']))

except exceptions.CosmosResourceExistsError:
    print('A container with id \'{0}\' already exists'.format(CONTAINER))

Command-Line Tools

Set the analytical TTL property to the required value to create an analytical store enabled container. For the list of allowed values, see the analytical TTL supported values article.

Azure CLI

The following options create a container with analytical store by using Azure CLI. Set the --analytical-storage-ttl property to the required value in seconds or use -1 for infinite retention. This setting can be changed later.
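A minimal Python sketch of how such a call might be put together is shown here. It builds the argument list for `az cosmosdb sql container create` without executing it. The helper name `build_create_container_command` and all resource names are hypothetical placeholders, and the check that rejects values other than -1 or a positive number of seconds is an assumption based on the TTL values described in this article.

```python
import shlex


def build_create_container_command(account_name, resource_group, database_name,
                                   container_name, partition_key_path,
                                   analytical_ttl=-1):
    """Return an argument list that creates an analytical store enabled container.

    Hypothetical helper: it only builds the command, it never calls Azure.
    """
    # -1 means infinite retention; a positive value is a retention in seconds.
    if not (analytical_ttl == -1 or analytical_ttl > 0):
        raise ValueError("analytical_ttl must be -1 or a positive number of seconds")
    return [
        "az", "cosmosdb", "sql", "container", "create",
        "--account-name", account_name,
        "--resource-group", resource_group,
        "--database-name", database_name,
        "--name", container_name,
        "--partition-key-path", partition_key_path,
        "--analytical-storage-ttl", str(analytical_ttl),
    ]


# Placeholder names; replace them with your own resources.
create_command = build_create_container_command(
    "my-cosmos-account", "my-resource-group", "myDatabase", "myContainer", "/id"
)
print(shlex.join(create_command))
```

Building the list separately from running it makes the command easy to review before it reaches a real account.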

PowerShell

The following options create a container with analytical store by using PowerShell. Set the -AnalyticalStorageTtl property to the required value in seconds or use -1 for infinite retention. This setting can be changed later.

Enable analytical store in an existing container

Note

You can turn on analytical store on existing Azure Cosmos DB SQL API containers. This capability is generally available and can be used for production workloads.

Note the following details when enabling Azure Synapse Link on your existing containers:

  • The same performance isolation of the analytical store auto-sync process applies to the initial sync and there is no performance impact on your OLTP workload.

  • The total time for a container's initial sync with analytical store varies with the data volume and the complexity of the documents. This process can take anywhere from a few seconds to multiple days. Use the Azure portal to monitor the migration progress.

  • The throughput of your container, or database account, also influences the total initial sync time. Although RU/s are not used in this migration, the total RU/s available influences the performance of the process. You can temporarily increase your environment's available RU/s to speed up the process.

  • You won't be able to query analytical store of an existing container while Synapse Link is being enabled on that container. Your OLTP workload isn't impacted and you can keep on reading data normally. Data ingested after the start of the initial sync will be merged into analytical store by the regular analytical store auto-sync process.

  • Currently, existing MongoDB API collections aren't supported. The alternative is to migrate the data into a new collection, created with analytical store turned on.

Azure portal

  1. Sign in to the Azure portal.
  2. Navigate to your Azure Cosmos DB account and open the Azure Synapse Link tab in the Integrations section of the left navigation. In this tab, you can enable Synapse Link in your database account and on your existing containers.
  3. After you click the blue Enable Synapse Link on your container(s) button, you will start to see the progress of your containers' initial sync.
  4. Optionally, you can go to the Power BI tab, also in the Integrations section, to create Power BI dashboards on your Synapse Link enabled containers.

Command-Line Tools

Set the analytical TTL property to -1 for infinite retention or use a positive integer to specify the number of seconds that the data will be retained in analytical store. For more information, see the analytical TTL supported values article.

Azure CLI

PowerShell

  • Use Update Analytical ttl to update -AnalyticalStorageTtl.
  • Check the migration status in the Azure portal.

Optional - Update the analytical store time to live

After the analytical store is enabled with a particular TTL value, you may want to update it to a different valid value. You can update the value by using the Azure portal, Azure CLI, PowerShell, or Cosmos DB SDKs. For information on the various Analytical TTL config options, see the analytical TTL supported values article.
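Because the TTL is expressed in seconds, it can help to compute and sanity-check a value before applying it. The following Python sketch is only an illustration: the function names are hypothetical, and the meaning of each value (-1 for infinite retention, a positive integer for seconds, 0 for a disabled analytical store) is summarized from this article rather than from the supported values article.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def analytical_ttl_from_days(days):
    """Convert a retention period in whole days to an analytical TTL in seconds."""
    if days <= 0:
        raise ValueError("days must be positive; use -1 directly for infinite retention")
    return days * SECONDS_PER_DAY


def describe_analytical_ttl(ttl):
    """Return a short description of what an analytical TTL value means."""
    if ttl == -1:
        return "infinite retention"
    if ttl == 0:
        return "analytical store disabled"
    if ttl > 0:
        return "data expires after {0} seconds".format(ttl)
    raise ValueError("analytical TTL must be -1, 0, or a positive integer")


# Roughly six months, the retention period used by the SDK examples in this section.
six_months = analytical_ttl_from_days(180)
print(six_months, "->", describe_analytical_ttl(six_months))
```

Treating 0 as a separate case is deliberate, since disabling analytical store on a container currently can't be undone.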

Azure portal

If you created an analytical store enabled container through the Azure portal, it contains a default analytical TTL of -1. Use the following steps to update this value:

  1. Sign in to the Azure portal or the Azure Cosmos DB Explorer.
  2. Navigate to your Azure Cosmos DB account and open the Data Explorer tab.
  3. Select an existing container that has analytical store enabled. Expand it and modify the following values:
    1. Open the Scale & Settings window.
    2. Under Settings, find Analytical Storage Time to Live.
    3. Select On (no default) or select On and set a TTL value.
    4. Click Save to save the changes.

.NET SDK

The following code shows how to update the TTL for analytical store by using the .NET SDK:

// Get the container and read its current properties
ContainerResponse containerResponse = await client.GetContainer("database", "container").ReadContainerAsync();
// Update analytical store TTL: expire analytical store data in 6 months
containerResponse.Resource.AnalyticalStoreTimeToLiveInSeconds = 60 * 60 * 24 * 180;
await client.GetContainer("database", "container").ReplaceContainerAsync(containerResponse.Resource);

Java V4 SDK

The following code shows how to update the TTL for analytical store by using the Java V4 SDK:

CosmosContainerProperties containerProperties = new CosmosContainerProperties("myContainer", "/myPartitionKey");

// Update analytical store TTL to expire analytical store data in 6 months
containerProperties.setAnalyticalStoreTimeToLiveInSeconds(60 * 60 * 24 * 180);
 
// Update container settings
container.replace(containerProperties).block();

Python V4 SDK

Currently not supported.

Azure CLI

The following links show how to update a container's analytical TTL by using Azure CLI:

PowerShell

The following links show how to update a container's analytical TTL by using PowerShell:

Optional - Disable analytical store in a container

Analytical store can be disabled in SQL API containers by using the Update-AzCosmosDBSqlContainer PowerShell command to set -AnalyticalStorageTtl (analytical Time-To-Live) to 0. Currently, this action can't be undone. If analytical store is disabled in a container, it can never be re-enabled.

Currently, analytical store can't be disabled in MongoDB API collections.

Connect to a Synapse workspace

Follow the instructions in Connect to Azure Synapse Link to access an Azure Cosmos DB database from Azure Synapse Analytics Studio with Azure Synapse Link.

Query analytical store using Apache Spark for Azure Synapse Analytics

Use the instructions in the Query Azure Cosmos DB analytical store using Spark 3 article on how to query with Synapse Spark 3. That article gives some examples on how you can interact with the analytical store from Synapse gestures. Those gestures are visible when you right-click on a container. With gestures, you can quickly generate code and tweak it to your needs. They are also perfect for discovering data with a single click.

For Spark 2 integration, use the instructions in the Query Azure Cosmos DB analytical store using Spark 2 article.

Query the analytical store using serverless SQL pool in Azure Synapse Analytics

Serverless SQL pool allows you to query and analyze data in your Azure Cosmos DB containers that are enabled with Azure Synapse Link. You can analyze data in near real-time without impacting the performance of your transactional workloads. It offers a familiar T-SQL syntax to query data from the analytical store and integrated connectivity to a wide range of BI and ad-hoc querying tools via the T-SQL interface. To learn more, see the Query analytical store using serverless SQL pool article.

Use serverless SQL pool to analyze and visualize data in Power BI

You can use the integrated BI experience in the Azure Cosmos DB portal to build BI dashboards using Synapse Link with just a few clicks. To learn more, see how to build BI dashboards using Synapse Link. This integrated experience creates simple T-SQL views in Synapse serverless SQL pools for your Cosmos DB containers. You can build BI dashboards over these views, which query your Azure Cosmos DB containers in real-time using DirectQuery and reflect the latest changes to your data. There is no performance or cost impact on your transactional workloads, and no ETL pipelines to manage.

If you want to use advanced T-SQL views with joins across your containers, or build BI dashboards in [import](/power-bi/connect-data/service-dataset-modes-understand#import-mode) mode, see the Use serverless SQL pool to analyze Azure Cosmos DB data with Synapse Link article.

Configure custom partitioning

Custom partitioning enables you to partition analytical store data on fields that are commonly used as filters in analytical queries, which improves query performance. To learn more, see the introduction to custom partitioning and how to configure custom partitioning articles.

Azure Resource Manager template

The Azure Resource Manager template creates a Synapse Link enabled Azure Cosmos DB account for SQL API. This template creates a Core (SQL) API account in one region with a container configured with analytical TTL enabled, and an option to use manual or autoscale throughput. To deploy this template, click on Deploy to Azure on the readme page.

Getting started with Azure Synapse Link - Samples

You can find samples to get started with Azure Synapse Link on GitHub. These showcase end-to-end solutions with IoT and retail scenarios. You can also find the samples corresponding to Azure Cosmos DB API for MongoDB in the same repo under the MongoDB folder.

Next steps

To learn more, see the following docs: