title | titleSuffix | description | author | services | ms.service | ms.topic | ms.date | ms.author |
---|---|---|---|---|---|---|---|---|
Troubleshoot issues with diagnostics queries |
Azure Cosmos DB |
Learn how to query diagnostics logs for troubleshooting data stored in Azure Cosmos DB |
StefArroyo |
cosmos-db |
cosmos-db |
how-to |
05/12/2021 |
esarroyo |
[!INCLUDEappliesto-all-apis]
In this article, we'll cover how to write simple queries to help troubleshoot issues with your Azure Cosmos DB account using diagnostics logs sent to AzureDiagnostics (legacy) and Resource-specific (preview) tables.
For Azure Diagnostics tables, all data is written into one single table and users will need to specify which category they'd like to query.
For resource-specific tables, data is written into individual tables for each category of the resource (not available for table API). We recommend this mode since it makes it much easier to work with the data, provides better discoverability of the schemas, and improves performance across both ingestion latency and query times.
-
How to query for the operations that are taking longer than 3 milliseconds to run:
AzureDiagnostics | where toint(duration_s) > 3 and ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" | summarize count() by clientIpAddress_s, TimeGenerated
-
How to query for the user agent that is running the operations:
AzureDiagnostics | where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" | summarize count() by OperationName, userAgent_s
-
How to query for the long-running operations:
AzureDiagnostics | where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" | project TimeGenerated , duration_s | summarize count() by bin(TimeGenerated, 5s) | render timechart
-
How to get partition key statistics to evaluate skew across top 3 partitions for a database account:
AzureDiagnostics | where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="PartitionKeyStatistics" | project SubscriptionId, regionName_s, databaseName_s, collectionName_s, partitionKey_s, sizeKb_d, ResourceId
-
How to get the request charges for expensive queries?
AzureDiagnostics | where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" and todouble(requestCharge_s) > 10.0 | project activityId_g, requestCharge_s | join kind= inner ( AzureDiagnostics | where ResourceProvider =="MICROSOFT.DOCUMENTDB" and Category == "QueryRuntimeStatistics" | project activityId_g, querytext_s ) on $left.activityId_g == $right.activityId_g | order by requestCharge_s desc | limit 100
-
How to find which operations are taking most of RU/s?
AzureDiagnostics | where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" | where TimeGenerated >= ago(2h) | summarize max(responseLength_s), max(requestLength_s), max(requestCharge_s), count = count() by OperationName, requestResourceType_s, userAgent_s, collectionRid_s, bin(TimeGenerated, 1h)
-
How to get all queries that are consuming more than 100 RU/s joined with data from DataPlaneRequests and QueryRunTimeStatistics.
AzureDiagnostics | where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" and todouble(requestCharge_s) > 100.0 | project activityId_g, requestCharge_s | join kind= inner ( AzureDiagnostics | where ResourceProvider =="MICROSOFT.DOCUMENTDB" and Category == "QueryRuntimeStatistics" | project activityId_g, querytext_s ) on $left.activityId_g == $right.activityId_g | order by requestCharge_s desc | limit 100
-
How to get the request charges and the execution duration of a query?
AzureDiagnostics | where TimeGenerated >= ago(24hr) | where Category == "QueryRuntimeStatistics" | join ( AzureDiagnostics | where TimeGenerated >= ago(24hr) | where Category == "DataPlaneRequests" ) on $left.activityId_g == $right.activityId_g | project databasename_s, collectionname_s, OperationName1 , querytext_s,requestCharge_s1, duration_s1, bin(TimeGenerated, 1min)
-
How to get the distribution for different operations?
AzureDiagnostics | where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" | where TimeGenerated >= ago(2h) | summarize count = count() by OperationName, requestResourceType_s, bin(TimeGenerated, 1h)
-
What is the maximum throughput that a partition has consumed?
AzureDiagnostics | where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" | where TimeGenerated >= ago(2h) | summarize max(requestCharge_s) by bin(TimeGenerated, 1h), partitionId_g
-
How to get the information about the partition keys RU/s consumption per second?
AzureDiagnostics | where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "PartitionKeyRUConsumption" | summarize total = sum(todouble(requestCharge_s)) by databaseName_s, collectionName_s, partitionKey_s, TimeGenerated | order by TimeGenerated asc
-
How to get the request charge for a specific partition key
AzureDiagnostics | where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "PartitionKeyRUConsumption" | where parse_json(partitionKey_s)[0] == "2"
-
How to get the top partition keys with most RU/s consumed in a specific period?
AzureDiagnostics | where ResourceProvider == "MICROSOFT.DOCUMENTDB" and Category == "PartitionKeyRUConsumption" | where TimeGenerated >= datetime("11/26/2019, 11:20:00.000 PM") and TimeGenerated <= datetime("11/26/2019, 11:30:00.000 PM") | summarize total = sum(todouble(requestCharge_s)) by databaseName_s, collectionName_s, partitionKey_s | order by total desc
-
How to get the logs for the partition keys whose storage size is greater than 8 GB?
AzureDiagnostics | where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="PartitionKeyStatistics" | where todouble(sizeKb_d) > 800000
-
How to get P99 or P50 replication latencies for operations, request charge or the length of the response?
AzureDiagnostics | where ResourceProvider=="MICROSOFT.DOCUMENTDB" and Category=="DataPlaneRequests" | where TimeGenerated >= ago(2d) | summarize percentile(todouble(responseLength_s), 50), percentile(todouble(responseLength_s), 99), max(responseLength_s), percentile(todouble(requestCharge_s), 50), percentile(todouble(requestCharge_s), 99), max(requestCharge_s), percentile(todouble(duration_s), 50), percentile(todouble(duration_s), 99), max(duration_s), count() by OperationName, requestResourceType_s, userAgent_s, collectionRid_s, bin(TimeGenerated, 1h)
-
How to get ControlPlane logs?
Remember to switch on the flag as described in the Disable key-based metadata write access article, and execute the operations by using Azure PowerShell, the Azure CLI, or Azure Resource Manager.
AzureDiagnostics | where Category =="ControlPlaneRequests" | summarize by OperationName
-
How to query for the operations that are taking longer than 3 milliseconds to run:
CDBDataPlaneRequests | where toint(DurationMs) > 3 | summarize count() by ClientIpAddress, TimeGenerated
-
How to query for the user agent that is running the operations:
CDBDataPlaneRequests | summarize count() by OperationName, UserAgent
-
How to query for the long-running operations:
CDBDataPlaneRequests | project TimeGenerated , DurationMs | summarize count() by bin(TimeGenerated, 5s) | render timechart
-
How to get partition key statistics to evaluate skew across top 3 partitions for a database account:
CDBPartitionKeyStatistics | project RegionName, DatabaseName, CollectionName, PartitionKey, SizeKb
-
How to get the request charges for expensive queries?
CDBDataPlaneRequests | where todouble(RequestCharge) > 10.0 | project ActivityId, RequestCharge | join kind= inner ( CDBQueryRuntimeStatistics | project ActivityId, QueryText ) on $left.ActivityId == $right.ActivityId | order by RequestCharge desc | limit 100
-
How to find which operations are taking most of RU/s?
CDBDataPlaneRequests | where TimeGenerated >= ago(2h) | summarize max(ResponseLength), max(RequestLength), max(RequestCharge), count = count() by OperationName, RequestResourceType, UserAgent, CollectionName, bin(TimeGenerated, 1h)
-
How to get all queries that are consuming more than 100 RU/s joined with data from DataPlaneRequests and QueryRunTimeStatistics.
CDBDataPlaneRequests | where todouble(RequestCharge) > 100.0 | project ActivityId, RequestCharge | join kind= inner ( CDBQueryRuntimeStatistics | project ActivityId, QueryText ) on $left.ActivityId == $right.ActivityId | order by RequestCharge desc | limit 100
-
How to get the request charges and the execution duration of a query?
CDBQueryRuntimeStatistics | join kind= inner ( CDBDataPlaneRequests ) on $left.ActivityId == $right.ActivityId | project DatabaseName, CollectionName, OperationName , QueryText, RequestCharge, DurationMs, bin(TimeGenerated, 1min)
-
How to get the distribution for different operations?
CDBDataPlaneRequests | where TimeGenerated >= ago(2h) | summarize count = count() by OperationName, RequestResourceType, bin(TimeGenerated, 1h)
-
What is the maximum throughput that a partition has consumed?
CDBDataPlaneRequests | where TimeGenerated >= ago(2h) | summarize max(RequestCharge) by bin(TimeGenerated, 1h), PartitionId
-
How to get the information about the partition keys RU/s consumption per second?
CDBPartitionKeyRUConsumption | summarize total = sum(todouble(RequestCharge)) by DatabaseName, CollectionName, PartitionKey, TimeGenerated | order by TimeGenerated asc
-
How to get the request charge for a specific partition key?
CDBPartitionKeyRUConsumption | where parse_json(PartitionKey)[0] == "2"
-
How to get the top partition keys with most RU/s consumed in a specific period?
CDBPartitionKeyRUConsumption | where TimeGenerated >= datetime("02/12/2021, 11:20:00.000 PM") and TimeGenerated <= datetime("05/12/2021, 11:30:00.000 PM") | summarize total = sum(todouble(RequestCharge)) by DatabaseName, CollectionName, PartitionKey | order by total desc
-
How to get the logs for the partition keys whose storage size is greater than 8 GB?
CDBPartitionKeyStatistics | where todouble(SizeKb) > 800000
-
How to get P99 or P50 replication latencies for operations, request charge or the length of the response?
CDBDataPlaneRequests | where TimeGenerated >= ago(2d) | summarize percentile(todouble(ResponseLength), 50), percentile(todouble(ResponseLength), 99), max(ResponseLength), percentile(todouble(RequestCharge), 50), percentile(todouble(RequestCharge), 99), max(RequestCharge), percentile(todouble(DurationMs), 50), percentile(todouble(DurationMs), 99), max(DurationMs),count() by OperationName, RequestResourceType, UserAgent, CollectionName, bin(TimeGenerated, 1h)
-
How to get ControlPlane logs?
Remember to switch on the flag as described in the Disable key-based metadata write access article, and execute the operations by using Azure PowerShell, the Azure CLI, or Azure Resource Manager.
CDBControlPlaneRequests | summarize by OperationName
-
For more information on how to create diagnostic settings for Cosmos DB see Creating Diagnostics settings article.
-
For detailed information about how to create a diagnostic setting by using the Azure portal, CLI, or PowerShell, see create diagnostic setting to collect platform logs and metrics in Azure article.