
Commit 95f289d

Committed Apr 8, 2019
CosmosDB Compat Level 1.2
Add behavior and performance difference in CosmosDB documentation.
1 parent 9dcfa69

3 files changed: +19 -1 lines
 

articles/stream-analytics/stream-analytics-documentdb-output.md

Lines changed: 19 additions & 1 deletion
@@ -33,7 +33,7 @@ To match your application requirements, Azure Cosmos DB allows you to fine tune
## Upserts from Stream Analytics

Stream Analytics integration with Azure Cosmos DB allows you to insert or update records in your collection based on a given Document ID column. This is also referred to as an *Upsert*.

-Stream Analytics uses an optimistic upsert approach, where updates are only done when insert fails with a Document ID conflict. This update is performed as a PATCH, so it enables partial updates to the document, that is, addition of new properties or replacing an existing property is performed incrementally. However, changes in the values of array properties in your JSON document result in the entire array getting overwritten, that is, the array isn't merged.
+Stream Analytics uses an optimistic upsert approach, where updates are only done when an insert fails with a Document ID conflict. With Compatibility Level 1.0, this update is performed as a PATCH, so it enables partial updates to the document: adding a new property or replacing an existing property is performed incrementally. However, changes in the values of array properties in your JSON document result in the entire array getting overwritten; that is, the array isn't merged. With 1.2, upsert behavior is modified to insert or replace the document. This behavior is described further in the Compatibility Level 1.2 section below.
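To make the 1.0 merge semantics concrete, here is a minimal sketch that uses Python dictionaries to stand in for JSON documents. The document shape and values are invented for illustration, and the shallow dictionary merge only approximates the service's PATCH behavior:

```python
# Hypothetical documents illustrating a compatibility level 1.0 upsert:
# top-level properties are merged, but array values are replaced wholesale.
existing = {"id": "device-1", "model": "thermostat", "readings": [68, 70, 71]}
incoming = {"id": "device-1", "readings": [72], "zone": "lobby"}

# PATCH-style merge: new properties are added, changed properties are replaced.
patched = {**existing, **incoming}
assert patched == {
    "id": "device-1",
    "model": "thermostat",  # property absent from the incoming event survives
    "readings": [72],       # the array is overwritten, not merged
    "zone": "lobby",        # new property is added
}
```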
If the incoming JSON document has an existing ID field, that field is automatically used as the Document ID column in Cosmos DB and any subsequent writes are handled as such, leading to one of these situations:

- unique IDs lead to insert
@@ -51,6 +51,24 @@ For fixed Azure Cosmos DB collections, Stream Analytics allows no way to scale u
Writing to multiple fixed containers is being deprecated and is not the recommended approach for scaling out your Stream Analytics job. The article [Partitioning and scaling in Cosmos DB](../cosmos-db/sql-api-partition-data.md) provides further details.

+## Compatibility Level 1.2
+With Compatibility Level 1.2, Stream Analytics supports native integration to bulk write into Cosmos DB. This integration enables writing to Cosmos DB effectively, maximizing throughput and handling throttled requests efficiently. The improved writing mechanism is available under a new compatibility level because of a difference in upsert behavior: prior to 1.2, the upsert behavior is to insert or merge the document; with 1.2, the upsert behavior is modified to insert or replace the document.
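Continuing the hypothetical documents from the sketch above, the 1.2 behavior can be pictured like this (again an illustration, not the service's actual implementation):

```python
# With compatibility level 1.2, an upsert that hits a Document ID conflict
# replaces the stored document entirely instead of merging into it.
existing = {"id": "device-1", "model": "thermostat", "readings": [68, 70, 71]}
incoming = {"id": "device-1", "readings": [72], "zone": "lobby"}

replaced = dict(incoming)  # insert-or-replace: the old document is discarded
assert replaced == {"id": "device-1", "readings": [72], "zone": "lobby"}
# Note that "model" is gone: properties absent from the incoming event
# do not survive a 1.2 upsert.
```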
+Before 1.2, Stream Analytics uses a custom stored procedure to bulk upsert documents per partition key into Cosmos DB, where a batch is written as a transaction. Even when a single record hits an error, the whole batch must be retried. This makes even scenarios with reasonable throttling relatively slow. The following comparison shows how such jobs behave with 1.2.
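A rough sketch of the retry semantics just described; the function and exception names are invented for illustration, and the random failure stands in for Cosmos DB rate limiting:

```python
import random

class ThrottledError(Exception):
    """Stands in for a Cosmos DB 429 (request rate too large) response."""

def write_document(doc):
    """Placeholder for a single document write that may be rate limited."""
    if random.random() < 0.05:
        raise ThrottledError(doc["id"])

def upsert_batch_pre_12(batch):
    # Pre-1.2 behavior: the batch runs as one transaction inside a stored
    # procedure, so a single throttled record forces the entire batch to be
    # resubmitted from the beginning.
    while True:
        try:
            for doc in batch:
                write_document(doc)
            return
        except ThrottledError:
            continue  # retry the whole batch, not just the failed record

upsert_batch_pre_12([{"id": str(i)} for i in range(10)])
```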
+The setup below shows two identical Stream Analytics jobs reading from the same input (an event hub). Both Stream Analytics jobs are [fully partitioned](https://docs.microsoft.com/azure/stream-analytics/stream-analytics-parallelization#embarrassingly-parallel-jobs) with a passthrough query and write to identical Cosmos DB collections. The metrics on the left are from the job configured with compatibility level 1.0, and the ones on the right are from the job configured with 1.2. The Cosmos DB collections' partition key is a unique GUID coming from the input event.
+![Stream Analytics metrics comparison](media/stream-analytics-documentdb-output/stream-analytics-documentdb-output-3.png)
+The incoming event rate in Event Hubs is two times higher than what the Cosmos DB collections (20K RUs) are configured to take in, so throttling is expected in Cosmos DB. However, the job with 1.2 consistently writes at a higher throughput (Output Events/minute) and with a lower average SU% utilization. In your environment, this difference will depend on a few more factors, such as the choice of event format, input event/message size, partition keys, and query.
+![Cosmos DB metrics comparison](media/stream-analytics-documentdb-output/stream-analytics-documentdb-output-2.png)
+With 1.2, Stream Analytics is more intelligent in utilizing 100% of the available throughput in Cosmos DB with very few resubmissions from throttling or rate limiting. This provides a better experience for other workloads, like queries running on the collection at the same time. If you want to try out how ASA scales out with Cosmos DB as a sink for 1,000 to 10,000 messages per second, here is an [Azure samples project](https://github.com/Azure-Samples/streaming-at-scale/tree/master/eventhubs-streamanalytics-cosmosdb) that lets you do that.
+Because 1.2 is currently not the default, you can [set the compatibility level](https://docs.microsoft.com/azure/stream-analytics/stream-analytics-compatibility-level) for a Stream Analytics job by using the portal or by using the [create job REST API call](https://docs.microsoft.com/rest/api/streamanalytics/stream-analytics-job). It's strongly recommended to use Compatibility Level 1.2 in ASA with Cosmos DB.
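As a hedged sketch of the REST path, the call below creates or updates a job with `compatibilityLevel` set to 1.2. The subscription, resource group, job name, region, token, and api-version are all placeholders to verify against the linked REST reference:

```python
import requests

# Placeholders: substitute real identifiers and a valid Azure AD bearer token.
subscription = "<subscription-id>"
resource_group = "<resource-group>"
job_name = "<job-name>"
token = "<aad-bearer-token>"

url = (
    f"https://management.azure.com/subscriptions/{subscription}"
    f"/resourceGroups/{resource_group}/providers/Microsoft.StreamAnalytics"
    f"/streamingjobs/{job_name}?api-version=2016-03-01"  # assumed api-version
)

body = {
    "location": "<region>",
    "properties": {
        "sku": {"name": "Standard"},
        "compatibilityLevel": "1.2",  # opt in to the improved 1.2 write path
    },
}

response = requests.put(url, json=body, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()
```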
## Cosmos DB settings for JSON output

Creating Cosmos DB as an output in Stream Analytics generates a prompt for information as seen below. This section provides an explanation of the properties definition.
