articles/stream-analytics/stream-analytics-documentdb-output.md (19 additions, 1 deletion)

@@ -33,7 +33,7 @@ To match your application requirements, Azure Cosmos DB allows you to fine tune
## Upserts from Stream Analytics
Stream Analytics integration with Azure Cosmos DB allows you to insert or update records in your collection based on a given Document ID column. This is also referred to as an *Upsert*.

Stream Analytics uses an optimistic upsert approach, where updates are only done when an insert fails with a Document ID conflict. With Compatibility Level 1.0, this update is performed as a PATCH, so it enables partial updates to the document: adding a new property or replacing an existing property is performed incrementally. However, changes in the values of array properties in your JSON document result in the entire array getting overwritten; the array isn't merged. With 1.2, upsert behavior is modified to insert or replace the document, as described further in the Compatibility Level 1.2 section below.

If the incoming JSON document has an existing ID field, that field is automatically used as the Document ID column in Cosmos DB and any subsequent writes are handled as such, leading to one of these situations:
- unique IDs lead to insert
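
For example, one option is to project a column named `id` in the query so that Cosmos DB uses it as the Document ID, and repeated events with the same value update the same document. The sketch below is illustrative only; `deviceId`, `temperature`, `humidity`, `eventhubinput`, and `cosmosdboutput` are hypothetical names, so substitute the fields and aliases from your own job.

```sql
-- Project an explicit 'id' column so Cosmos DB treats it as the Document ID.
-- Repeated events with the same deviceId then upsert the same document
-- instead of creating new ones.
SELECT
    deviceId AS id,   -- hypothetical input field reused as the Document ID
    temperature,
    humidity
INTO
    cosmosdboutput
FROM
    eventhubinput
```
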
@@ -51,6 +51,24 @@ For fixed Azure Cosmos DB collections, Stream Analytics allows no way to scale u
Writing to multiple fixed containers is being deprecated and is not the recommended approach for scaling out your Stream Analytics job. The article [Partitioning and scaling in Cosmos DB](../cosmos-db/sql-api-partition-data.md) provides further details.
## Compatibility Level 1.2
With Compatibility Level 1.2, Stream Analytics supports native integration to bulk write into Cosmos DB. This enables writing to Cosmos DB effectively, maximizing throughput and handling throttled requests efficiently. The improved writing mechanism is available under a new compatibility level because of a difference in upsert behavior. Before 1.2, the upsert behavior is to insert or merge the document. With 1.2, upsert behavior is modified to insert or replace the document.

Before 1.2, Stream Analytics uses a custom stored procedure to bulk upsert documents per partition key into Cosmos DB, where a batch is written as a transaction. Even when a single record hits a transient error, the whole batch must be retried. This makes scenarios with even reasonable throttling relatively slow. The following comparison shows how such jobs behave with 1.2.

The setup below shows two identical Stream Analytics jobs reading from the same input (event hub). Both Stream Analytics jobs are [fully partitioned](https://docs.microsoft.com/azure/stream-analytics/stream-analytics-parallelization#embarrassingly-parallel-jobs) with a passthrough query and write to identical Cosmos DB collections. The metrics on the left are from the job configured with compatibility level 1.0, and the ones on the right are from the job configured with 1.2. The Cosmos DB collections' partition key is a unique GUID that comes from the input event.

The incoming event rate in Event Hubs is two times higher than the Cosmos DB collections (20K RUs) are configured to take in, so throttling is expected in Cosmos DB. However, the job with 1.2 consistently writes at a higher throughput (Output Events/minute) and with a lower average SU% utilization. In your environment, this difference will depend on a few more factors such as the choice of event format, input event/message size, partition keys, and query.

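As a point of reference, a fully partitioned passthrough query of this kind might look like the following sketch. The input and output aliases (`eventhubinput`, `cosmosdboutput`) and the use of `PartitionId` as the partition column are assumptions for illustration; substitute the names used in your own job.

```sql
-- Passthrough query: every event is forwarded unchanged to the Cosmos DB output.
-- Partitioning on PartitionId keeps the job embarrassingly parallel,
-- so each input partition is processed and written independently.
SELECT
    *
INTO
    cosmosdboutput
FROM
    eventhubinput
PARTITION BY PartitionId
```
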
*Figure: side-by-side job metrics, with the compatibility level 1.0 job on the left and the 1.2 job on the right.*

With 1.2, Stream Analytics is more intelligent in utilizing 100% of the available throughput in Cosmos DB with very few resubmissions from throttling or rate limiting. This provides a better experience for other workloads, such as queries running on the collection at the same time. If you want to see how Stream Analytics scales out with Cosmos DB as a sink for 1,000 to 10,000 messages per second, try this [Azure samples project](https://github.com/Azure-Samples/streaming-at-scale/tree/master/eventhubs-streamanalytics-cosmosdb).

Compatibility level 1.2 isn't currently the default, so you need to [set the compatibility level](https://docs.microsoft.com/azure/stream-analytics/stream-analytics-compatibility-level) for your Stream Analytics job by using the portal or the [create job REST API call](https://docs.microsoft.com/rest/api/streamanalytics/stream-analytics-job). Using Compatibility Level 1.2 with Cosmos DB is strongly recommended.
## Cosmos DB settings for JSON output
Creating Cosmos DB as an output in Stream Analytics generates a prompt for information, as seen below. This section explains the property definitions.