Commit 47c032b

Committed Feb 3, 2019
custom output partition
1 parent 17d934e commit 47c032b

File tree

5 files changed: +56 -8 lines changed


articles/stream-analytics/stream-analytics-custom-path-patterns-blob-storage-output.md

Lines changed: 56 additions & 8 deletions
@@ -1,23 +1,71 @@
 ---
-title: DateTime path patterns for Azure Stream Analytics blob output (Preview)
-description: This article describes the custom DateTime path patterns feature for blob storage output from Azure Stream Analytics jobs.
+title: Azure Stream Analytics custom blob output partitioning (Preview)
+description: This article describes the custom DateTime path patterns and the custom field or attributes features for blob storage output from Azure Stream Analytics jobs.
 services: stream-analytics
 author: mamccrea
 ms.author: mamccrea
 ms.reviewer: mamccrea
 ms.service: stream-analytics
 ms.topic: conceptual
-ms.date: 12/06/2018
+ms.date: 02/05/2019
 ms.custom: seodec18
 ---
 
-# Custom DateTime path patterns for Azure Stream Analytics blob storage output (Preview)
+# Azure Stream Analytics custom blob output partitioning (Preview)
 
-Azure Stream Analytics supports custom date and time format specifiers in the file path for blob storage outputs. Custom DateTime path patterns allow you to specify an output format that aligns with Hive Streaming conventions, giving Azure Stream Analytics the ability to send data to Azure HDInsight and Azure Databricks for downstream processing. Custom DateTime path patterns are easily implemented using the `datetime` keyword in the Path Prefix field of your blob output, along with the format specifier. For example, `{datetime:yyyy}`.
+Azure Stream Analytics supports custom blob output partitioning with custom fields or attributes and custom DateTime path patterns.
+
+## Custom field or attributes
+
+Custom fields or input attributes improve downstream data-processing and reporting workflows by allowing more control over the output.
+
+### Partition key options
+
+The partition key, or column name, used to partition input data may contain alphanumeric characters with hyphens, underscores, and spaces. It is not possible to use nested fields as a partition key unless they are used in conjunction with aliases.
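The character rule above can be sketched as a small check. This is a hypothetical helper for illustration, not part of any Azure SDK:

```python
import re

# Hypothetical validator mirroring the documented rule: a partition key may
# contain alphanumeric characters plus hyphens, underscores, and spaces.
PARTITION_KEY_RE = re.compile(r"^[A-Za-z0-9_\- ]+$")

def is_valid_partition_key(name: str) -> bool:
    """Return True if `name` uses only the allowed characters."""
    return bool(PARTITION_KEY_RE.match(name))
```

Note that a nested field reference such as `user.profile.id` fails this check, which is why nested fields need an alias before they can serve as a partition key.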
+
+### Example
+
+Suppose a job takes input data from live user sessions connected to an external video game service, where the ingested data contains a column **client_id** that identifies the sessions. To partition the data by **client_id**, set the Blob Path Pattern field to include a partition token, **{client_id}**, in the blob output properties when you create the job. As data with various **client_id** values flows through the Stream Analytics job, the output data is saved into separate folders based on a single **client_id** value per folder.
+
+![Path pattern with client id](./media/stream-analytics-custom-path-patterns-blob-storage-output/stream-analytics-path-pattern-client-id.png)
+
+Similarly, if the job input were sensor data from millions of sensors, where each sensor has a **sensor_id**, the Path Pattern would be **{sensor_id}** to partition each sensor's data into different folders.
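The token resolution described above can be sketched as follows. The records and the helper are illustrative, not the service's implementation:

```python
# Minimal sketch: a {client_id} token in the Path Pattern maps each record
# to an output folder named after that record's client_id value.
def resolve_path(pattern: str, record: dict) -> str:
    """Substitute each {field} token in `pattern` with the record's value."""
    for key, value in record.items():
        pattern = pattern.replace("{" + key + "}", str(value))
    return pattern

records = [
    {"client_id": "06000000", "session_length": 42},
    {"client_id": "07000000", "session_length": 13},
]
# Two distinct client_id values land in two distinct output folders.
folders = {resolve_path("{client_id}", r) for r in records}
```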
+
+When you use the REST API, the output section of the JSON file used for the request may look like the following:
+
+![REST API output](./media/stream-analytics-custom-path-patterns-blob-storage-output/stream-analytics-rest-output.png)
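For illustration, such an output section might resemble the following sketch. The property names and values here are assumptions based on the blob output datasource schema, not copied from the image above:

```python
import json

# Hypothetical blob output section for a Stream Analytics job created via
# the REST API; pathPattern carries the {client_id} partition token.
# Account placeholders and serialization settings are illustrative.
output = {
    "name": "output",
    "properties": {
        "datasource": {
            "type": "Microsoft.Storage/Blob",
            "properties": {
                "storageAccounts": [
                    {"accountName": "<account-name>", "accountKey": "<account-key>"}
                ],
                "container": "clients",
                "pathPattern": "{client_id}",
            },
        },
        "serialization": {
            "type": "Csv",
            "properties": {"fieldDelimiter": ",", "encoding": "UTF8"},
        },
    },
}
print(json.dumps(output, indent=2))
```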
+
+Once the job starts running, the *clients* container may look like the following:
+
+![Clients container](./media/stream-analytics-custom-path-patterns-blob-storage-output/stream-analytics-clients-container.png)
+
+Each folder may contain multiple blobs, and each blob contains one or more records. In the above example, there is a single blob in a folder labeled "06000000" with the following contents:
+
+![Blob contents](./media/stream-analytics-custom-path-patterns-blob-storage-output/stream-analytics-blob-contents.png)
+
+Notice that each record in the blob has a **client_id** column matching the folder name, because the column used to partition the output was **client_id**.
+
+### Limitations
+
+1. Only one custom partition key is permitted in the Path Pattern blob output property. All of the following Path Patterns are valid:
+
+   * cluster1/{date}/{aFieldInMyData}
+   * cluster1/{time}/{aFieldInMyData}
+   * cluster1/{aFieldInMyData}
+   * cluster1/{date}/{time}/{aFieldInMyData}
+
+2. Partition keys are case-insensitive, so partition keys like "John" and "john" are equivalent. Also, expressions cannot be used as partition keys. For example, **{columnA + columnB}** does not work.
+
+3. When an input stream consists of records with a partition key cardinality under 8,000, records are appended to existing blobs, and new blobs are created only when necessary. If the cardinality is over 8,000, there is no guarantee that existing blobs will be written to, and new blobs may be created for an arbitrary number of records with the same partition key.
+
+## Custom DateTime path patterns
+
+Custom DateTime path patterns allow you to specify an output format that aligns with Hive Streaming conventions, giving Azure Stream Analytics the ability to send data to Azure HDInsight and Azure Databricks for downstream processing. Custom DateTime path patterns are easily implemented using the `datetime` keyword in the Path Prefix field of your blob output, along with the format specifier. For example, `{datetime:yyyy}`.
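As a sketch, resolving a `{datetime:yyyy}` token against an event time could look like this. The mapping of specifiers to `strftime` codes is an illustrative assumption, not the service's implementation:

```python
import re
from datetime import datetime

# Illustrative mapping from the .NET-style specifiers used in the path
# pattern to Python strftime codes.
SPECIFIERS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}

def resolve_datetime_tokens(prefix: str, event_time: datetime) -> str:
    """Replace each {datetime:<specifier>} token with the formatted time."""
    def repl(match: re.Match) -> str:
        return event_time.strftime(SPECIFIERS[match.group(1)])
    return re.sub(r"\{datetime:([^}]+)\}", repl, prefix)
```

For example, `resolve_datetime_tokens("logs/{datetime:yyyy}", datetime(2019, 2, 5))` yields `logs/2019`.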
 
 Use this [Azure portal](https://portal.azure.com/?Microsoft_Azure_StreamAnalytics_bloboutputcustomdatetimeformats=true) link to toggle the feature flag that enables the custom DateTime path patterns for blob storage output preview. This feature will soon be enabled in the main portal.
 
-## Supported tokens
+### Supported tokens
 
 The following format specifier tokens can be used alone or in combination to achieve custom DateTime formats:
 
@@ -37,7 +85,7 @@ If you do not wish to use custom DateTime patterns, you can add the {date} and/o
 
 ![Stream Analytics old DateTime formats](./media/stream-analytics-custom-path-patterns-blob-storage-output/stream-analytics-old-date-time-formats.png)
 
-## Extensibility and restrictions
+### Extensibility and restrictions
 
 You can use as many tokens, `{datetime:<specifier>}`, as you like in the path pattern until you reach the Path Prefix character limit. Format specifiers can't be combined within a single token beyond the combinations already listed by the date and time dropdowns.
 
@@ -49,7 +97,7 @@ For a path partition of `logs/MM/dd`:
 
 You may use the same format specifier multiple times in the Path Prefix. The token must be repeated each time.
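For example, a `logs/MM/dd` partition repeats the token as `logs/{datetime:MM}/{datetime:dd}`. A sketch using Python's `strftime` as a stand-in for the service's formatting:

```python
from datetime import datetime

# Each specifier needs its own {datetime:...} token; for logs/MM/dd the
# Path Prefix would be logs/{datetime:MM}/{datetime:dd}. Sketched here by
# formatting each component of the event time separately.
event_time = datetime(2019, 2, 5)
path = "logs/{}/{}".format(event_time.strftime("%m"), event_time.strftime("%d"))
# path is "logs/02/05"
```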
 
-## Hive Streaming conventions
+### Hive Streaming conventions
 
 Custom path patterns for blob storage can be used with the Hive Streaming convention, which expects folders to be labeled with `column=` in the folder name.