Skip to content

Files

Latest commit

8b88c67 · Sep 29, 2021

History

History
199 lines (147 loc) · 7.5 KB

data-flow-flatten.md

File metadata and controls

199 lines (147 loc) · 7.5 KB
title titleSuffix description author ms.author ms.review ms.service ms.subservice ms.custom ms.topic ms.date
Flatten transformation in mapping data flow
Azure Data Factory & Azure Synapse
Denormalize hierarchical data using the flatten transformation in Azure Data Factory and Synapse Analytics pipelines.
kromerm
makromer
daperlov
data-factory
data-flows
synapse
conceptual
09/29/2021

Flatten transformation in mapping data flow

[!INCLUDEappliesto-adf-asa-md]

[!INCLUDEdata-flow-preamble]

Use the flatten transformation to take array values inside hierarchical structures such as JSON and unroll them into individual rows. This process is known as denormalization.

[!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/RWLX9j]

Configuration

The flatten transformation contains the following configuration settings

:::image type="content" source="media/data-flow/flatten1.png" alt-text="Flatten settings":::

Unroll by

Select an array to unroll. The output data will have one row per item in each array. If the unroll by array in the input row is null or empty, there will be one output row with unrolled values as null.

Unroll root

By default, the flatten transformation unrolls an array to the top of the hierarchy it exists in. You can optionally select an array as your unroll root. The unroll root must be an array of complex objects that either is or contains the unroll by array. If an unroll root is selected, the output data will contain at least one row per items in the unroll root. If the input row doesn't have any items in the unroll root, it will be dropped from the output data. Choosing an unroll root will always output a less than or equal number of rows than the default behavior.

Flatten mapping

Similar to the select transformation, choose the projection of the new structure from incoming fields and the denormalized array. If a denormalized array is mapped, the output column will be the same data type as the array. If the unroll by array is an array of complex objects that contains subarrays, mapping an item of that subarry will output an array.

Refer to the inspect tab and data preview to verify your mapping output.

Rule-based mapping

The flatten transformation supports rule-based mapping allowing you to create dynamic and flexible transformations that will flatten arrays based on rules and flatten structures based on hierarchy levels.

:::image type="content" source="media/data-flow/flatten-pattern.png" alt-text="Flatten pattern":::

Matching condition

Enter a pattern matching condition for the column or columns that you wish to flatten using either exact matching or patterns. Example: like(name,'cust%')

Deep column traversal

Optional setting that tells the service to handle all subcolumns of a complex object individually instead of handling the complex object as a whole column.

Hierarchy level

Choose the level of the hierarchy that you would like expand.

Name matches (regex)

Optionally choose to express your name matching as a regular expression in this box, instead of using the matching condition above.

Examples

Refer to the following JSON object for the below examples of the flatten transformation

{
  "name":"MSFT","location":"Redmond", "satellites": ["Bay Area", "Shanghai"],
  "goods": {
    "trade":true, "customers":["government", "distributer", "retail"],
    "orders":[
        {"orderId":1,"orderTotal":123.34,"shipped":{"orderItems":[{"itemName":"Laptop","itemQty":20},{"itemName":"Charger","itemQty":2}]}},
        {"orderId":2,"orderTotal":323.34,"shipped":{"orderItems":[{"itemName":"Mice","itemQty":2},{"itemName":"Keyboard","itemQty":1}]}}
    ]}}
{"name":"Company1","location":"Seattle", "satellites": ["New York"],
  "goods":{"trade":false, "customers":["store1", "store2"],
  "orders":[
      {"orderId":4,"orderTotal":123.34,"shipped":{"orderItems":[{"itemName":"Laptop","itemQty":20},{"itemName":"Charger","itemQty":3}]}},
      {"orderId":5,"orderTotal":343.24,"shipped":{"orderItems":[{"itemName":"Chair","itemQty":4},{"itemName":"Lamp","itemQty":2}]}}
    ]}}
{"name": "Company2", "location": "Bellevue",
  "goods": {"trade": true, "customers":["Bank"], "orders": [{"orderId": 4, "orderTotal": 123.34}]}}
{"name": "Company3", "location": "Kirkland"}

No unroll root with string array

Unroll by Unroll root Projection
goods.customers None name
customer = goods.customer

Output

{ 'MSFT', 'government'}
{ 'MSFT', 'distributer'}
{ 'MSFT', 'retail'}
{ 'Company1', 'store'}
{ 'Company1', 'store2'}
{ 'Company2', 'Bank'}
{ 'Company3', null}

No unroll root with complex array

Unroll by Unroll root Projection
goods.orders.shipped.orderItems None name
orderId = goods.orders.orderId
itemName = goods.orders.shipped.orderItems.itemName
itemQty = goods.orders.shipped.orderItems.itemQty
location = location

Output

{ 'MSFT', 1, 'Laptop', 20, 'Redmond'}
{ 'MSFT', 1, 'Charger', 2, 'Redmond'}
{ 'MSFT', 2, 'Mice', 2, 'Redmond'}
{ 'MSFT', 2, 'Keyboard', 1, 'Redmond'}
{ 'Company1', 4, 'Laptop', 20, 'Seattle'}
{ 'Company1', 4, 'Charger', 3, 'Seattle'}
{ 'Company1', 5, 'Chair', 4, 'Seattle'}
{ 'Company1', 5, 'Lamp', 2, 'Seattle'}
{ 'Company2', 4, null, null, 'Bellevue'}
{ 'Company3', null, null, null, 'Kirkland'}

Same root as unroll array

Unroll by Unroll root Projection
goods.orders goods.orders name
goods.orders.shipped.orderItems.itemName
goods.customers
location

Output

{ 'MSFT', ['Laptop','Charger'], ['government','distributer','retail'], 'Redmond'}
{ 'MSFT', ['Mice', 'Keyboard'], ['government','distributer','retail'], 'Redmond'}
{ 'Company1', ['Laptop','Charger'], ['store', 'store2'], 'Seattle'}
{ 'Company1', ['Chair', 'Lamp'], ['store', 'store2'], 'Seattle'}
{ 'Company2', null, ['Bank'], 'Bellevue'}

Unroll root with complex array

Unroll by Unroll root Projection
goods.orders.shipped.orderItem goods.orders name
orderId = goods.orders.orderId
itemName = goods.orders.shipped.orderItems.itemName
itemQty = goods.orders.shipped.orderItems.itemQty
location = location

Output

{ 'MSFT', 1, 'Laptop', 20, 'Redmond'}
{ 'MSFT', 1, 'Charger', 2, 'Redmond'}
{ 'MSFT', 2, 'Mice', 2, 'Redmond'}
{ 'MSFT', 2, 'Keyboard', 1, 'Redmond'}
{ 'Company1', 4, 'Laptop', 20, 'Seattle'}
{ 'Company1', 4, 'Charger', 3, 'Seattle'}
{ 'Company1', 5, 'Chair', 4, 'Seattle'}
{ 'Company1', 5, 'Lamp', 2, 'Seattle'}
{ 'Company2', 4, null, null, 'Bellevue'}

Data flow script

Syntax

<incomingStream>
foldDown(unroll(<unroll cols>),
    mapColumn(
        name,
        each(<array>(type == '<arrayDataType>')),
        each(<array>, match(true())),
        location
    )) ~> <transformationName>

Example

source foldDown(unroll(goods.orders.shipped.orderItems, goods.orders),
    mapColumn(
        name,
        orderId = goods.orders.orderId,
        itemName = goods.orders.shipped.orderItems.itemName,
        itemQty = goods.orders.shipped.orderItems.itemQty,
        location = location
    ),
    skipDuplicateMapInputs: false,
    skipDuplicateMapOutputs: false) 

Next steps