---
title: Parse data transformation in mapping data flow
description: Parse embedded column documents
author: kromerm
ms.author: makromer
ms.service: data-factory
ms.subservice: data-flows
ms.topic: conceptual
ms.date: 02/03/2022
---
[!INCLUDEappliesto-adf-asa-md]
[!INCLUDEdata-flow-preamble]
Use the Parse transformation to parse text columns in your data that are strings in document form. The currently supported types of embedded documents that can be parsed are JSON, XML, and delimited text.
[!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/RWykdO]
In the parse transformation configuration panel, you'll first pick the type of data contained in the columns that you wish to parse inline. The parse transformation also contains the following configuration settings.
:::image type="content" source="media/data-flow/data-flow-parse-1.png" alt-text="Parse settings":::
Similar to derived columns and aggregates, this is where you'll either modify an existing column by selecting it from the drop-down picker, or type in the name of a new column. ADF will store the parsed source data in this column. In most cases, you'll want to define a new column that holds the parsed form of the incoming embedded document string field.
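In data flow script terms, the column you pick or name here becomes the left-hand side of the parse assignment. The following is a minimal sketch, assuming a hypothetical incoming stream named `MySource` with a string column named `rawJson`; the parsed output lands in a new column named `parsedDoc`:

```
MySource parse(parsedDoc = rawJson ? (id as integer, name as string),
    format: 'json',
    documentForm: 'singleDocument') ~> ParseIntoNewColumn
```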
Use the expression builder to set the source for your parsing. This can be as simple as just selecting the source column with the self-contained data that you wish to parse, or you can create complex expressions to parse.
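For example, the parse source doesn't have to be a bare column reference. The sketch below, which assumes a hypothetical stream `MySource` and a string column `rawFragment`, uses the data flow `concat()` function to wrap a partial JSON fragment into a complete document before parsing. Treat it as an illustrative pattern rather than required configuration:

```
MySource parse(parsedDoc = concat('{"payload":', rawFragment, '}') ? (payload as (id as integer, name as string)),
    format: 'json',
    documentForm: 'singleDocument') ~> ParseFromExpression
```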
- Source string data: `chrome|steel|plastic`
  - Expression: `(desc1 as string, desc2 as string, desc3 as string)`
- Source JSON data: `{"ts":1409318650332,"userId":"309","sessionId":1879,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"free","itemInSession":2,"registration":1384448}`
  - Expression: `(level as string, registration as long)`
- Source XML data: `<Customers><Customer>122</Customer><CompanyName>Great Lakes Food Market</CompanyName></Customers>`
  - Expression: `(Customers as (Customer as integer, CompanyName as string))`
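To show how the first example maps onto the transformation's settings, the delimited-text case might be expressed in data flow script as follows. The stream name `DelimitedStrings`, the source column `partsString`, and the `columnNamesAsHeader: false` setting are assumptions made for this sketch; the delimiter and output expression come from the example above:

```
DelimitedStrings parse(parts = partsString ? (desc1 as string, desc2 as string, desc3 as string),
    format: 'delimited',
    columnNamesAsHeader: false,
    columnDelimiter: '|',
    documentForm: 'documentPerLine') ~> ParseDelimited
```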
Here is where you'll configure the target output schema of the parsed data, which is written into a single column. The easiest way to set the output schema is to select the 'Detect Type' button at the top right of the expression builder. ADF attempts to autodetect the schema from the string field that you're parsing and sets it for you in the output expression.
:::image type="content" source="media/data-flow/data-flow-parse-2.png" alt-text="Parse example":::
In this example, we've defined parsing of the incoming field "jsonString", which is plain text but formatted as a JSON structure. We're going to store the parsed results as JSON in a new column called "json" with the following schema:

```
(trade as boolean, customers as string[])
```
Refer to the inspect tab and data preview to verify your output is mapped properly.
The data flow script for this example is shown below:

```
source(output(
        name as string,
        location as string,
        satellites as string[],
        goods as (trade as boolean, customers as string[], orders as (orderId as string, orderTotal as double, shipped as (orderItems as (itemName as string, itemQty as string)[]))[])
    ),
    allowSchemaDrift: true,
    validateSchema: false,
    ignoreNoFilesFound: false,
    documentForm: 'documentPerLine') ~> JsonSource

source(output(
        movieId as string,
        title as string,
        genres as string
    ),
    allowSchemaDrift: true,
    validateSchema: false,
    ignoreNoFilesFound: false) ~> CsvSource

JsonSource derive(jsonString = toString(goods)) ~> StringifyJson

StringifyJson parse(json = jsonString ? (trade as boolean,
        customers as string[]),
    format: 'json',
    documentForm: 'arrayOfDocuments') ~> ParseJson

CsvSource derive(csvString = 'Id|name|year\n\'1\'|\'test1\'|\'1999\'') ~> CsvString

CsvString parse(csv = csvString ? (id as integer,
        name as string,
        year as string),
    format: 'delimited',
    columnNamesAsHeader: true,
    columnDelimiter: '|',
    nullValue: '',
    documentForm: 'documentPerLine') ~> ParseCsv

ParseJson select(mapColumn(
        jsonString,
        json
    ),
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> KeepStringAndParsedJson

ParseCsv select(mapColumn(
        csvString,
        csv
    ),
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true) ~> KeepStringAndParsedCsv
```
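Because the parsed column is a complex type, downstream transformations can reference its subfields with dot notation. As a quick sanity check on the JSON example, a hypothetical derived column step such as the sketch below would surface one parsed subfield so it's easy to spot in data preview:

```
ParseJson derive(tradeFlag = json.trade) ~> VerifyParsed
```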
The syntax of the parse transformation is:

```
parse(json = jsonString ? (trade as boolean,
        customers as string[]),
    format: 'json|XML|delimited',
    documentForm: 'singleDocument') ~> ParseJson

parse(csv = csvString ? (id as integer,
        name as string,
        year as string),
    format: 'delimited',
    columnNamesAsHeader: true,
    columnDelimiter: '|',
    nullValue: '',
    documentForm: 'documentPerLine') ~> ParseCsv
```
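The script examples in this article cover JSON and delimited text. For XML, a parse step could look like the following sketch, which reuses the XML output expression shown earlier in this article. The stream name `XmlStrings`, the output column `customersDoc`, the source column `xmlString`, and the lowercase `format: 'xml'` value are assumptions made here, so verify against the script that your own data flow generates:

```
XmlStrings parse(customersDoc = xmlString ? (Customers as (Customer as integer, CompanyName as string)),
    format: 'xml',
    documentForm: 'singleDocument') ~> ParseXml
```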
- Use the Flatten transformation to pivot rows to columns.
- Use the Derived column transformation to transform rows.