---
title: Optimizing source performance in mapping data flow
titleSuffix: Azure Data Factory & Azure Synapse
description: Learn about optimizing source performance in mapping data flows in Azure Data Factory and Azure Synapse Analytics pipelines.
author: kromerm
ms.topic: conceptual
ms.author: makromer
ms.service: data-factory
ms.subservice: data-flows
ms.custom: synapse
ms.date: 09/29/2021
---

# Optimizing sources

For every source except Azure SQL Database, we recommend that you keep **Use current partitioning** as the selected value. When reading from all other source systems, data flows automatically partition the data evenly based on its size. A new partition is created for about every 128 MB of data. As your data size increases, the number of partitions increases.

Any custom partitioning happens after Spark reads in the data and negatively impacts data flow performance. Because the data is evenly partitioned on read, custom partitioning isn't recommended.
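As a rough illustration of the 128 MB rule above, the expected partition count scales with source size. The helper below is an assumption for illustration only, not the service's actual partitioning algorithm:

```python
import math

def estimated_partitions(data_size_mb: float, partition_size_mb: float = 128) -> int:
    """Rough estimate of read partitions, assuming one partition is created
    for about every 128 MB of source data (per the guidance above)."""
    return max(1, math.ceil(data_size_mb / partition_size_mb))

# A 1 GB source splits into about 8 partitions; small sources still get one.
print(estimated_partitions(1024))  # 8
print(estimated_partitions(50))    # 1
```

The exact count is determined by Spark at read time and can vary, but the proportional relationship between data size and partition count holds.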

> [!NOTE]
> Read speeds can be limited by the throughput of your source system.

## Azure SQL Database sources

Azure SQL Database has a unique partitioning option called **Source** partitioning. Enabling source partitioning can improve your read times from Azure SQL Database by opening parallel connections on the source system. Specify the number of partitions and how to partition your data, and use a partition column with high cardinality. You can also enter a query that matches the partitioning scheme of your source table.
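A high-cardinality partition column matters because rows are distributed across partitions by the column's values; a low-cardinality column leaves most partitions empty and wastes the parallel connections. The hash-based assignment below is a hypothetical stand-in, not the actual partitioning scheme:

```python
from collections import Counter

def assign_partition(key, num_partitions: int) -> int:
    """Illustrative hash-based partition assignment (not ADF's actual scheme)."""
    return hash(key) % num_partitions

# Hypothetical rows: customer_id is high-cardinality, region is not.
rows = [{"customer_id": i, "region": "east" if i % 10 else "west"} for i in range(1000)]

# High-cardinality column: rows spread roughly evenly across 4 partitions.
by_id = Counter(assign_partition(r["customer_id"], 4) for r in rows)

# Low-cardinality column: only 2 distinct values exist, so at most 2 of the
# 4 partitions receive any data, and the extra connections sit idle.
by_region = Counter(assign_partition(r["region"], 4) for r in rows)

print(len(by_id))      # 4 non-empty partitions
print(len(by_region))  # at most 2 non-empty partitions
```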

> [!TIP]
> For source partitioning, the I/O of the SQL Server is the bottleneck. Adding too many partitions may saturate your source database. Generally, four or five partitions are ideal when using this option.

:::image type="content" source="media/data-flow/sourcepart3.png" alt-text="Source partitioning":::

### Isolation level

The isolation level of the read on an Azure SQL source system has an impact on performance. Choosing **Read uncommitted** provides the fastest performance and prevents any database locks. To learn more about SQL isolation levels, see Understanding isolation levels.

### Read using query

You can read from Azure SQL Database using a table or a SQL query. If you execute a SQL query, the query must complete before transformation can start. SQL queries can be useful for pushing down operations that might execute faster on the database and reduce the amount of data read from SQL Server, such as SELECT, WHERE, and JOIN statements. When pushing down operations, you lose the ability to track lineage and performance of the transformations before the data comes into the data flow.
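The pushdown trade-off can be sketched with a generic example. SQLite is used here purely as a local stand-in for SQL Server, and the table and column names are invented; the point is that a WHERE clause executed on the database returns far fewer rows than reading everything and filtering afterwards:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "east" if i % 2 else "west", i * 10.0) for i in range(1000)],
)

# Without pushdown: read every row, then filter inside the engine.
all_rows = conn.execute("SELECT * FROM orders").fetchall()
filtered_locally = [r for r in all_rows if r[1] == "east" and r[2] > 5000]

# With pushdown: the WHERE clause runs on the database, so far fewer rows
# cross the wire into the data flow.
pushed_down = conn.execute(
    "SELECT id, amount FROM orders WHERE region = 'east' AND amount > 5000"
).fetchall()

print(len(all_rows), len(pushed_down))  # 1000 vs. 250
```

The results are identical either way; the difference is where the filtering work happens and how much data is transferred.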

## Azure Synapse Analytics sources

When using Azure Synapse Analytics, a setting called **Enable staging** exists in the source options. This allows the service to read from Synapse using staging, which greatly improves read performance by using the Synapse COPY statement for the most performant bulk loading capability. Enabling staging requires you to specify an Azure Blob Storage or Azure Data Lake Storage Gen2 staging location in the data flow activity settings.

:::image type="content" source="media/data-flow/enable-staging.png" alt-text="Enable staging":::

## File-based sources

While data flows support a variety of file types, the Spark-native Parquet format is recommended for optimal read and write times.

If you're running the same data flow on a set of files, we recommend reading from a folder, using wildcard paths, or reading from a list of files. A single data flow activity run can process all of your files in one batch. More information on how to configure these settings can be found in the Source transformation section of the Azure Blob Storage connector documentation.
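The wildcard idea can be sketched locally: a single pattern matches the whole batch of files in one pass, while non-matching files are skipped. The folder layout and file names below are invented for illustration:

```python
import glob
import os
import tempfile

# Create a folder of daily files to stand in for a storage container.
folder = tempfile.mkdtemp()
for day in ("01", "02", "03"):
    with open(os.path.join(folder, f"sales_2021-09-{day}.csv"), "w") as f:
        f.write("id,amount\n")
open(os.path.join(folder, "readme.txt"), "w").close()

# One wildcard path selects every matching file, the same idea as a wildcard
# path on a data flow source picking up all files in a single activity run.
matched = sorted(glob.glob(os.path.join(folder, "sales_*.csv")))
print(len(matched))  # 3 -- the readme is excluded
```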

Where possible, avoid using the ForEach activity to run data flows over a set of files. Each iteration of the ForEach spins up its own Spark cluster, which is often unnecessary and can be expensive.

## Next steps

See other Data Flow articles related to performance: