articles/machine-learning/concept-data-ingestion.md
9 additions & 9 deletions
@@ -9,20 +9,21 @@ ms.topic: conceptual
ms.reviewer: nibaccam
author: nibaccam
ms.author: nibaccam
-ms.date: 02/25/2020
+ms.date: 02/26/2020
---
# Data ingestion in Azure Machine Learning
-In this article, you learn about the pros and cons of the following data ingestion options available with Azure Machine Learning. Depending on your data and data ingestion needs, you can use these options separately, or together as part of your overall data ingestion workflow.
+In this article, you learn the pros and cons of the following data ingestion options available with Azure Machine Learning.
1. [Azure Data Factory](#use-azure-data-factory) pipelines
Data ingestion is the process in which unstructured data is extracted from one or multiple sources and then prepared for training machine learning models. It's also time intensive, especially if done manually, and if you have large amounts of data from multiple sources. Automating this effort frees up resources and ensures your models use the most recent and applicable data.
+We recommend that you evaluate using Azure Data Factory (ADF) initially, as it is specifically built to extract, load, and transform data. If you cannot meet your requirements using ADF, you can use the Python SDK to develop a custom code solution, or use ADF and the Python SDK together to create an overall data ingestion workflow that meets your needs.
+
## Use Azure Data Factory
[Azure Data Factory](https://docs.microsoft.com/azure/data-factory/introduction) offers native support for data source monitoring and triggers for data ingestion pipelines.
@@ -32,9 +33,9 @@ The following table summarizes the pros and cons for using Azure Data Factory fo
|Pros|Cons
---|---
Specifically built to extract, load, and transform data.|Currently offers a limited set of Azure Data Factory pipeline tasks
-Allows you to create data-driven workflows for orchestrating data movement and transformations at scale.|If invoking external Web APIs, like the web activity to trigger an Azure Pipeline, there's no out-of-the-box approach for making the task wait for a result to move forward with the flow
-Natively supports data source triggered data ingestion| Expensive to construct and maintain
+Allows you to create data-driven workflows for orchestrating data movement and transformations at scale.|Expensive to construct and maintain. See Azure Data Factory's [pricing page](https://azure.microsoft.com/pricing/details/data-factory/data-pipeline/) for more information.
Integrated with various Azure tools like [Azure Databricks](https://docs.microsoft.com/azure/data-factory/transform-data-using-databricks-notebook) and [Azure Functions](https://docs.microsoft.com/azure/data-factory/control-flow-azure-function-activity) | Doesn't natively run scripts, instead relies on separate compute for script runs
+Natively supports data source triggered data ingestion|
Data preparation and model training processes are separate.|
Embedded data lineage capability for Azure Data Factory dataflows|
Provides a low code experience [user interface](https://docs.microsoft.com/azure/data-factory/quickstart-create-data-factory-portal) for non-scripting approaches |
@@ -58,13 +59,12 @@ Pros| Cons
---|---
Configure your own Python scripts | Does not natively support data source change triggering. Requires Logic App or Azure Function implementations
Data preparation as part of every model training execution|Requires development skills to create a data ingestion script
-||Requires engineering practices to guarantee code quality and effectiveness
-||Does not provide a user interface for creating the ingestion mechanism
+Supports data preparation scripts on various compute targets, including [Azure Machine Learning compute](concept-compute-target#azure-machine-learning-compute-managed.md) |Does not provide a user interface for creating the ingestion mechanism
-In the following diagram, the Azure Machine Learning pipeline consists of two steps: data ingestion and model training. The data ingestion step encompasses tasks that can be accomplished using Python libraries and the SDK, such as extracting the data from local/web sources, and basic data transformations, like missing value imputation. The training step then uses the prepared data as input to train your machine learning model.
+In the following diagram, the Azure Machine Learning pipeline consists of two steps: data ingestion and model training. The data ingestion step encompasses tasks that can be accomplished using Python libraries and the Python SDK, such as extracting data from local/web sources, and basic data transformations, like missing value imputation. The training step then uses the prepared data as input to your training script to train your machine learning model.

## Next steps
-* Automate and schedule data ingestion updates using [Azure Pipelines for data ingestion](how-to-cicd-data-ingestion.md).
+* Learn how to automate and manage the development lifecycle of data ingestion pipelines with [Azure Pipelines](how-to-cicd-data-ingestion.md).