Commit a43e620

Committed Feb 26, 2020
new image + peer feedback
1 parent 9476c85 commit a43e620

File tree

2 files changed (+270, -286 lines)

articles/machine-learning/concept-data-ingestion.md

Lines changed: 9 additions & 9 deletions
@@ -9,20 +9,21 @@ ms.topic: conceptual
 ms.reviewer: nibaccam
 author: nibaccam
 ms.author: nibaccam
-ms.date: 02/25/2020
+ms.date: 02/26/2020
 
 ---
 
 # Data ingestion in Azure Machine Learning
 
-In this article, you learn about the pros and cons of the following data ingestion options available with Azure Machine Learning. Depending on your data and data ingestion needs, you can use these options separately, or together as part of your overall data ingestion workflow.
+In this article, you learn the pros and cons of the following data ingestion options available with Azure Machine Learning.
 
 1. [Azure Data Factory](#use-azure-data-factory) pipelines
 2. [Azure Machine Learning Python SDK](#use-the-python-sdk)
 
-
 Data ingestion is the process in which unstructured data is extracted from one or multiple sources and then prepared for training machine learning models. It's also time intensive, especially if done manually, and if you have large amounts of data from multiple sources. Automating this effort frees up resources and ensures your models use the most recent and applicable data.
 
+We recommend that you evaluate using Azure Data Factory (ADF) initially, as it is specifically built to extract, load, and transform data. If you cannot meet your requirements using ADF, you can use the Python SDK to develop a custom code solution, or use ADF and the Python SDK together to create an overall data ingestion workflow that meets your needs.
+
 ## Use Azure Data Factory
 
 [Azure Data Factory](https://docs.microsoft.com/azure/data-factory/introduction) offers native support for data source monitoring and triggers for data ingestion pipelines.
@@ -32,9 +33,9 @@ The following table summarizes the pros and cons for using Azure Data Factory fo
 |Pros|Cons
 ---|---
 Specifically built to extract, load, and transform data.|Currently offers a limited set of Azure Data Factory pipeline tasks
-Allows you to create data-driven workflows for orchestrating data movement and transformations at scale.|If invoking external Web APIs, like the web activity to trigger an Azure Pipeline, there's no out-of-the-box approach for making the task wait for a result to move forward with the flow
-Natively supports data source triggered data ingestion| Expensive to construct and maintain
+Allows you to create data-driven workflows for orchestrating data movement and transformations at scale.|Expensive to construct and maintain. See Azure Data Factory's [pricing page](https://azure.microsoft.com/pricing/details/data-factory/data-pipeline/) for more information.
 Integrated with various Azure tools like [Azure Databricks](https://docs.microsoft.com/azure/data-factory/transform-data-using-databricks-notebook) and [Azure Functions](https://docs.microsoft.com/azure/data-factory/control-flow-azure-function-activity) | Doesn't natively run scripts, instead relies on separate compute for script runs
+Natively supports data source triggered data ingestion|
 Data preparation and model training processes are separate.|
 Embedded data lineage capability for Azure Data Factory dataflows|
 Provides a low code experience [user interface](https://docs.microsoft.com/azure/data-factory/quickstart-create-data-factory-portal) for non-scripting approaches |
@@ -58,13 +59,12 @@ Pros| Cons
 ---|---
 Configure your own Python scripts | Does not natively support data source change triggering. Requires Logic App or Azure Function implementations
 Data preparation as part of every model training execution|Requires development skills to create a data ingestion script
-||Requires engineering practices to guarantee code quality and effectiveness
-||Does not provide a user interface for creating the ingestion mechanism
+Supports data preparation scripts on various compute targets, including [Azure Machine Learning compute](concept-compute-target#azure-machine-learning-compute-managed.md) |Does not provide a user interface for creating the ingestion mechanism
 
-In the following diagram, the Azure Machine Learning pipeline consists of two steps: data ingestion and model training. The data ingestion step encompasses tasks that can be accomplished using Python libraries and the SDK, such as extracting the data from local/web sources, and basic data transformations, like missing value imputation. The training step then uses the prepared data as input to train your machine learning model.
+In the following diagram, the Azure Machine Learning pipeline consists of two steps: data ingestion and model training. The data ingestion step encompasses tasks that can be accomplished using Python libraries and the Python SDK, such as extracting data from local/web sources, and basic data transformations, like missing value imputation. The training step then uses the prepared data as input to your training script to train your machine learning model.
 
 ![Azure pipeline + SDK data ingestion](media/concept-data-ingestion/data-ingest-option-two.png)
 
 ## Next steps
 
-* Automate and schedule data ingestion updates using [Azure Pipelines for data ingestion](how-to-cicd-data-ingestion.md).
+* Learn how to automate and manage the development lifecycle of data ingestion pipelines with [Azure Pipelines](how-to-cicd-data-ingestion.md).
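For illustration only, and not part of the commit above: a minimal sketch of the two-step pipeline (data ingestion followed by model training) that the updated diagram paragraph describes, using the Azure Machine Learning Python SDK v1 pipeline API. The script names (ingest.py, train.py), compute target name, and experiment name are hypothetical placeholders.

```python
# Minimal sketch, assuming an existing workspace config and a compute cluster.
# Script names, compute target name, and experiment name are hypothetical.
from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
datastore = ws.get_default_datastore()
compute_target = ws.compute_targets["cpu-cluster"]    # hypothetical compute target

# Intermediate storage that carries the prepared data from step 1 to step 2.
prepared_data = PipelineData("prepared_data", datastore=datastore)

# Step 1: data ingestion - e.g. pull data from local/web sources and do basic
# transformations such as missing value imputation inside ingest.py.
ingest_step = PythonScriptStep(
    name="data ingestion",
    script_name="ingest.py",                           # hypothetical script
    source_directory="./scripts",
    arguments=["--output", prepared_data],
    outputs=[prepared_data],
    compute_target=compute_target,
    allow_reuse=True,
)

# Step 2: model training - consumes the prepared data produced by step 1.
train_step = PythonScriptStep(
    name="model training",
    script_name="train.py",                            # hypothetical script
    source_directory="./scripts",
    arguments=["--input", prepared_data],
    inputs=[prepared_data],
    compute_target=compute_target,
)

pipeline = Pipeline(workspace=ws, steps=[ingest_step, train_step])
run = Experiment(ws, "data-ingestion-demo").submit(pipeline)
run.wait_for_completion(show_output=True)
```

Here PipelineData hands the prepared dataset from the ingestion step to the training step, mirroring the separation of data preparation and model training that the article describes.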

articles/machine-learning/media/concept-data-ingestion/data-ingest-option-one.svg

Lines changed: 261 additions & 277 deletions

0 commit comments
