
Commit e0d5de7

committed May 8, 2020
Add diagram
1 parent 6a5952a commit e0d5de7

File tree

2 files changed: +209 / -2 lines changed

 

‎articles/machine-learning/concept-distributed-training.md

Lines changed: 4 additions & 2 deletions
@@ -15,7 +15,7 @@ ms.date: 03/27/2020
 
 In this article, you learn about distributed training and how Azure Machine Learning supports it for deep learning models.
 
-In distributed training the workload to train a model is split up and shared among multiple mini processors, called worker nodes. These worker nodes work in parallel to speed up model training. Distributed training can be used for traditional ML models, but is better suited for compute and time intensive tasks, like [deep learning](concept-deep-learning-vs-machine-learning.md) for training deep neural networks.
+In distributed training the workload to train a model is split up and shared among multiple mini processors, called worker nodes. These worker nodes work in parallel to speed up model training. Distributed training can be used for traditional ML models, but is better suited for compute and time intensive tasks, like [deep learning](concept-deep-learning-vs-machine-learning.md) for training deep neural networks.
 
 ## Deep learning and distributed training
 
@@ -31,7 +31,9 @@ For ML models that don't require distributed training, see [train models with Az
 
 Data parallelism is the easiest to implement of the two distributed training approaches, and is sufficient for most use cases.
 
-In this approach, the data is divided into partitions, where the number of partitions is equal to the total number of available nodes, in the compute cluster. The model is copied in each of these worker nodes, and each worker operates on its own subset of the data. Keep in mind that each node has to have the capacity to support the model that's being trained, that is the model has to entirely fit on each node.
+In this approach, the data is divided into partitions, where the number of partitions is equal to the total number of available nodes, in the compute cluster. The model is copied in each of these worker nodes, and each worker operates on its own subset of the data. Keep in mind that each node has to have the capacity to support the model that's being trained, that is the model has to entirely fit on each node. The following diagram provides a visual demonstration of this approach.
+
+![Data-parallelism-concept-diagram](./media/concept-distributed-training/distributed-training.svg)
 
 Each node independently computes the errors between its predictions for its training samples and the labeled outputs. In turn, each node updates its model based on the errors and must communicate all of its changes to the other nodes to update their corresponding models. This means that the worker nodes need to synchronize the model parameters, or gradients, at the end of the batch computation to ensure they are training a consistent model.
 
Lines changed: 205 additions & 0 deletions
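The article text in the diff above describes data-parallel training in prose: the dataset is partitioned across worker nodes, the full model is replicated on each node, and gradients are synchronized at the end of each batch so the replicas stay consistent. The following minimal sketch is not part of this commit; it illustrates that pattern using PyTorch's DistributedDataParallel, and the model, dataset, and hyperparameters are placeholder assumptions.

```python
# Minimal data-parallel training sketch (illustrative only, not part of this
# commit). Assumes PyTorch; the model, dataset, and hyperparameters are
# placeholders.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


class TinyNet(nn.Module):
    """Stand-in for the model being trained; it must fit entirely on each node."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)


def train():
    # Each worker process joins the same process group.
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()

    # The full model is replicated on every worker.
    model = DDP(TinyNet())
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    # DistributedSampler partitions the data so each worker trains on its own subset.
    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the partitions each epoch
        for features, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(features), labels)
            # backward() computes local gradients; DDP averages them across
            # workers so every replica applies the same update.
            loss.backward()
            optimizer.step()
        if rank == 0:
            print(f"epoch {epoch} finished, last loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    train()
```

A script along these lines would typically be launched with one process per worker, for example `torchrun --nproc_per_node=2 train.py`, which sets the environment variables that `init_process_group` reads.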

0 commit comments
