---
title: Secure network traffic flow
titleSuffix: Azure Machine Learning
description: Learn how network traffic flows between components when your Azure Machine Learning workspace is in a secured virtual network.
services: machine-learning
ms.service: machine-learning
ms.subservice: enterprise-readiness
ms.custom: event-tier1-build-2022
ms.topic: conceptual
ms.author: jhirono
author: jhirono
ms.reviewer: larryfr
ms.date: 04/08/2022
---

# Secure network traffic flow
When your Azure Machine Learning workspace and associated resources are secured in an Azure Virtual Network, it changes the network traffic between resources. Without a virtual network, network traffic flows over the public internet or within an Azure data center. Once a virtual network (VNet) is introduced, you may also want to harden network security. For example, you may want to block inbound and outbound communications between the VNet and the public internet. However, Azure Machine Learning requires access to some resources on the public internet. For example, Azure Resource Manager is used for deployments and management operations.
This article lists the required traffic to/from the public internet. It also explains how network traffic flows between your client development environment and a secured Azure Machine Learning workspace in the following scenarios:
- Using Azure Machine Learning studio to work with:

    - Your workspace
    - AutoML
    - Designer
    - Datasets and datastores

    > [!TIP]
    > Azure Machine Learning studio is a web-based UI that runs partially in your web browser, and makes calls to Azure services to perform tasks such as training a model, using designer, or viewing datasets. Some of these calls use a different communication flow than if you're using the SDK, CLI, REST API, or VS Code.

- Using Azure Machine Learning studio, SDK, CLI, or REST API to work with:

    - Compute instances and clusters
    - Azure Kubernetes Service
    - Docker images managed by Azure Machine Learning
> [!TIP]
> If a scenario or task isn't listed here, it should work the same with or without a secured workspace.
This article assumes the following configuration:
- Azure Machine Learning workspace using a private endpoint to communicate with the VNet.
- The Azure Storage Account, Key Vault, and Container Registry used by the workspace also use a private endpoint to communicate with the VNet.
- A VPN gateway or ExpressRoute is used by the client workstations to access the VNet.
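One way to sanity-check this assumed configuration is to confirm that the workspace and dependency service FQDNs resolve to private IP addresses when queried from inside the VNet (or over the VPN/ExpressRoute connection). The following is a minimal sketch, not an official Azure tool; the host names are hypothetical placeholders for your own resources:

```python
import ipaddress
import socket

# Hypothetical FQDNs; substitute the names of your own workspace,
# storage account, and key vault.
PRIVATE_ENDPOINT_HOSTS = [
    "myworkspace.workspace.eastus.api.azureml.ms",
    "mystorageaccount.blob.core.windows.net",
    "mykeyvault.vault.azure.net",
]

def is_private_ip(address: str) -> bool:
    """Return True if the address is in a private (RFC 1918/4193) range."""
    return ipaddress.ip_address(address).is_private

def resolves_privately(host: str) -> bool:
    """Resolve a host and verify every answer is a private IP, which
    indicates the private endpoint (not the public endpoint) is in use."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return all(is_private_ip(info[4][0]) for info in infos)

# Classification examples that don't touch the network:
print(is_private_ip("10.0.0.4"))    # True  (typical private endpoint NIC)
print(is_private_ip("52.168.1.1"))  # False (a public Azure address)
```

If `resolves_privately` returns `False` from inside the VNet, your DNS configuration is likely still pointing at the public endpoints; see the custom DNS guidance referenced below.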
| Scenario | Required inbound | Required outbound | Additional configuration |
| ----- | ----- | ----- | ----- |
| Access workspace from studio | NA | <ul><li>Azure Active Directory</li><li>Azure Front Door</li><li>Azure Machine Learning service</li></ul> | You may need to use a custom DNS server. For more information, see Use your workspace with a custom DNS. |
| Use AutoML, designer, dataset, and datastore from studio | NA | NA | <ul><li>Workspace service principal configuration</li><li>Allow access from trusted Azure services</li></ul> For more information, see How to secure a workspace in a virtual network. |
| Use compute instance and compute cluster | <ul><li>Azure Machine Learning service on port 44224</li><li>Azure Batch service on ports 29876-29877</li></ul> | <ul><li>Azure Active Directory</li><li>Azure Resource Manager</li><li>Azure Machine Learning service</li><li>`Storage.region` service tag</li><li>`Keyvault.region` service tag</li></ul> | If you use a firewall, create user-defined routes. For more information, see Configure inbound and outbound traffic. |
| Use Azure Kubernetes Service | NA | For information on the outbound configuration for AKS, see How to deploy to Azure Kubernetes Service. | Configure the internal load balancer. For more information, see How to deploy to Azure Kubernetes Service. |
| Use Docker images managed by Azure Machine Learning | NA | <ul><li>Microsoft Container Registry</li><li>`viennaglobal.azurecr.io` global container registry</li></ul> | If the Azure Container Registry for your workspace is behind the VNet, configure the workspace to use a compute cluster to build images. For more information, see How to secure a workspace in a virtual network. |
> [!IMPORTANT]
> Azure Machine Learning uses multiple storage accounts. Each stores different data and has a different purpose:
>
> - __Your storage__: The Azure Storage Account(s) in your Azure subscription are used to store your data and artifacts such as models, training data, training logs, and Python scripts. For example, the default storage account for your workspace is in your subscription. The Azure Machine Learning compute instance and compute clusters access file and blob data in this storage over ports 445 (SMB) and 443 (HTTPS).
>
>     When using a compute instance or compute cluster, your storage account is mounted as a file share using the SMB protocol. The compute instance and cluster use this file share to store the data, models, Jupyter notebooks, datasets, etc. The compute instance and cluster use the private endpoint when accessing the storage account.
>
> - __Microsoft storage__: The Azure Machine Learning compute instance and compute clusters rely on Azure Batch, and access storage located in a Microsoft subscription. This storage is used only for the management of the compute instance/cluster. None of your data is stored here. The compute instance and compute cluster access the blob, table, and queue data in this storage, using port 443 (HTTPS).
>
> Machine Learning also stores metadata in an Azure Cosmos DB instance. By default, this instance is hosted in a Microsoft subscription and managed by Microsoft. You can optionally use an Azure Cosmos DB instance in your Azure subscription. For more information, see Data encryption with Azure Machine Learning.
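The port requirements described above can be captured as data when documenting firewall rules. This is a small illustrative sketch of our own, not an Azure API; the storage-class labels are ours:

```python
# TCP ports a compute instance/cluster uses to reach each storage class,
# per the description above.
STORAGE_ACCESS_PORTS = {
    "your_storage": {445: "SMB (file share mount)", 443: "HTTPS (blob)"},
    "microsoft_storage": {443: "HTTPS (blob, table, queue)"},
}

def required_ports(storage_kind: str) -> list[int]:
    """Return the sorted TCP ports that must be reachable."""
    return sorted(STORAGE_ACCESS_PORTS[storage_kind])

print(required_ports("your_storage"))       # [443, 445]
print(required_ports("microsoft_storage"))  # [443]
```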
## Scenario: Access workspace from studio

> [!NOTE]
> The information in this section is specific to using the workspace from the Azure Machine Learning studio. If you use the Azure Machine Learning SDK, REST API, CLI, or Visual Studio Code, the information in this section doesn't apply to you.

When accessing your workspace from studio, the network traffic flows are as follows:

- To authenticate to resources, Azure Active Directory is used.
- For management and deployment operations, Azure Resource Manager is used.
- For Azure Machine Learning specific tasks, the Azure Machine Learning service is used.
- For access to Azure Machine Learning studio (https://ml.azure.com), Azure Front Door is used.
- For most storage operations, traffic flows through the private endpoint of the default storage for your workspace. Exceptions are discussed in the Use AutoML, designer, dataset, and datastore section.
- You also need to configure a DNS solution that allows you to resolve the names of the resources within the VNet. For more information, see Use your workspace with a custom DNS.
:::image type="content" source="./media/concept-secure-network-traffic-flow/workspace-traffic-studio.png" alt-text="Diagram of network traffic between client and workspace when using studio":::
## Scenario: Use AutoML, designer, dataset, and datastore from studio

The following features of Azure Machine Learning studio use data profiling:
- Dataset: Explore the dataset from studio.
- Designer: Visualize module output data.
- AutoML: View a data preview/profile and choose a target column.
- Labeling
Data profiling depends on the Azure Machine Learning managed service being able to access the default Azure Storage Account for your workspace. The managed service doesn't exist in your VNet, so can’t directly access the storage account in the VNet. Instead, the workspace uses a service principal to access storage.
> [!TIP]
> You can provide a service principal when creating the workspace. If you don't, one is created for you and will have the same name as your workspace.
To allow access to the storage account, configure the storage account to allow a resource instance for your workspace, or select __Allow Azure services on the trusted services list to access this storage account__. This setting allows the managed service to access storage through the Azure data center network.

Next, assign the workspace's service principal the Reader role for the private endpoint of the storage account. This role is used to verify the workspace and storage subnet information. If they're the same, access is allowed. Finally, the service principal also requires Storage Blob Data Contributor access to the storage account.
For more information, see the Azure Storage Account section of How to secure a workspace in a virtual network.
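The two grants the workspace's service principal needs can be expressed as data and audited. The following is an illustrative sketch of ours, not an Azure SDK call; the scope labels are hypothetical shorthand, not Azure resource IDs:

```python
# Required role assignments for the workspace service principal, per the
# paragraphs above: Reader on the storage account's private endpoint, and
# Storage Blob Data Contributor on the storage account itself.
REQUIRED_GRANTS = {
    ("Reader", "storage-account-private-endpoint"),
    ("Storage Blob Data Contributor", "storage-account"),
}

def missing_grants(actual: set) -> set:
    """Return the (role, scope) pairs the service principal still needs."""
    return REQUIRED_GRANTS - actual

# Example: only the Reader assignment exists so far.
current = {("Reader", "storage-account-private-endpoint")}
print(missing_grants(current))
# {('Storage Blob Data Contributor', 'storage-account')}
```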
:::image type="content" source="./media/concept-secure-network-traffic-flow/storage-traffic-studio.png" alt-text="Diagram of traffic between client, data profiling, and storage":::
## Scenario: Use compute instance and compute cluster

Azure Machine Learning compute instance and compute cluster are managed services hosted by Microsoft. They're built on top of the Azure Batch service. While they exist in a Microsoft-managed environment, they're also injected into your VNet.
When you create a compute instance or compute cluster, the following resources are also created in your VNet:
- A Network Security Group with required inbound rules. These rules allow inbound access from the Azure Machine Learning service (TCP on port 44224) and the Azure Batch service (TCP on ports 29876-29877).

    > [!IMPORTANT]
    > If you use a firewall to block internet access into the VNet, you must configure the firewall to allow this traffic. For example, with Azure Firewall you can create user-defined routes. For more information, see How to use Azure Machine Learning with a firewall.

- A load balancer with a public IP.
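When cross-checking a firewall or NSG configuration against these requirements, it can help to express the required inbound rules as data. This is a sketch of ours, not an Azure SDK; the service tag names follow Azure's published tags:

```python
# Inbound rules required for compute instance/cluster, per the list above:
# (service tag, protocol, inclusive port range).
REQUIRED_INBOUND = [
    ("AzureMachineLearning", "TCP", (44224, 44224)),
    ("BatchNodeManagement", "TCP", (29876, 29877)),
]

def port_allowed(tag: str, port: int) -> bool:
    """Check whether a given service tag and port is covered by the rules."""
    return any(
        t == tag and lo <= port <= hi
        for t, _proto, (lo, hi) in REQUIRED_INBOUND
    )

print(port_allowed("AzureMachineLearning", 44224))  # True
print(port_allowed("BatchNodeManagement", 29999))   # False
```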
Also allow outbound access to the following service tags. For each tag, replace `region` with the Azure region of your compute instance/cluster:

- `Storage.region`: This outbound access is used to connect to the Azure Storage Account inside the Azure Batch service-managed VNet.
- `Keyvault.region`: This outbound access is used to connect to the Azure Key Vault account inside the Azure Batch service-managed VNet.
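Since both outbound tags are region-scoped, a small helper can expand them for a given region. This helper is ours for illustration, not part of any Azure SDK:

```python
# Region-scoped service tags needed for outbound access, per the list above.
OUTBOUND_TAG_TEMPLATES = ["Storage.{region}", "Keyvault.{region}"]

def outbound_service_tags(region: str) -> list[str]:
    """Expand the outbound service tags for a compute instance/cluster
    deployed in the given Azure region."""
    return [t.format(region=region) for t in OUTBOUND_TAG_TEMPLATES]

print(outbound_service_tags("eastus"))
# ['Storage.eastus', 'Keyvault.eastus']
```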
Data access from your compute instance or cluster goes through the private endpoint of the Storage Account for your VNet.
If you use Visual Studio Code on a compute instance, you must allow other outbound traffic. For more information, see How to use Azure Machine Learning with a firewall.
:::image type="content" source="./media/concept-secure-network-traffic-flow/compute-instance-and-cluster.png" alt-text="Diagram of traffic flow when using compute instance or cluster":::
## Scenario: Use online endpoints

Securing an online endpoint with a private endpoint is a preview feature.
[!INCLUDE preview disclaimer]
Inbound communication with the scoring URL of the online endpoint can be secured using the `public_network_access` flag on the endpoint. Setting the flag to `disabled` restricts the online endpoint to receiving traffic only from the virtual network. For secure inbound communications, the Azure Machine Learning workspace's private endpoint is used.

Outbound communication from a deployment can be secured on a per-deployment basis by using the `egress_public_network_access` flag. Outbound communication in this case is from the deployment to Azure Container Registry, storage blob, and workspace. Setting the flag to `disabled` restricts communication with these resources to the virtual network.
> [!NOTE]
> For secure outbound communication, a private endpoint is created for each deployment where `egress_public_network_access` is set to `disabled`.
Visibility of the endpoint is also governed by the `public_network_access` flag of the Azure Machine Learning workspace. If this flag is `disabled`, then the scoring endpoints can only be accessed from virtual networks that contain a private endpoint for the workspace. If it's `enabled`, then the scoring endpoint can be accessed from the virtual network and public networks.
| Configuration | Inbound <br> (Endpoint property) | Outbound <br> (Deployment property) | Supported? |
| ----- | ----- | ----- | ----- |
| secure inbound with secure outbound | `public_network_access` is disabled | `egress_public_network_access` is disabled | Yes |
| secure inbound with public outbound | `public_network_access` is disabled | `egress_public_network_access` is enabled | Yes |
| public inbound with secure outbound | `public_network_access` is enabled | `egress_public_network_access` is disabled | Yes |
| public inbound with public outbound | `public_network_access` is enabled | `egress_public_network_access` is enabled | Yes |
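The visibility rules above can be summarized in a few lines of logic. This is our own sketch of the behavior described in this section, not an Azure API:

```python
def scoring_access(workspace_pna: str, endpoint_pna: str) -> set:
    """Given the workspace-level and endpoint-level public_network_access
    flags ('enabled' or 'disabled'), return the networks from which the
    scoring endpoint can be reached."""
    if workspace_pna == "disabled" or endpoint_pna == "disabled":
        # Traffic must arrive through the workspace's private endpoint.
        return {"virtual network"}
    return {"virtual network", "public networks"}

print(scoring_access("enabled", "enabled"))   # both
print(scoring_access("disabled", "enabled"))  # VNet only
print(scoring_access("enabled", "disabled"))  # VNet only
```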
## Scenario: Use Azure Kubernetes Service

For information on the outbound configuration required for Azure Kubernetes Service, see the connectivity requirements section of How to deploy to Azure Kubernetes Service.
> [!NOTE]
> The Azure Kubernetes Service load balancer isn't the same as the load balancer created by Azure Machine Learning. If you want to host your model as a secured application, only available on the VNet, use the internal load balancer created by Azure Machine Learning. If you want to allow public access, use the public load balancer created by Azure Machine Learning.
If your model requires extra inbound or outbound connectivity, such as to an external data source, use a network security group or your firewall to allow the traffic.
## Scenario: Use Docker images managed by Azure Machine Learning

Azure Machine Learning provides Docker images that can be used to train models or perform inference. If you don't specify your own images, the ones provided by Azure Machine Learning are used. These images are hosted on the Microsoft Container Registry (MCR). They're also hosted on a geo-replicated Azure Container Registry named `viennaglobal.azurecr.io`.

If you provide your own Docker images, such as on an Azure Container Registry that you provide, you don't need the outbound communication with MCR or `viennaglobal.azurecr.io`.
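The decision above reduces to a simple rule, sketched here for illustration (ours, not an Azure API); the registry hosts are the ones named in this section:

```python
def required_registry_hosts(uses_custom_images: bool) -> list:
    """Outbound registry endpoints needed to pull Azure Machine Learning
    base images. With your own images in your own registry, no outbound
    access to the Microsoft-hosted registries is required."""
    if uses_custom_images:
        return []
    return ["mcr.microsoft.com", "viennaglobal.azurecr.io"]

print(required_registry_hosts(False))
# ['mcr.microsoft.com', 'viennaglobal.azurecr.io']
print(required_registry_hosts(True))
# []
```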
> [!TIP]
> If your Azure Container Registry is secured in the VNet, it can't be used by Azure Machine Learning to build Docker images. Instead, you must designate an Azure Machine Learning compute cluster to build images. For more information, see How to secure a workspace in a virtual network.
:::image type="content" source="./media/concept-secure-network-traffic-flow/azure-machine-learning-docker-images.png" alt-text="Diagram of traffic flow when using provided Docker images":::
## Next steps

Now that you've learned how network traffic flows in a secured configuration, learn more about securing Azure Machine Learning in a virtual network by reading the Virtual network isolation and privacy overview article.
For information on best practices, see the Azure Machine Learning best practices for enterprise security article.