concept-vulnerability-management.md

title: Vulnerability management
titleSuffix: Azure Machine Learning
description: Learn how Azure Machine Learning manages vulnerabilities in images provided by the service, and how you can keep components that are managed by you up to date with the latest security updates.
ms.reviewer: larryfr
author: deeikele
ms.author: deeikele
ms.date: 12/16/2021
ms.topic: conceptual
ms.service: machine-learning
ms.subservice: enterprise-readiness
ms.custom: event-tier1-build-2022

Vulnerability management for Azure Machine Learning

Vulnerability management involves detecting, assessing, mitigating, and reporting on any security vulnerabilities that exist in an organization’s systems and software. Vulnerability management is a shared responsibility between you and Microsoft.

In this article, we discuss these responsibilities and outline the vulnerability management controls provided by Azure Machine Learning. You'll learn how to keep your service instance and applications up to date with the latest security updates, and how to minimize the window of opportunity for attackers.

Microsoft-managed VM images

Azure Machine Learning manages host OS VM images for Azure ML compute instances, Azure ML compute clusters, and Data Science Virtual Machines. Images are updated monthly, and the update process includes the following:

  • For each new VM image version, the latest updates are sourced from the original publisher of the OS. Using the latest updates ensures that all applicable OS-related patches are picked up. For Azure Machine Learning, the publisher is Canonical for all the Ubuntu 18 images. These images are used for Azure Machine Learning compute instances, compute clusters, and Data Science Virtual Machines.
  • VM images are updated monthly.
  • In addition to patches applied by the original publisher, Azure Machine Learning updates system packages when updates are available.
  • Azure Machine Learning checks and validates any machine learning packages that may require an upgrade. In most circumstances, new VM images contain the latest package versions.
  • All VM images are built on secure subscriptions that run vulnerability scanning regularly. Any unaddressed vulnerabilities are flagged and are to be fixed within the next release.
  • Most images are released monthly. For compute instances, the image release is aligned with the Azure ML SDK release cadence, because the SDK comes preinstalled in the environment.

In addition to the regular release cadence, hotfixes are applied when vulnerabilities are discovered. Hotfixes are rolled out within 72 hours for Azure ML compute and within a week for compute instances.

Note

The host OS is not the OS version you might specify for an environment when training or deploying a model. Environments run inside Docker. Docker runs on the host OS.

Microsoft-managed container images

Base Docker images maintained by Azure Machine Learning get security patches frequently to address newly discovered vulnerabilities.

Azure Machine Learning releases updates for supported images every two weeks to address vulnerabilities. As a commitment, we aim to have no vulnerabilities older than 30 days in the latest version of supported images.

Patched images are released under a new immutable tag, and the :latest tag is updated to point to them. Choosing between the :latest tag and pinning to a particular image version is a trade-off between security and environment reproducibility for your machine learning job.
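To make the trade-off concrete, the following minimal sketch checks whether a container image reference is pinned to a specific tag or digest, or floats on :latest. The function names and the image names in the usage note are hypothetical and used only for illustration; the only behavior assumed is the standard Docker convention that a reference without a tag resolves to :latest.

```python
def image_tag(image_ref: str) -> str:
    """Return the tag of an image reference, or 'latest' if no tag is given."""
    # Drop a digest suffix (name@sha256:...) before looking for a tag.
    name = image_ref.split("@", 1)[0]
    # Only the last path segment can carry a tag; earlier segments may
    # contain a registry port such as localhost:5000.
    last_segment = name.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return last_segment.rsplit(":", 1)[1]
    return "latest"


def is_pinned(image_ref: str) -> bool:
    """True when the reference names a specific tag or digest, not :latest."""
    if "@" in image_ref:
        return True  # A digest always identifies one exact image.
    return image_tag(image_ref) != "latest"
```

For example, `is_pinned("mcr.microsoft.com/azureml/example-base:20240101.v1")` is true for this made-up tag, while the same name with `:latest` or with no tag is reported as not pinned. A pinned reference favors reproducibility; a floating one picks up patched images as they are released.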

Managing environments and container images

Reproducibility is a key aspect of software development and machine learning experimentation. The primary focus of the Azure Machine Learning Environment component is to guarantee reproducibility of the environment where the user's code runs. To ensure reproducibility for any machine learning job, previously built images are pulled to the compute nodes without needing to be rematerialized.

While Azure Machine Learning patches base images with each release, whether you use the latest image may be a trade-off between reproducibility and vulnerability management. So, it's your responsibility to choose the environment version used for your jobs or model deployments.

By default, dependencies are layered on top of base images provided by Azure ML when building environments. You can also use your own base images when using environments in Azure Machine Learning. Once you install more dependencies on top of the Microsoft-provided images, or bring your own base images, vulnerability management becomes your responsibility.

Associated with your Azure Machine Learning workspace is an Azure Container Registry instance that's used as a cache for container images. Any image that is materialized is pushed to the container registry and used when experimentation or deployment is triggered for the corresponding environment. Azure Machine Learning doesn't delete any image from your container registry, and it's your responsibility to evaluate the need for an image over time. To monitor and maintain environment hygiene, you can use Microsoft Defender for Container Registry to help scan your images for vulnerabilities. To automate your processes based on triggers from Microsoft Defender, see Automate responses to Microsoft Defender for Cloud triggers.

Vulnerability management on compute hosts

Managed compute nodes in Azure Machine Learning make use of Microsoft-managed OS VM images and pull the latest updated VM image at the time that a node gets provisioned. This applies to compute instance, compute cluster, and managed inference compute SKUs. While OS VM images are regularly patched, compute nodes are not actively scanned for vulnerabilities while in use. For an extra layer of protection, consider network isolation of your compute.
It's a shared responsibility between you and Microsoft to ensure that your environment is up to date and that compute nodes use the latest OS version. Nodes that aren't idle can't be updated to the latest VM image. Considerations are slightly different for each compute type, as listed in the following sections.

Compute instance

  • Compute instances get the latest VM images at the time of provisioning.

  • Microsoft doesn't provide active OS patching for compute instances. To obtain the latest VM image, delete and recreate the compute instance.

  • You can use setup scripts to install extra scanning software. Azure Defender agents are currently not supported.

  • To find compute instances that were created more than 30 days ago and still exist, you can use the following Log Analytics query:

    // Latest completed create event per compute resource in workspace <wsname>
    AmlComputeClusterEvent
    | where ClusterType == "DSI" and EventType == "CreateOperationCompleted" and split(_ResourceId, "/")[-1] == "<wsname>"
    | project ClusterName, TimeCreated=TimeGenerated
    | summarize Last_Time_Created=arg_max(TimeCreated, *) by ClusterName
    // Join with the latest completed delete event per compute resource, if any
    | join kind = leftouter (AmlComputeClusterEvent
        | where ClusterType == "DSI" and EventType == "DeleteOperationCompleted"
        | project ClusterName, TimeGenerated
        | summarize Last_Time_Deleted=arg_max(TimeGenerated, *) by ClusterName)
        on ClusterName
    // Keep resources that still exist and were created more than 30 days ago
    | where (Last_Time_Created > Last_Time_Deleted or isnull(Last_Time_Deleted)) and Last_Time_Created < ago(30days)
    | project ClusterName, Last_Time_Created, Last_Time_Deleted
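The filtering logic of the query above can be sketched in plain Python over an in-memory list of events. This is only an illustration of the logic, not a replacement for the query: the function name, the tuple layout of the events, and the sample instance names are assumptions made for this example.

```python
from datetime import datetime, timedelta


def stale_instances(events, now, max_age=timedelta(days=30)):
    """Return names of instances that still exist and are older than max_age.

    events is an iterable of (name, event_type, timestamp) tuples, a
    hypothetical stand-in for rows of the AmlComputeClusterEvent table.
    """
    last_created = {}
    last_deleted = {}
    for name, event_type, timestamp in events:
        if event_type == "CreateOperationCompleted":
            target = last_created
        elif event_type == "DeleteOperationCompleted":
            target = last_deleted
        else:
            continue  # Other event types don't affect resource age.
        # Keep only the most recent timestamp per instance (arg_max).
        if name not in target or timestamp > target[name]:
            target[name] = timestamp

    stale = []
    for name, created in last_created.items():
        deleted = last_deleted.get(name)
        # The instance still exists if it was never deleted, or if it was
        # recreated after the last delete.
        still_exists = deleted is None or created > deleted
        if still_exists and created < now - max_age:
            stale.append(name)
    return sorted(stale)
```

An instance that was deleted and later recreated is judged by its most recent create time, which mirrors the `Last_Time_Created > Last_Time_Deleted` condition in the query.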

Compute clusters

Compute clusters automatically upgrade to the latest VM image. If the cluster is configured with min nodes = 0, it automatically upgrades nodes to the latest VM image version when all jobs are completed and the cluster reduces to zero nodes.

  • There are conditions in which cluster nodes do not scale down, and as a result are unable to get the latest VM images.

    • Cluster minimum node count may be set to a value greater than 0.
    • Jobs may be scheduled continuously on your cluster.
  • It is your responsibility to scale non-idle cluster nodes down to get the latest OS VM image updates. Azure Machine Learning does not abort any running workloads on compute nodes to issue VM updates.

    • Temporarily change the minimum nodes to zero and allow the cluster to reduce to zero nodes.
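As a minimal sketch of the temporary change described above, the helper below sets a cluster's minimum node count and returns the previous value so that you can restore it later. It assumes that `ml_client` is an `MLClient` from the Azure Machine Learning Python SDK v2 (`azure-ai-ml`), which this article doesn't otherwise cover; the helper name and the cluster name in the usage note are hypothetical.

```python
def set_min_nodes(ml_client, cluster_name, min_nodes):
    """Set the minimum node count of a compute cluster.

    Assumes ml_client is an azure.ai.ml.MLClient (SDK v2) and that the
    named compute is a compute cluster. Returns the previous minimum so
    the caller can restore it afterwards.
    """
    cluster = ml_client.compute.get(cluster_name)
    previous_min = cluster.min_instances
    cluster.min_instances = min_nodes
    # begin_create_or_update returns a poller; result() waits for completion.
    ml_client.compute.begin_create_or_update(cluster).result()
    return previous_min
```

A possible use is `previous = set_min_nodes(ml_client, "example-cluster", 0)`, followed by `set_min_nodes(ml_client, "example-cluster", previous)` once the cluster has reduced to zero nodes. Lowering the minimum doesn't stop running jobs; nodes are released only after their workloads finish.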

Managed online endpoints

  • Managed online endpoints automatically receive OS host image updates that include vulnerability fixes. Images are updated at least once a month.
  • Compute nodes are automatically upgraded to the latest VM image version once it's released. No action is required from you.

Customer managed Kubernetes clusters

Kubernetes compute lets you configure Kubernetes clusters to train models, run inference, and manage models in Azure Machine Learning.

  • Because you manage the environment with Kubernetes, managing both OS VM vulnerabilities and container image vulnerabilities is your responsibility.
  • Azure Machine Learning frequently publishes new versions of AzureML extension container images into Microsoft Container Registry. It's Microsoft’s responsibility to ensure new image versions are free from vulnerabilities. Vulnerabilities are fixed with each release.
  • When your clusters run jobs without interruption, running jobs may use outdated container image versions. Once you upgrade the AMLArc extension on a running cluster, newly submitted jobs start to use the latest image version. When upgrading the AMLArc extension to its latest version, clean up the old container image versions from the clusters as required.
  • You can check in the Azure portal whether your Azure Arc cluster is running the latest version of the AMLArc extension. Under your Arc resource of the type 'Kubernetes - Azure Arc', see 'Extensions' to find the version of the AMLArc extension.

Automated ML and Designer environments

For code-based training experiences, you control which Azure Machine Learning environment is used. With AutoML and Designer, the environment is encapsulated as part of the service. These types of jobs can run on computes configured by you, allowing for extra controls such as network isolation.

  • Automated ML jobs run on environments that layer on top of Azure ML base Docker images.

  • Designer jobs are compartmentalized into components. Each component has its own environment that layers on top of the Azure ML base Docker images. For more information on components, see the Component reference.

Next steps