---
title: 'CLI (v2) Azure Arc-enabled Kubernetes online deployment YAML schema'
titleSuffix: Azure Machine Learning
description: Reference documentation for the CLI (v2) Azure Arc-enabled Kubernetes online deployment YAML schema.
services: machine-learning
ms.service: machine-learning
ms.subservice: mlops
ms.custom: event-tier1-build-2022
ms.topic: reference
author: Bozhong68
ms.author: bozhlin
ms.date: 03/31/2022
ms.reviewer: nibaccam
---

# CLI (v2) Azure Arc-enabled Kubernetes online deployment YAML schema
[!INCLUDE cli v2]
The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/kubernetesOnlineDeployment.schema.json.
[!INCLUDE schema note]
## YAML syntax

| Key | Type | Description | Allowed values | Default value |
| --- | ---- | ----------- | -------------- | ------------- |
| `$schema` | string | The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, including `$schema` at the top of your file enables you to invoke schema and resource completions. | | |
| `name` | string | **Required.** Name of the deployment. Naming rules are defined here. | | |
| `description` | string | Description of the deployment. | | |
| `tags` | object | Dictionary of tags for the deployment. | | |
| `endpoint_name` | string | **Required.** Name of the endpoint to create the deployment under. | | |
| `model` | string or object | The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification. To reference an existing model, use the `azureml:<model-name>:<model-version>` syntax. To define a model inline, follow the Model schema. As a best practice for production scenarios, create the model separately and reference it here. This field is optional for custom container deployment scenarios. | | |
| `model_mount_path` | string | The path to mount the model in a custom container. Applicable only for custom container deployment scenarios. If the `model` field is specified, the model is mounted on this path in the container. | | |
| `code_configuration` | object | Configuration for the scoring code logic. This field is optional for custom container deployment scenarios. | | |
| `code_configuration.code` | string | Local path to the source code directory for scoring the model. | | |
| `code_configuration.scoring_script` | string | Relative path to the scoring file in the source code directory. | | |
| `environment_variables` | object | Dictionary of environment variable key-value pairs to set in the deployment container. You can access these environment variables from your scoring scripts. | | |
| `environment` | string or object | **Required.** The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. To reference an existing environment, use the `azureml:<environment-name>:<environment-version>` syntax. To define an environment inline, follow the Environment schema. As a best practice for production scenarios, create the environment separately and reference it here. | | |
| `instance_type` | string | The instance type used to place the inference workload. If omitted, the inference workload is placed on the default instance type of the Kubernetes cluster specified in the endpoint's `compute` field. If specified, the inference workload is placed on that instance type. Note that the set of instance types for a Kubernetes cluster is configured via the Kubernetes cluster custom resource definition (CRD), so they aren't part of the Azure ML YAML schema for attaching Kubernetes compute. For more information, see Create and select Kubernetes instance types. | | |
| `instance_count` | integer | The number of instances to use for the deployment. Specify the value based on the workload you expect. This field is required only if you're using the `default` scale type (`scale_settings.type: default`). `instance_count` can be updated after deployment creation by using the `az ml online-deployment update` command. | | |
| `app_insights_enabled` | boolean | Whether to enable integration with the Azure Application Insights instance associated with your workspace. | | `false` |
| `scale_settings` | object | The scale settings for the deployment. The two types of scale settings supported are the `default` scale type and the `target_utilization` scale type. With the `default` scale type (`scale_settings.type: default`), you can manually scale the instance count up and down after deployment creation by updating the `instance_count` property. To configure the `target_utilization` scale type (`scale_settings.type: target_utilization`), see TargetUtilizationScaleSettings for the set of configurable properties. | | |
| `scale_settings.type` | string | The scale type. | `default`, `target_utilization` | `target_utilization` |
| `request_settings` | object | Scoring request settings for the deployment. See RequestSettings for the set of configurable properties. | | |
| `liveness_probe` | object | Liveness probe settings for monitoring the health of the container regularly. See ProbeSettings for the set of configurable properties. | | |
| `readiness_probe` | object | Readiness probe settings for validating if the container is ready to serve traffic. See ProbeSettings for the set of configurable properties. | | |
| `resources` | object | Container resource requirements. | | |
| `resources.requests` | object | Resource requests for the container. See ContainerResourceRequests for the set of configurable properties. | | |
| `resources.limits` | object | Resource limits for the container. See ContainerResourceLimits for the set of configurable properties. | | |
### RequestSettings

| Key | Type | Description | Default value |
| --- | ---- | ----------- | ------------- |
| `request_timeout_ms` | integer | The scoring timeout in milliseconds. | `5000` |
| `max_concurrent_requests_per_instance` | integer | The maximum number of concurrent requests per instance allowed for the deployment. Do not change this setting from the default value unless instructed by Microsoft Technical Support or a member of the Azure ML team. | `1` |
| `max_queue_wait_ms` | integer | The maximum amount of time in milliseconds a request will stay in the queue. | `500` |
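As a sketch, these keys nest under `request_settings` in the deployment YAML; the timeout value shown is illustrative, not a recommendation:

```yaml
request_settings:
  request_timeout_ms: 10000                 # illustrative: extend the 5000 ms default scoring timeout
  max_concurrent_requests_per_instance: 1   # leave at the default unless advised otherwise
  max_queue_wait_ms: 500
```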
### ProbeSettings

| Key | Type | Description | Default value |
| --- | ---- | ----------- | ------------- |
| `period` | integer | How often (in seconds) to perform the probe. | `10` |
| `initial_delay` | integer | The number of seconds after the container has started before the probe is initiated. Minimum value is `1`. | `10` |
| `timeout` | integer | The number of seconds after which the probe times out. Minimum value is `1`. | `2` |
| `success_threshold` | integer | The minimum consecutive successes for the probe to be considered successful after having failed. Minimum value is `1`. | `1` |
| `failure_threshold` | integer | When a probe fails, the system will try `failure_threshold` times before giving up. Giving up in the case of a liveness probe means the container will be restarted. In the case of a readiness probe, the container will be marked Unready. Minimum value is `1`. | `30` |
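A sketch of how these keys nest under `liveness_probe` and `readiness_probe` in the deployment YAML; the values shown simply restate the defaults from the table above:

```yaml
liveness_probe:
  initial_delay: 10      # wait 10 s after container start before probing
  period: 10             # probe every 10 s
  timeout: 2             # each probe times out after 2 s
  success_threshold: 1
  failure_threshold: 30  # restart the container after 30 consecutive failures
readiness_probe:
  initial_delay: 10
  period: 10
  timeout: 2
  success_threshold: 1
  failure_threshold: 30  # mark the container Unready after 30 consecutive failures
```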
### TargetUtilizationScaleSettings

| Key | Type | Description | Default value |
| --- | ---- | ----------- | ------------- |
| `type` | const | The scale type. | `target_utilization` |
| `min_instances` | integer | The minimum number of instances to use. | `1` |
| `max_instances` | integer | The maximum number of instances to scale to. | `1` |
| `target_utilization_percentage` | integer | The target CPU usage for the autoscaler. | `70` |
| `polling_interval` | integer | How often, in seconds, the autoscaler should attempt to scale the deployment. | `1` |
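A sketch of a `scale_settings` block using this scale type; the `max_instances`, `target_utilization_percentage`, and `polling_interval` values are illustrative:

```yaml
scale_settings:
  type: target_utilization
  min_instances: 1
  max_instances: 3                   # illustrative ceiling for scale-out
  target_utilization_percentage: 70  # scale when average CPU utilization crosses 70%
  polling_interval: 10               # illustrative: check utilization every 10 s
```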
### ContainerResourceRequests

| Key | Type | Description |
| --- | ---- | ----------- |
| `cpu` | string | The number of CPU cores requested for the container. |
| `memory` | string | The memory size requested for the container. |
| `nvidia.com/gpu` | string | The number of Nvidia GPU cards requested for the container. |
### ContainerResourceLimits

| Key | Type | Description |
| --- | ---- | ----------- |
| `cpu` | string | The limit for the number of CPU cores for the container. |
| `memory` | string | The limit for the memory size for the container. |
| `nvidia.com/gpu` | string | The limit for the number of Nvidia GPU cards for the container. |
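A sketch of a `resources` block combining requests and limits; the quantities are illustrative, and the `nvidia.com/gpu` entry applies only to GPU workloads:

```yaml
resources:
  requests:
    cpu: "0.1"             # illustrative: one tenth of a CPU core
    memory: "0.2Gi"        # illustrative memory request
  limits:
    cpu: "0.2"
    memory: "0.5Gi"
    # nvidia.com/gpu: "1"  # uncomment for GPU workloads (illustrative)
```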
## Remarks

The `az ml online-deployment` commands can be used for managing Azure Machine Learning Kubernetes online deployments.

## Examples

Examples are available in the examples GitHub repository.