how-to-train-with-rest.md

File metadata and controls

162 lines (107 loc) · 9.78 KB
title: Train models with REST (preview)
titleSuffix: Azure Machine Learning
description: Learn how to train models and create jobs with REST APIs.
services: machine-learning
ms.service: machine-learning
ms.subservice: core
ms.topic: how-to
author: singankit
ms.author: anksing
ms.date: 03/31/2022
ms.reviewer: nibaccam
ms.custom: devplatv2, event-tier1-build-2022

Train models with REST (preview)

Learn how to use the Azure Machine Learning REST API to create and manage training jobs (preview).

The REST API uses standard HTTP verbs to create, retrieve, update, and delete resources. The REST API works with any language or tool that can make HTTP requests. REST's straightforward structure makes it a good choice in scripting environments and for MLOps automation.

In this article, you learn how to use the new REST APIs to:

  • Create machine learning assets
  • Create a basic training job
  • Create a hyperparameter tuning sweep job

Prerequisites

Azure Machine Learning jobs

A job is a resource that specifies all aspects of a computation job. It aggregates three things:

  • What to run?
  • How to run it?
  • Where to run it?

There are many ways to submit an Azure Machine Learning job, including the SDK, the Azure CLI, and the visual interface in the studio. The following example submits a LightGBM training job with the REST API.

Create machine learning assets

First, set up your Azure Machine Learning assets to configure your job.

In the following REST API calls, we use $SUBSCRIPTION_ID, $RESOURCE_GROUP, $LOCATION, and $WORKSPACE as placeholders. Replace the placeholders with your own values.
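As a minimal sketch, the placeholders could be set once as shell variables and checked before any request is sent. Every value below is a hypothetical example, not a real resource.

```shell
# Hypothetical placeholder values; replace each one with your own.
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP="my-resource-group"
LOCATION="eastus"
WORKSPACE="my-workspace"

# Stop early with a clear message if any placeholder is empty or unset.
: "${SUBSCRIPTION_ID:?set SUBSCRIPTION_ID first}"
: "${RESOURCE_GROUP:?set RESOURCE_GROUP first}"
: "${LOCATION:?set LOCATION first}"
: "${WORKSPACE:?set WORKSPACE first}"

echo "Using workspace $WORKSPACE in resource group $RESOURCE_GROUP ($LOCATION)"
```

Checking the variables up front avoids sending a request with an empty path segment, which would otherwise fail with a less obvious error from the service.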

Administrative REST requests require a service principal authentication token. Replace $TOKEN with your own value. You can retrieve this token with the following command:

TOKEN=$(az account get-access-token --query accessToken -o tsv)

The service provider uses the api-version argument to ensure compatibility. The api-version argument varies from service to service. The current Azure Machine Learning API version is 2022-02-01-preview. Set the API version as a variable to accommodate future versions:

API_VERSION="2022-02-01-preview"
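Because every request in this article repeats the same workspace path and api-version argument, one option is to build the URL in a small helper. The sketch below assumes hypothetical placeholder values, and aml_url is an illustrative function name, not part of any Azure tool.

```shell
# Hypothetical values for illustration only.
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP="my-resource-group"
WORKSPACE="my-workspace"
API_VERSION="2022-02-01-preview"

# aml_url (hypothetical helper): print the request URL for a path
# under the workspace, with the api-version argument appended.
aml_url() {
  printf 'https://management.azure.com/subscriptions/%s/resourceGroups/%s/providers/Microsoft.MachineLearningServices/workspaces/%s/%s?api-version=%s\n' \
    "$SUBSCRIPTION_ID" "$RESOURCE_GROUP" "$WORKSPACE" "$1" "$API_VERSION"
}

# Print the URL that lists the compute resources of the workspace.
aml_url "computes"
```

A request could then be written as curl "$(aml_url computes)" --header "Authorization: Bearer $TOKEN". The helper appends only api-version, so requests that need further query parameters must add them separately.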

Compute

Running machine learning jobs requires compute resources. You can list your workspace's compute resources:

curl "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/computes?api-version=$API_VERSION" \
--header "Authorization: Bearer $TOKEN"

For this example, we use an existing compute cluster named cpu-cluster. We set the compute name as a variable so that later requests can reference it:

COMPUTE_NAME="cpu-cluster"

Environment

The LightGBM example needs to run in a LightGBM environment. Create the environment with a PUT request. Use a Docker image from Microsoft Container Registry.

You can configure the Docker image with the Docker property and add conda dependencies with condaFile:

:::code language="rest-api" source="~/azureml-examples-main/cli/train-rest.sh" id="create_environment":::

Datastore

The training job needs to run on data, so you need to specify a datastore. In this example, you get the default datastore and Azure Storage account for your workspace. Query your workspace with a GET request, which returns a JSON response with the information.

You can use the tool jq to parse the JSON result and get the required values. You can also use the Azure portal to find the same information.

response=$(curl --location --request GET "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE/datastores?api-version=$API_VERSION&isDefault=true" \
--header "Authorization: Bearer $TOKEN")

# Use jq -r so the values are raw strings without surrounding JSON quotes.
AZURE_STORAGE_ACCOUNT=$(echo "$response" | jq -r '.value[0].properties.contents.accountName')
AZUREML_DEFAULT_DATASTORE=$(echo "$response" | jq -r '.value[0].name')
AZUREML_DEFAULT_CONTAINER=$(echo "$response" | jq -r '.value[0].properties.contents.containerName')
AZURE_STORAGE_KEY=$(az storage account keys list --account-name "$AZURE_STORAGE_ACCOUNT" | jq -r '.[0].value')

Data

Now that you have the datastore, you can create a dataset. For this example, use the common dataset iris.csv.

:::code language="rest-api" source="~/azureml-examples-main/cli/train-rest.sh" id="create_data":::

Code

Now that you have the dataset and datastore, you can upload the training script that the job will run. Use the Azure Storage CLI to upload the script files as blobs into your default container. You can also use other methods to upload, such as the Azure portal or Azure Storage Explorer.

az storage blob upload-batch -d $AZUREML_DEFAULT_CONTAINER/src \
 -s jobs/train/lightgbm/iris/src --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY

After you upload your code, you can register it with a PUT request and reference its URL through codeUri.

:::code language="rest-api" source="~/azureml-examples-main/cli/train-rest.sh" id="create_code":::
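As a rough sketch of how the codeUri value relates to the upload step, the URL can be assembled from the storage account and container names. The sketch assumes the public Azure cloud blob endpoint (blob.core.windows.net), and the account and container names are hypothetical stand-ins for the values returned by the datastore query.

```shell
# Hypothetical values; in this article they come from the datastore query.
AZURE_STORAGE_ACCOUNT="mystorageaccount"
AZUREML_DEFAULT_CONTAINER="azureml-blobstore-example"

# The upload step copied the training scripts into the src folder of the
# default container, so the code URI points at that folder.
CODE_URI="https://$AZURE_STORAGE_ACCOUNT.blob.core.windows.net/$AZUREML_DEFAULT_CONTAINER/src"
echo "$CODE_URI"
```

If the account or container variable still carries surrounding JSON quotes from the parsing step, the resulting URL is invalid, so it is worth printing the value before using it in a request.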

Submit a training job

Now that your assets are in place, you can run the LightGBM job, which outputs a trained model and metadata. You need the following information to configure the training job:

  • run_id: [Optional] The name of the job, which must be unique across all jobs. Unless a name is specified either in the YAML file via the name field or the command line via --name/-n, a GUID/UUID is automatically generated and used for the name.
  • jobType: The job type. For a basic training job, use Command.
  • codeId: The ARMId reference of the name and version of your training script.
  • command: The command to execute. Input data can be written into the command and can be referred to with data binding.
  • environmentId: The ARMId reference of the name and version of your environment.
  • inputDataBindings: Data bindings help you reference input data. Each binding is exposed as an environment variable named AZURE_ML_INPUT_ followed by the name of the binding, which you can refer to in command. You can directly reference a file at a public blob URL as a UriFile through the uri parameter.
  • experimentName: [Optional] Tags the job to help you organize jobs in Azure Machine Learning studio. Each job's run record is organized under the corresponding experiment in the studio "Experiment" tab. If omitted, the experiment name defaults to the name of the working directory where the job was created.
  • computeId: The ARMId reference of the compute target.
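The ARMId values in the list above are full Azure Resource Manager resource paths. The following sketch shows one way to assemble them, assuming hypothetical asset names and versions (train-lightgbm, lightgbm-environment, version 1) and hypothetical subscription values. It also shows the environment variable name that a binding called iris would produce.

```shell
# Hypothetical values for illustration only.
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP="my-resource-group"
WORKSPACE="my-workspace"
COMPUTE_NAME="cpu-cluster"
CODE_NAME="train-lightgbm"          # hypothetical code asset name
CODE_VERSION="1"
ENV_NAME="lightgbm-environment"     # hypothetical environment name
ENV_VERSION="1"
BINDING_NAME="iris"                 # hypothetical input data binding name

# Every asset ID starts with the ARM path of the workspace.
WORKSPACE_ID="/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.MachineLearningServices/workspaces/$WORKSPACE"

CODE_ID="$WORKSPACE_ID/codes/$CODE_NAME/versions/$CODE_VERSION"
ENVIRONMENT_ID="$WORKSPACE_ID/environments/$ENV_NAME/versions/$ENV_VERSION"
COMPUTE_ID="$WORKSPACE_ID/computes/$COMPUTE_NAME"

# Name of the environment variable that the command can refer to.
INPUT_VAR="AZURE_ML_INPUT_$BINDING_NAME"

printf '%s\n' "$CODE_ID" "$ENVIRONMENT_ID" "$COMPUTE_ID" "$INPUT_VAR"
```

Building the IDs from one workspace prefix keeps the three references consistent when the subscription, resource group, or workspace changes.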

Use the following commands to submit the training job:

:::code language="rest-api" source="~/azureml-examples-main/cli/train-rest.sh" id="create_job":::

Submit a hyperparameter sweep job

Azure Machine Learning also lets you efficiently tune training hyperparameters. You can create a hyperparameter tuning sweep job with the REST APIs. For more information on Azure Machine Learning's hyperparameter tuning options, see Hyperparameter tuning a model. Specify the hyperparameter tuning parameters to configure the sweep:

  • jobType: The job type. For a sweep job, use Sweep.
  • algorithm: The sampling algorithm class. The "random" class is often a good place to start. See the sweep job schema for the enumeration of options.
  • trial: The command job configuration for each trial to be run.
  • objective: The primaryMetric is the optimization metric, which must match the name of a metric logged from the training code. The goal specifies the direction (minimize or maximize). See the schema for the full enumeration of options.
  • searchSpace: A generic object of hyperparameters to sweep over. The key is a name for the hyperparameter, for example, learning_rate. The value is the hyperparameter distribution. See the schema for the enumeration of options.
  • limits: An object, with JobLimitsType of type sweep, that defines the limits of the sweep job. maxTotalTrials [Optional] is the maximum number of individual trials to run. maxConcurrentTrials is the maximum number of trials to run concurrently on your compute cluster.
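To see how the two limits interact, the following sketch computes the minimum number of rounds a sweep needs when at most maxConcurrentTrials trials run at once. The numbers are hypothetical, and the result is only a rough lower bound: it assumes the cluster has capacity for every concurrent trial and ignores differences in trial duration.

```shell
# Hypothetical limit values for illustration only.
MAX_TOTAL_TRIALS=20
MAX_CONCURRENT_TRIALS=8

# Ceiling division: a final, partly filled round still counts as a round.
MIN_ROUNDS=$(( (MAX_TOTAL_TRIALS + MAX_CONCURRENT_TRIALS - 1) / MAX_CONCURRENT_TRIALS ))

echo "$MAX_TOTAL_TRIALS trials at $MAX_CONCURRENT_TRIALS per round need at least $MIN_ROUNDS rounds"
```

Raising maxConcurrentTrials shortens the sweep only while the compute cluster has enough nodes to run that many trials in parallel.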

To create a sweep job with the same LightGBM example, use the following commands:

:::code language="rest-api" source="~/azureml-examples-main/cli/train-rest.sh" id="create_a_sweep_job":::

Next steps

Now that you have a trained model, learn how to deploy your model.