title | titleSuffix | description | author | ms.author | ms.service | ms.subservice | ms.topic | ms.custom | ms.date |
---|---|---|---|---|---|---|---|---|---|
Prepare data for computer vision tasks | Azure Machine Learning | Image data preparation for Azure Machine Learning automated ML to train computer vision models on classification, object detection, and segmentation | vadthyavath | rvadthyavath | machine-learning | automl | how-to | template-how-to, sdkv2, event-tier1-build-2022 | 05/26/2022 |
[!INCLUDE sdk v2]
Important
Support for training computer vision models with automated ML in Azure Machine Learning is an experimental public preview feature. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
In this article, you learn how to prepare image data for training computer vision models with automated machine learning in Azure Machine Learning.
To generate models for computer vision tasks with automated machine learning, you need to bring labeled image data as input for model training in the form of an MLTable. You can create an MLTable from labeled training data in JSONL format. If your labeled training data is in a different format (such as Pascal VOC or COCO), you can use a conversion script to first convert it to JSONL, and then create an MLTable. Alternatively, you can use Azure Machine Learning's data labeling tool to manually label images, and export the labeled data to use for training your AutoML model.
- Familiarize yourself with the accepted schemas for JSONL files for AutoML computer vision experiments.
To train computer vision models using AutoML, you first need labeled training data. The images must be uploaded to the cloud, and the label annotations must be in JSONL format. You can either use the Azure ML Data Labeling tool to label your data or start with pre-labeled image data.
If you don't have pre-labeled data, you can use Azure Machine Learning's data labeling tool to manually label images. This tool automatically generates the data required for training in the accepted format.
It helps you create, manage, and monitor data labeling tasks for:
- Image classification (multi-class and multi-label)
- Object detection (bounding box)
- Instance segmentation (polygon)
If you already have a data labeling project and you want to use that data, you can export your labeled data as an Azure ML Dataset. You can then access the exported dataset under the 'Datasets' tab in Azure ML Studio, and download the underlying JSONL file from the Dataset details page under Data sources. The downloaded JSONL file can then be used to create an MLTable that can be used by automated ML for training computer vision models.
If you have previously labeled data that you would like to use to train your model, you first need to upload the images to the default Azure Blob Storage of your Azure ML Workspace and register them as a data asset.
[!INCLUDE cli v2]
Create a .yml file with the following configuration.
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: fridge-items-images-object-detection
description: Fridge-items images Object detection
path: ./data/odFridgeObjects
type: uri_folder
To upload the images as a data asset, run the following CLI v2 command with the path to your .yml file, workspace name, resource group, and subscription ID.
az ml data create -f [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE] --resource-group [YOUR_AZURE_RESOURCE_GROUP] --subscription [YOUR_AZURE_SUBSCRIPTION]
[!Notebook-python[] (~/azureml-examples-main/sdk/jobs/automl-standalone-jobs/automl-image-object-detection-task-fridge-items/automl-image-object-detection-task-fridge-items.ipynb?name=upload-data)]
Next, you will need to get the label annotations in JSONL format. The schema of labeled data depends on the computer vision task at hand. Refer to schemas for JSONL files for AutoML computer vision experiments to learn more about the required JSONL schema for each task type.
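Before building an MLTable, it can help to sanity-check the annotation file locally. The following is a minimal sketch, not part of the AutoML tooling: it assumes that every task schema includes the top-level keys `image_url` and `label` (verify this against the schema article for your task type), and the function name `check_jsonl` is hypothetical.

```python
import json

# Assumption: these keys are common to all task schemas; check the schema article.
REQUIRED_KEYS = {"image_url", "label"}


def check_jsonl(path):
    """Return a list of (line_number, problem) tuples for a JSONL annotation file."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((number, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((number, f"invalid JSON: {err.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((number, "not a JSON object"))
                continue
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                problems.append((number, f"missing keys: {sorted(missing)}"))
    return problems
```

An empty list means every line parsed as a JSON object with the assumed keys. The check doesn't validate task-specific fields such as bounding box coordinates.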
If your training data is in a different format (such as Pascal VOC or COCO), helper scripts to convert the data to JSONL are available in notebook examples.
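To illustrate what such a conversion involves, here is a rough sketch for COCO-style object detection annotations. It is not the helper script from the notebook examples. It assumes the object detection JSONL schema uses `image_url` plus a `label` list whose entries carry `label`, `topX`, `topY`, `bottomX`, `bottomY` (normalized to the image size), and `isCrowd`; confirm the field names against the schema article. The function name and the URL prefix are placeholders, and the optional `image_details` field is omitted.

```python
import json


def coco_to_jsonl_lines(coco, image_url_prefix):
    """Convert a COCO-style dict into object detection JSONL lines (sketch)."""
    categories = {c["id"]: c["name"] for c in coco["categories"]}

    # Group annotations by the image they belong to.
    annotations_by_image = {}
    for ann in coco["annotations"]:
        annotations_by_image.setdefault(ann["image_id"], []).append(ann)

    lines = []
    for image in coco["images"]:
        width, height = image["width"], image["height"]
        labels = []
        for ann in annotations_by_image.get(image["id"], []):
            # COCO boxes are absolute pixels: [x_min, y_min, box_width, box_height].
            x, y, w, h = ann["bbox"]
            labels.append({
                "label": categories[ann["category_id"]],
                "topX": x / width,
                "topY": y / height,
                "bottomX": (x + w) / width,
                "bottomY": (y + h) / height,
                "isCrowd": ann.get("iscrowd", 0),
            })
        record = {
            # Placeholder prefix: point this at where the images were uploaded.
            "image_url": image_url_prefix + image["file_name"],
            "label": labels,
        }
        lines.append(json.dumps(record))
    return lines
```

Writing the returned lines to a file, one per line, produces a JSONL file. The main design point is the coordinate change: COCO stores a corner plus a size in pixels, while the assumed target schema stores two corners as fractions of the image dimensions.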
Once you have your labeled data in JSONL format, you can use it to create an MLTable as shown below. MLTable packages your data into a consumable object for training.
:::code language="yaml" source="~/azureml-examples-main/sdk/jobs/automl-standalone-jobs/automl-image-object-detection-task-fridge-items/data/training-mltable-folder/MLTable":::
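If you prefer to generate the MLTable file from a script, a sketch along the following lines may work. The YAML template is an assumption modeled on the referenced example (read the JSONL file, then treat `image_url` as a stream column); compare it with the MLTable file shown above before relying on it. The function name `write_mltable` and the JSONL file name are hypothetical.

```python
from pathlib import Path

# Assumed MLTable contents; verify against the MLTable file in the example repo.
MLTABLE_TEMPLATE = """paths:
  - file: ./{jsonl_name}
transformations:
  - read_json_lines:
      encoding: utf8
      invalid_lines: error
      include_path_column: false
  - convert_column_types:
      - columns: image_url
        column_type: stream_info
"""


def write_mltable(folder, jsonl_name):
    """Write an MLTable file next to the JSONL annotations and return its path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "MLTable"  # the file must be named exactly "MLTable"
    target.write_text(MLTABLE_TEMPLATE.format(jsonl_name=jsonl_name), encoding="utf-8")
    return target
```

For example, `write_mltable("./data/training-mltable-folder", "train_annotations.jsonl")` would place the MLTable file in the same folder as a JSONL file of that name, so the relative path in the template resolves.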
You can then pass in the MLTable as a data input for your AutoML training job.