how-to-datastore.md

File metadata and controls

430 lines (325 loc) · 11.2 KB
title: Use datastores
titleSuffix: Azure Machine Learning
description: Learn how to use datastores to connect to Azure storage services during training with Azure Machine Learning.
services: machine-learning
ms.service: machine-learning
ms.subservice: mldata
ms.topic: how-to
ms.author: yogipandey
author: ynpandey
ms.reviewer: nibaccam
ms.date: 01/28/2022
ms.custom: contperf-fy21q1, devx-track-python, data4ml

Connect to storage with Azure Machine Learning datastores

In this article, learn how to connect to data storage services on Azure with Azure Machine Learning datastores.

Prerequisites

Note

Azure Machine Learning datastores don't create the underlying storage accounts. Instead, they register an existing storage account for use in Azure Machine Learning. Datastores are optional: you can use storage URIs directly, provided you have access to the underlying data.
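
As a rough illustration of what using storage URIs directly can look like, the sketch below builds the URI forms commonly used for Blob storage and Data Lake Storage Gen2, plus the `azureml://` form that points at a registered datastore. The helper names are hypothetical (they aren't part of any SDK), and the account, container, and path values are placeholders borrowed from the examples in this article.

```python
def blob_uri(account_name: str, container_name: str, path: str) -> str:
    """Build an HTTPS URI for a path inside a blob container."""
    return (
        f"https://{account_name}.blob.core.windows.net/"
        f"{container_name}/{path.lstrip('/')}"
    )


def adls_gen2_uri(account_name: str, filesystem: str, path: str) -> str:
    """Build an abfss URI for a path inside a Data Lake Gen2 filesystem."""
    return (
        f"abfss://{filesystem}@{account_name}.dfs.core.windows.net/"
        f"{path.lstrip('/')}"
    )


def datastore_uri(datastore_name: str, path: str) -> str:
    """Build the short azureml:// URI for a path in a registered datastore."""
    return f"azureml://datastores/{datastore_name}/paths/{path.lstrip('/')}"


if __name__ == "__main__":
    # Placeholder values; replace them with your own resources.
    print(blob_uri("mytestblobstore", "data-container", "folder/data.csv"))
    print(adls_gen2_uri("mytestdatalakegen2", "my-gen2-container", "raw/file.parquet"))
    print(datastore_uri("my_blob_ds", "folder/data.csv"))
```

The first two forms address the storage service itself, so they only work if the caller already has access to the data. The third form depends on a datastore that has been registered in the workspace.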

Create an Azure Blob datastore

For a credential-less datastore, create the following YAML file (updating the values):

# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: my_blob_ds # add name of your datastore here
type: azure_blob
description: here is a description # add a description of your datastore here
account_name: my_account_name # add storage account name here
container_name: my_container_name # add storage container name here

Create the Azure Machine Learning datastore in the CLI:

az ml datastore create --file my_blob_datastore.yml
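
The placeholder values in the YAML file (`my_account_name`, `my_container_name`) contain underscores, which Azure Storage doesn't allow in account or container names, so they must be replaced before the command can succeed. The sketch below checks candidate values against the Azure Storage naming rules before you run the CLI. The function names are hypothetical, and the check is only a local sanity test: it doesn't confirm that the account or container exists.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")

# Container names: lowercase letters, digits, and single hyphens; must start
# and end with a letter or digit. Length (3-63) is checked separately.
CONTAINER_NAME_RE = re.compile(r"[a-z0-9](?:-?[a-z0-9])*")


def is_valid_account_name(name: str) -> bool:
    """Return True if name satisfies the storage account naming rules."""
    return ACCOUNT_NAME_RE.fullmatch(name) is not None


def is_valid_container_name(name: str) -> bool:
    """Return True if name satisfies the blob container naming rules."""
    return 3 <= len(name) <= 63 and CONTAINER_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for account in ("my_account_name", "mytestblobstore"):
        print(account, is_valid_account_name(account))
    for container in ("my_container_name", "data-container"):
        print(container, is_valid_container_name(container))
```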

To authenticate with a storage account key, create the following YAML file (updating the values):

# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: blob_example
type: azure_blob
description: Datastore pointing to a blob container.
account_name: mytestblobstore
container_name: data-container
credentials:
  account_key: XXXxxxXXXxXXXXxxXXXXXxXXXXXxXxxXxXXXxXXXxXXxxxXXxxXXXxXxXXXxxXxxXXXXxxxxxXXxxxxxxXXXxXXX

Create the Azure Machine Learning datastore in the CLI:

az ml datastore create --file my_blob_datastore.yml
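
Because the YAML file above contains the account key in plain text, you may prefer to generate it at run time instead of committing it to source control. The following is a minimal sketch of that idea, assuming the key is available in an environment variable. The variable name `STORAGE_ACCOUNT_KEY` and the helper `render_blob_datastore_yaml` are hypothetical, and the other values are the placeholders from the example above.

```python
import json
import os


def render_blob_datastore_yaml(
    name: str, account_name: str, container_name: str, account_key: str
) -> str:
    """Render a blob datastore YAML document as text.

    json.dumps is used to double-quote each value; a JSON string is also a
    valid YAML double-quoted scalar, which keeps '+', '/' and '=' intact.
    """
    lines = [
        "$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json",
        f"name: {json.dumps(name)}",
        "type: azure_blob",
        f"account_name: {json.dumps(account_name)}",
        f"container_name: {json.dumps(container_name)}",
        "credentials:",
        f"  account_key: {json.dumps(account_key)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumption: the key was exported beforehand under this (made-up) name.
    key = os.environ.get("STORAGE_ACCOUNT_KEY")
    if key is None:
        print("STORAGE_ACCOUNT_KEY is not set; nothing written.")
    else:
        text = render_blob_datastore_yaml(
            "blob_example", "mytestblobstore", "data-container", key
        )
        with open("my_blob_datastore.yml", "w", encoding="utf-8") as handle:
            handle.write(text)
```

After the file is written, the same `az ml datastore create --file my_blob_datastore.yml` command applies. Delete the generated file afterwards so the key doesn't linger on disk.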

To authenticate with a shared access signature (SAS) token, create the following YAML file (updating the values):

# my_blob_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: blob_sas_example
type: azure_blob
description: Datastore pointing to a blob container using SAS token.
account_name: mytestblobstore
container_name: data-container
credentials:
  sas_token: ?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX

Create the Azure Machine Learning datastore in the CLI:

az ml datastore create --file my_blob_datastore.yml
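
A SAS token is a query string, and two of its fields are worth checking before you register it: `se` (expiry time) and `sp` (granted permissions). The sketch below reads those fields with the Python standard library. The helper names are hypothetical, and the token in the example is a made-up placeholder, not a working signature.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs


def _sas_field(sas_token: str, key: str) -> Optional[str]:
    """Return one query parameter from a SAS token, or None if absent."""
    values = parse_qs(sas_token.lstrip("?")).get(key)
    return values[0] if values else None


def sas_permissions(sas_token: str) -> Optional[str]:
    """Return the permission string from the 'sp' field, for example 'rl'."""
    return _sas_field(sas_token, "sp")


def sas_expiry(sas_token: str) -> Optional[datetime]:
    """Return the expiry time from the 'se' field as an aware UTC datetime."""
    raw = _sas_field(sas_token, "se")
    if raw is None:
        return None
    expiry = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        # Date-only values carry no offset; SAS times are expressed in UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def is_expired(sas_token: str, now: Optional[datetime] = None) -> bool:
    """Return True if the token has an 'se' field that is not in the future."""
    expiry = sas_expiry(sas_token)
    if expiry is None:
        # No expiry in the token itself, so nothing can be decided locally.
        return False
    current = now or datetime.now(timezone.utc)
    return current >= expiry


if __name__ == "__main__":
    token = "?sv=2020-08-04&ss=b&srt=sco&sp=rl&se=2030-01-01T00:00:00Z&sig=placeholder"
    print("permissions:", sas_permissions(token))
    print("expires:", sas_expiry(token))
    print("expired:", is_expired(token))
```

Checking the token locally can catch an expired or under-permissioned token before the datastore is created, but it doesn't validate the signature itself.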

To create the same credential-less datastore with the Python SDK:

from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureBlobDatastore(
    name="",
    description="",
    account_name="",
    container_name=""
)

ml_client.create_or_update(store)

To create the datastore with an account key using the Python SDK:

from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities._datastore.credentials import AccountKeyCredentials
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

creds = AccountKeyCredentials(account_key="")

store = AzureBlobDatastore(
    name="",
    description="",
    account_name="",
    container_name="",
    credentials=creds
)

ml_client.create_or_update(store)

To create the datastore with a SAS token using the Python SDK:

from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities._datastore.credentials import SasTokenCredentials
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

creds = SasTokenCredentials(sas_token="")

store = AzureBlobDatastore(
    name="",
    description="",
    account_name="",
    container_name="",
    credentials=creds
)

ml_client.create_or_update(store)

Create an Azure Data Lake Gen2 datastore

For a credential-less datastore, create the following YAML file (updating the values):

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: adls_gen2_credless_example
type: azure_data_lake_gen2
description: Credential-less datastore pointing to an Azure Data Lake Storage Gen2.
account_name: mytestdatalakegen2
filesystem: my-gen2-container

Create the Azure Machine Learning datastore in the CLI:

az ml datastore create --file my_adls_datastore.yml

To authenticate with a service principal, create the following YAML file (updating the values):

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen2.schema.json
name: adls_gen2_example
type: azure_data_lake_gen2
description: Datastore pointing to an Azure Data Lake Storage Gen2.
account_name: mytestdatalakegen2
filesystem: my-gen2-container
credentials:
  tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Create the Azure Machine Learning datastore in the CLI:

az ml datastore create --file my_adls_datastore.yml

To create the credential-less datastore with the Python SDK:

from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureDataLakeGen2Datastore(
    name="",
    description="",
    account_name="",
    filesystem=""
)

ml_client.create_or_update(store)

To create the datastore with service principal credentials using the Python SDK:

from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.ai.ml.entities._datastore.credentials import ServicePrincipalCredentials
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

creds = ServicePrincipalCredentials(
    authority_url="",
    resource_url="",
    tenant_id="",
    client_id="",
    client_secret=""
)

store = AzureDataLakeGen2Datastore(
    name="",
    description="",
    account_name="",
    filesystem="",
    credentials=creds
)

ml_client.create_or_update(store)
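
The service principal examples above leave the tenant ID, client ID, and client secret blank. One way to fill them without hard-coding secrets is to read them from environment variables and check that the two IDs are well-formed GUIDs before passing them on. The sketch below assumes the variable names `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET`, and the helper `load_service_principal` is hypothetical rather than part of the SDK.

```python
import os
import uuid
from typing import Dict, Mapping


def is_guid(value: str) -> bool:
    """Return True if value is a GUID in hyphenated 8-4-4-4-12 form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def load_service_principal(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect service principal settings from env and validate them.

    Raises ValueError if a value is missing or an ID is not a GUID.
    """
    values = {
        "tenant_id": env.get("AZURE_TENANT_ID", ""),
        "client_id": env.get("AZURE_CLIENT_ID", ""),
        "client_secret": env.get("AZURE_CLIENT_SECRET", ""),
    }
    missing = [key for key, value in values.items() if not value]
    if missing:
        raise ValueError(f"missing values: {', '.join(missing)}")
    for key in ("tenant_id", "client_id"):
        if not is_guid(values[key]):
            raise ValueError(f"{key} is not a GUID")
    return values


if __name__ == "__main__":
    try:
        settings = load_service_principal()
    except ValueError as error:
        print(f"Service principal settings are incomplete: {error}")
    else:
        # Avoid printing the secret; show only the non-sensitive IDs.
        print("tenant_id:", settings["tenant_id"])
        print("client_id:", settings["client_id"])
```

The returned values can then be supplied to the credentials object or written into the `credentials` section of the YAML file.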

Create an Azure Files datastore

To authenticate with a storage account key, create the following YAML file (updating the values):

# my_files_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: file_example
type: azure_file
description: Datastore pointing to an Azure File Share.
account_name: mytestfilestore
file_share_name: my-share
credentials:
  account_key: XxXxXxXXXXXXXxXxXxxXxxXXXXXXXXxXxxXXxXXXXXXXxxxXxXXxXXXXXxXXxXXXxXxXxxxXXxXXxXXXXXxXxxXX

Create the Azure Machine Learning datastore in the CLI:

az ml datastore create --file my_files_datastore.yml

To authenticate with a shared access signature (SAS) token, create the following YAML file (updating the values):

# my_files_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureFile.schema.json
name: file_sas_example
type: azure_file
description: Datastore pointing to an Azure File Share using SAS token.
account_name: mytestfilestore
file_share_name: my-share
credentials:
  sas_token: ?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX

Create the Azure Machine Learning datastore in the CLI:

az ml datastore create --file my_files_datastore.yml

To create the datastore with an account key using the Python SDK:

from azure.ai.ml.entities import AzureFileDatastore
from azure.ai.ml.entities._datastore.credentials import AccountKeyCredentials
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

creds = AccountKeyCredentials(account_key="")

store = AzureFileDatastore(
    name="",
    description="",
    account_name="",
    file_share_name="",
    credentials=creds
)

ml_client.create_or_update(store)

To create the datastore with a SAS token using the Python SDK:

from azure.ai.ml.entities import AzureFileDatastore
from azure.ai.ml.entities._datastore.credentials import SasTokenCredentials
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

creds = SasTokenCredentials(sas_token="")

store = AzureFileDatastore(
    name="",
    description="",
    account_name="",
    file_share_name="",
    credentials=creds
)

ml_client.create_or_update(store)

Create an Azure Data Lake Gen1 datastore

For a credential-less datastore, create the following YAML file (updating the values):

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen1.schema.json
name: adls_gen1_credless_example
type: azure_data_lake_gen1
description: Credential-less datastore pointing to an Azure Data Lake Storage Gen1.
store_name: mytestdatalakegen1

Create the Azure Machine Learning datastore in the CLI:

az ml datastore create --file my_adls_datastore.yml

To authenticate with a service principal, create the following YAML file (updating the values):

# my_adls_datastore.yml
$schema: https://azuremlschemas.azureedge.net/latest/azureDataLakeGen1.schema.json
name: adls_gen1_example
type: azure_data_lake_gen1
description: Datastore pointing to an Azure Data Lake Storage Gen1.
store_name: mytestdatalakegen1
credentials:
  tenant_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
  client_secret: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Create the Azure Machine Learning datastore in the CLI:

az ml datastore create --file my_adls_datastore.yml

To create the credential-less datastore with the Python SDK:

from azure.ai.ml.entities import AzureDataLakeGen1Datastore
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

store = AzureDataLakeGen1Datastore(
    name="",
    store_name="",
    description="",
)

ml_client.create_or_update(store)

To create the datastore with service principal credentials using the Python SDK:

from azure.ai.ml.entities import AzureDataLakeGen1Datastore
from azure.ai.ml.entities._datastore.credentials import ServicePrincipalCredentials
from azure.ai.ml import MLClient

ml_client = MLClient.from_config()

creds = ServicePrincipalCredentials(
    authority_url="",
    resource_url="",
    tenant_id="",
    client_id="",
    client_secret=""
)

store = AzureDataLakeGen1Datastore(
    name="",
    store_name="",
    description="",
    credentials=creds
)


ml_client.create_or_update(store)

Next steps