title | titleSuffix | description | services | ms.service | ms.subservice | ms.topic | ms.reviewer | author | ms.author | ms.date | ms.custom |
---|---|---|---|---|---|---|---|---|---|---|---|
Azure Machine Learning datastores | Azure Machine Learning | Learn how to securely connect to your data storage on Azure with Azure Machine Learning datastores. | machine-learning | machine-learning | enterprise-readiness | conceptual | nibaccam | blackmist | larryfr | 10/21/2021 | devx-track-python, data4ml |
Supported cloud-based storage services in Azure Machine Learning include:
- Azure Blob Container
- Azure File Share
- Azure Data Lake
- Azure Data Lake Gen2
Azure Machine Learning allows you to connect to data directly by using a storage URI, for example:
- https://storageAccount.blob.core.windows.net/container/path/file.csv (Azure Blob Container)
- abfss://container@storageAccount.dfs.core.windows.net/base/path/folder1 (Azure Data Lake Gen2)
Storage URIs use identity-based access, which prompts you for your Azure Active Directory token to authenticate data access. This approach lets you manage data access at the storage level and keeps credentials confidential.
Note
When using Notebooks in Azure Machine Learning Studio, your Azure Active Directory token is automatically passed through to storage for data access authentication.
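The two storage URI forms shown earlier encode the storage account, the container, and the path in different positions. As a minimal sketch, assuming only those two forms, the following Python splits a URI into its parts using the standard library. The `StorageLocation` class and the `parse_storage_uri` function are hypothetical helpers written for illustration; they are not part of any Azure SDK.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class StorageLocation:
    """Illustrative container for the parts of a storage URI (not an Azure SDK type)."""
    service: str
    account: str
    container: str
    path: str


def parse_storage_uri(uri: str) -> StorageLocation:
    """Split a Blob (https) or Data Lake Gen2 (abfss) storage URI into its parts."""
    parsed = urlparse(uri)

    # Blob form: https://<account>.blob.core.windows.net/<container>/<path>
    if parsed.scheme == "https" and ".blob." in parsed.netloc:
        account = parsed.netloc.split(".", 1)[0]
        container, _, path = parsed.path.lstrip("/").partition("/")
        return StorageLocation("Azure Blob Container", account, container, path)

    # Data Lake Gen2 form: abfss://<container>@<account>.dfs.core.windows.net/<path>
    if parsed.scheme == "abfss" and "@" in parsed.netloc:
        container, _, host = parsed.netloc.partition("@")
        account = host.split(".", 1)[0]
        return StorageLocation(
            "Azure Data Lake Gen2", account, container, parsed.path.lstrip("/")
        )

    raise ValueError(f"Unrecognized storage URI form: {uri}")


if __name__ == "__main__":
    print(parse_storage_uri(
        "https://storageAccount.blob.core.windows.net/container/path/file.csv"
    ))
    print(parse_storage_uri(
        "abfss://container@storageAccount.dfs.core.windows.net/base/path/folder1"
    ))
```

The sketch only inspects the URI text. It does not contact the storage service or perform any authentication.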
Although storage URIs provide a convenient mechanism to access data, there may be cases where using an Azure Machine Learning Datastore is a better option:
- You need credential-based data access (for example, service principals, SAS tokens, or an account name and key). Datastores are helpful because they keep the connection information for your data storage securely in an Azure Key Vault, so you don't have to code it in your scripts.
- You want team members to easily discover relevant datastores. Datastores are registered to an Azure Machine Learning workspace, which makes them easier for your team members to find.
Register and create a datastore to easily connect to your storage account, and access the data in your underlying storage service.
Azure Machine Learning datastores support both credential-based and identity-based access. In credential-based access, your authentication credentials are usually kept in a datastore and are used to confirm that you have permission to access the storage service. When these credentials are registered via datastores, any user with the workspace Reader role can retrieve them. That scale of access can be a security concern for some organizations. When you use identity-based data access, Azure Machine Learning prompts you for your Azure Active Directory token for data access authentication instead of keeping your credentials in the datastore. That approach allows for data access management at the storage level and keeps credentials confidential.
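The difference between the two access modes comes down to whether a credential is supplied at registration time. The following is a rough sketch, assuming the Azure Machine Learning Python SDK v1 (`azureml-core`) and its `Datastore.register_azure_blob_container` method. The helper names `blob_datastore_args` and `register_blob_datastore`, and the datastore, container, and account names in the usage comments, are hypothetical placeholders.

```python
from typing import Any, Dict, Optional


def blob_datastore_args(
    datastore_name: str,
    container_name: str,
    account_name: str,
    account_key: Optional[str] = None,
    sas_token: Optional[str] = None,
) -> Dict[str, Any]:
    """Build registration arguments for a blob datastore.

    With no credential, the arguments describe identity-based access.
    With an account key or SAS token, they describe credential-based access.
    """
    if account_key and sas_token:
        raise ValueError("Provide either account_key or sas_token, not both.")

    args: Dict[str, Any] = {
        "datastore_name": datastore_name,
        "container_name": container_name,
        "account_name": account_name,
    }
    if account_key:
        args["account_key"] = account_key  # credential-based access
    elif sas_token:
        args["sas_token"] = sas_token  # credential-based access
    return args


def register_blob_datastore(workspace, **kwargs):
    """Register the datastore in a workspace (assumes azureml-core is installed)."""
    # Imported lazily so the argument helper works without the SDK.
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(
        workspace=workspace, **blob_datastore_args(**kwargs)
    )


# Hypothetical usage; requires a workspace config file and valid Azure sign-in:
#   from azureml.core import Workspace
#   ws = Workspace.from_config()
#   register_blob_datastore(
#       ws,
#       datastore_name="my_blob_store",
#       container_name="container",
#       account_name="storageAccount",
#   )
```

Leaving out both `account_key` and `sas_token` keeps secrets out of the workspace, in line with the identity-based approach described above. Read any credential you do pass from a secure source rather than hard-coding it in a script.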