title | titleSuffix | description | services | ms.service | ms.subservice | ms.topic | ms.author | author | ms.reviewer | ms.date | ms.custom |
---|---|---|---|---|---|---|---|---|---|---|---|
Create a linked service with Synapse and Azure Machine Learning workspaces (preview) | Azure Machine Learning | Learn how to link Azure Synapse and Azure Machine Learning workspaces, and attach Apache Spark pools for a unified data wrangling experience. | machine-learning | machine-learning | mldata | how-to | larryfr | blackmist | nibaccam | 10/21/2021 | devx-track-python, data4ml, synapse-azureml, contperf-fy21q4, sdkv1, event-tier1-build-2022 |
Link Azure Synapse Analytics and Azure Machine Learning workspaces and attach Apache Spark pools (preview)
[!INCLUDE sdk v1]
In this article, you learn how to create a linked service that links your Azure Synapse Analytics workspace and Azure Machine Learning workspace.
With your Azure Machine Learning workspace linked to your Azure Synapse workspace, you can attach an Apache Spark pool, powered by Azure Synapse Analytics, as a dedicated compute for data wrangling at scale, or conduct model training, all from the same Python notebook.
You can link your ML workspace and Synapse workspace via the Python SDK or the Azure Machine Learning studio.
You can also link workspaces and attach a Synapse Spark pool with a single Azure Resource Manager (ARM) template.
Important
The Azure Machine Learning and Azure Synapse integration is in public preview. The functionalities presented from the azureml-synapse package are experimental preview features, and may change at any time.
- Create an Apache Spark pool using the Azure portal, web tools, or Synapse Studio.
- Install the Azure Machine Learning Python SDK.
- Access to the Azure Machine Learning studio.
Important
To link to the Synapse workspace successfully, you must be granted the Owner role of the Synapse workspace. Check your access in the Azure portal.
If you are not an Owner and are only a Contributor to the Synapse workspace, you can only use existing linked services. See how to Retrieve and use an existing linked service.
The following code employs the LinkedService and SynapseWorkspaceLinkedServiceConfiguration classes to:
- Link your machine learning workspace, ws, with your Azure Synapse workspace.
- Register your Synapse workspace with Azure Machine Learning as a linked service.
from azureml.core import Workspace, LinkedService, SynapseWorkspaceLinkedServiceConfiguration

# Azure Machine Learning workspace
ws = Workspace.from_config()

# Link configuration
synapse_link_config = SynapseWorkspaceLinkedServiceConfiguration(
    subscription_id=ws.subscription_id,
    resource_group='your resource group',
    name='mySynapseWorkspaceName')

# Link workspaces and register Synapse workspace in Azure Machine Learning
linked_service = LinkedService.register(workspace=ws,
                                        name='synapselink1',
                                        linked_service_config=synapse_link_config)
Important
A managed identity, system_assigned_identity_principal_id, is created for each linked service. This managed identity must be granted the Synapse Apache Spark Administrator role of the Synapse workspace before you start your Synapse session. Assign the Synapse Apache Spark Administrator role to the managed identity in the Synapse Studio.
To find the system_assigned_identity_principal_id of a specific linked service, use LinkedService.get(ws, '<linked-service-name>'), where ws is your Azure Machine Learning workspace object.
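As a minimal sketch of that lookup, the helper below reads the identity from a linked service object so you know which principal to grant the role to in Synapse Studio. The name `principal_id_of` is hypothetical and not part of the SDK; the sketch assumes the object returned by `LinkedService.get()` exposes the `system_assigned_identity_principal_id` attribute named above.

```python
def principal_id_of(linked_service):
    """Return the managed identity principal ID of a linked service.

    `linked_service` is assumed to be the object returned by
    LinkedService.get(ws, '<linked-service-name>').
    """
    principal_id = getattr(
        linked_service, "system_assigned_identity_principal_id", None)
    if not principal_id:
        raise ValueError("The linked service does not report a managed identity.")
    return principal_id


# Typical use (requires the Azure Machine Learning SDK and a workspace config file):
# from azureml.core import Workspace, LinkedService
# ws = Workspace.from_config()
# print(principal_id_of(LinkedService.get(ws, "synapselink1")))
```

The helper raises instead of returning an empty value, so a missing identity is noticed before you try to assign the role.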
View all the linked services associated with your machine learning workspace.
LinkedService.list(ws)
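The returned list can also be used to check whether a link with a given name already exists, for example before registering a new one. The following is a sketch under the assumption that each returned object has a `name` attribute; `find_linked_service` is a hypothetical helper, not an SDK function.

```python
def find_linked_service(linked_services, name):
    """Return the linked service called `name`, or None if it is not in the list."""
    for service in linked_services:
        if service.name == name:
            return service
    return None


# Typical use (requires the Azure Machine Learning SDK and a workspace, ws):
# existing = find_linked_service(LinkedService.list(ws), "synapselink1")
# if existing is None:
#     print("No linked service named 'synapselink1' is registered.")
```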
To unlink your workspaces, use the unregister() method:
linked_service.unregister()
Link your machine learning workspace and Synapse workspace via the Azure Machine Learning studio with the following steps:
- Sign in to the Azure Machine Learning studio.
- Select Linked Services in the Manage section of the left pane.
- Select Add integration.
- On the Link workspace form, populate the fields:

Field | Description
---|---
Name | Provide a name for your linked service. This name is used to reference this particular linked service.
Subscription name | Select the name of your subscription that's associated with your machine learning workspace.
Synapse workspace | Select the Synapse workspace you want to link to.

- Select Next to open the Select Spark pools (optional) form. On this form, you select which Synapse Spark pool to attach to your workspace.
- Select Next to open the Review form and check your selections.
- Select Create to complete the linked service creation process.
Before you can attach a dedicated compute for data wrangling, you must have an ML workspace that's linked to an Azure Synapse Analytics workspace. This link is referred to as a linked service.
Retrieving and using an existing linked service requires User or Contributor permissions to the Azure Synapse Analytics workspace.
This example retrieves an existing linked service, synapselink1, from the workspace, ws, with the get() method.
from azureml.core import LinkedService
linked_service = LinkedService.get(ws, 'synapselink1')
Once you retrieve the linked service, attach a Synapse Apache Spark pool as a dedicated compute resource for your data wrangling tasks.
You can attach Apache Spark pools via:
- Azure Machine Learning studio
- Azure Resource Manager (ARM) templates
- The Azure Machine Learning Python SDK
Follow these steps:
- Sign in to the Azure Machine Learning studio.
- Select Linked Services in the Manage section of the left pane.
- Select your Synapse workspace.
- Select Attached Spark pools on the top left.
- Select Attach.
- Select your Apache Spark pool from the list and provide a name.
  - This list identifies the available Synapse Spark pools that can be attached to your compute.
  - To create a new Synapse Spark pool, see Create Apache Spark pool with the Synapse Studio.
- Select Attach selected.
You can also employ the Python SDK to attach an Apache Spark pool.
The following code:
- Configures the SynapseCompute with:
  - The LinkedService, linked_service, that you either created or retrieved in the previous step.
  - The type of compute target you want to attach, SynapseSpark.
  - The name of the Apache Spark pool. This must match an existing Apache Spark pool that is in your Azure Synapse Analytics workspace.
- Creates a machine learning ComputeTarget by passing in:
  - The machine learning workspace you want to use, ws.
  - The name you'd like to refer to the compute within the Azure Machine Learning workspace.
  - The attach_configuration you specified when configuring your Synapse Compute.
  - The call to ComputeTarget.attach() is asynchronous, so the sample blocks until the call completes.
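from azureml.core.compute import SynapseCompute, ComputeTarget

# Placeholder values; replace them with your own names
synapse_spark_pool_name = '<your-spark-pool-name>'  # Existing pool in the Synapse workspace
synapse_compute_name = '<your-compute-name>'        # Name for the compute in Azure Machine Learning

attach_config = SynapseCompute.attach_configuration(
    linked_service,                     # Linked Synapse workspace alias
    type='SynapseSpark',                # Type of assets to attach
    pool_name=synapse_spark_pool_name)  # Name of Synapse Spark pool

synapse_compute = ComputeTarget.attach(
    workspace=ws,
    name=synapse_compute_name,
    attach_configuration=attach_config)

# attach() is asynchronous, so block until the call completes
synapse_compute.wait_for_completion()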
Verify the Apache Spark pool is attached.
ws.compute_targets[synapse_compute_name]
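Indexing the dictionary raises a KeyError when the name is missing, so a script may prefer an explicit check. The sketch below is one way to do that; `is_synapse_pool_attached` is a hypothetical helper, and it assumes that `ws.compute_targets` behaves like a dictionary keyed by compute name and that an attached Synapse compute reports its `type` as 'SynapseSpark', the same value passed to attach_configuration().

```python
def is_synapse_pool_attached(compute_targets, compute_name):
    """Return True if `compute_name` exists and looks like a Synapse Spark compute.

    `compute_targets` is assumed to be a mapping such as ws.compute_targets.
    """
    target = compute_targets.get(compute_name)
    if target is None:
        return False
    # Assumption: attached Synapse computes expose type == 'SynapseSpark'.
    return getattr(target, "type", None) == "SynapseSpark"


# Typical use (requires the Azure Machine Learning SDK and a workspace, ws):
# if not is_synapse_pool_attached(ws.compute_targets, synapse_compute_name):
#     print("The Spark pool is not attached yet.")
```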