title | description | author | ms.author | ms.service | ms.subservice | ms.topic | ms.date |
---|---|---|---|---|---|---|---|
Previous monthly updates in Azure Synapse Analytics | Archive of the new features and documentation improvements for Azure Synapse Analytics | ryanmajidi | rymajidi | synapse-analytics | overview | conceptual | 05/20/2022 |
This article describes previous monthly updates to Azure Synapse Analytics. For the most current month's release, check out Azure Synapse Analytics latest updates. Each update links to the Azure Synapse Analytics blog and an article that provides more information.
The following updates are new to Azure Synapse Analytics this month.
- Code cells in Synapse notebooks that raise an exception now show standard output along with the exception message. This feature is supported for the Python and Scala languages. To learn more, see the example output when a code statement fails.
- Synapse notebooks now support partial output when running code cells. To learn more, see the examples in this blog post.
- You can now dynamically control Spark session configuration for the notebook activity with pipeline parameters. To learn more, see the variable explorer feature of Synapse notebooks.
- You can now reuse and manage notebook sessions without having to start a new one. You can connect a selected notebook to an active session that was started from another notebook, detach a session from a notebook, stop the session, and monitor it. To learn more, see how to manage your active notebook sessions.
- Synapse notebooks now capture anything written through the Python logging module, in addition to the driver logs. To learn more, see support for Python logging.
- Column-level encryption (CLE) for Azure Synapse dedicated SQL pools is now Generally Available. With column-level encryption, you can use a different protection key for each column, with each key having its own access permissions. The data in CLE-enforced columns is encrypted on disk and remains encrypted in memory until the DECRYPTBYKEY function is used to decrypt it. To learn more, see how to encrypt a data column.
- Serverless SQL pools now deliver better performance for CETAS (CREATE EXTERNAL TABLE AS SELECT) and subsequent SELECT queries. The improvements include a parallel execution plan that speeds up CETAS execution and outputs multiple files. To learn more, see the CETAS with Synapse SQL article and the blog post.
- The Synapse Spark Common Data Model (CDM) connector is now Generally Available. The CDM format reader/writer enables a Spark program to read and write CDM entities in a CDM folder via Spark DataFrames. To learn more, see how the CDM connector supports reading and writing data, along with examples and known issues.
- The Synapse Spark Dedicated SQL Pool (DW) Connector now offers improved performance. The new architecture eliminates redundant data movement and uses COPY INTO instead of PolyBase. You can authenticate through SQL basic authentication or opt into Azure Active Directory (Azure AD) authentication. The connector now delivers approximately a 5x improvement over the previous version. To learn more, see Azure Synapse Dedicated SQL Pool Connector for Apache Spark.
- The Synapse Spark Dedicated SQL Pool (DW) Connector now supports all Spark DataFrame SaveMode choices: Append, Overwrite, ErrorIfExists, and Ignore. The Append and Overwrite modes are critical for managing data ingestion at scale. To learn more, see DataFrame write SaveMode support.
- Accelerate Spark execution speed by using the new Intelligent Cache feature, currently in public preview. Intelligent Cache automatically stores each read within the allocated cache storage space, detects changes to the underlying files, and refreshes the cached files to provide the most recent data. To learn more, see how to enable or disable the cache for your Apache Spark pool, or see the blog post.
- Azure Synapse Analytics now supports Azure Active Directory (Azure AD) authentication. You can turn on Azure AD authentication during workspace creation or after the workspace is created. To learn more, see how to use Azure AD authentication with Synapse SQL.
- An API is now available to raise or lower the minimum TLS version for dedicated SQL in the workspace-managed SQL server. To learn more, see how to update the minimum TLS setting or read the blog post.
- Flowlets and CDC connectors are now Generally Available. Flowlets in Synapse data flows allow for reusable and composable ETL logic. To learn more, see Flowlets in mapping data flow or the blog post.
- SFTP connector for Synapse data flows. You can read and write data while transforming data from SFTP by using the visual, low-code data flows interface in Synapse. To learn more, see source transformation.
- Data flow improvements to Data Preview. To learn more, see Data Preview and debug improvements in Mapping Data Flows.
- Pipeline Script activity. The Script activity enables data engineers to build powerful data integration pipelines that can read from and write to Synapse databases and other database types. To learn more, see Transform data by using the Script activity in Azure Data Factory or Synapse Analytics.
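The four SaveMode choices differ only in what happens when the target table already exists. The following Python sketch is a toy, in-memory model of those semantics, assuming a dictionary stands in for the dedicated SQL pool; the `save` function and the table store are hypothetical and are not the connector's API. In PySpark, the mode itself is selected with `DataFrameWriter.mode()`, which accepts `"append"`, `"overwrite"`, `"error"`/`"errorifexists"`, and `"ignore"`.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a SQL pool: table name -> list of rows.
TableStore = Dict[str, List[dict]]


def save(store: TableStore, table: str, rows: List[dict], mode: str = "errorifexists") -> None:
    """Write rows to `table` following Spark SaveMode semantics (toy model)."""
    mode = mode.lower()
    exists = table in store
    if mode == "append":
        # Add rows to the existing table, creating it if needed.
        store.setdefault(table, []).extend(rows)
    elif mode == "overwrite":
        # Replace any existing contents.
        store[table] = list(rows)
    elif mode in ("errorifexists", "error"):
        # Spark's default mode: refuse to write to an existing table.
        if exists:
            raise RuntimeError(f"Table {table!r} already exists")
        store[table] = list(rows)
    elif mode == "ignore":
        # Silently skip the write if the table already exists.
        if not exists:
            store[table] = list(rows)
    else:
        raise ValueError(f"Unknown save mode: {mode!r}")


if __name__ == "__main__":
    store: TableStore = {}
    save(store, "sales", [{"id": 1}])            # creates the table
    save(store, "sales", [{"id": 2}], "append")  # adds a row
    save(store, "sales", [{"id": 3}], "ignore")  # no-op: table exists
    print(store["sales"])  # [{'id': 1}, {'id': 2}]
```

The model makes it easy to see why Append and Overwrite matter for ingestion: they are the only two modes that write to a table that already exists.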
The following updates are new to Azure Synapse Analytics this month.
- Serverless SQL pools now support more consistent query execution times. Learn how serverless SQL pools automatically detect spikes in read latency and keep query execution times consistent.
- The OPENJSON function makes it easy to get array element indexes. To learn more, see how the OPENJSON function in a serverless SQL pool allows you to parse nested arrays and return one row for each JSON array element, with the index of each element.
- Upserting data is now supported by the copy activity. See how you can natively load data into a temporary table and then merge that data into a sink table with upsert.
- Transform Dynamics Data Visually in Synapse Data Flows. Learn more about how to use a Dynamics dataset or an inline dataset as source and sink types to transform data at scale.
- Connect to your SQL sources in data flows using Always Encrypted. To learn more, see how to securely connect to your SQL databases from Synapse data flows using Always Encrypted.
- Capture descriptions from asserts in data flows. To learn more, see how to define your own dynamic descriptive messages in the assert data flow transformation at the row or column level.
- Easily define schemas for complex type fields. To learn more, see how you can make the engine automatically detect the schema of an embedded complex field inside a string column.
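OPENJSON called with its default schema returns `key`, `value`, and `type` columns, and for a JSON array the `key` column holds the zero-based index of each element. The Python sketch below only mimics that output shape to make the indexing concrete; `openjson_array` is a hypothetical helper, not part of serverless SQL pools, and nested arrays or objects are re-serialized with `json.dumps`, so their whitespace may differ from what OPENJSON returns.

```python
import json
from typing import Any, List, Optional, Tuple


def _openjson_type(value: Any) -> int:
    """OPENJSON default-schema type codes:
    0 = null, 1 = string, 2 = number, 3 = true/false, 4 = array, 5 = object."""
    if value is None:
        return 0
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return 3
    if isinstance(value, str):
        return 1
    if isinstance(value, (int, float)):
        return 2
    if isinstance(value, list):
        return 4
    return 5  # dict (JSON object)


def openjson_array(json_text: str) -> List[Tuple[str, Optional[str], int]]:
    """Return one (key, value, type) row per element of a JSON array.

    Hypothetical helper: key is the element's zero-based index as a string.
    """
    elements = json.loads(json_text)
    if not isinstance(elements, list):
        raise ValueError("expected a JSON array")
    rows = []
    for index, element in enumerate(elements):
        if element is None:
            text = None
        elif isinstance(element, str):
            text = element  # strings come back without surrounding quotes
        else:
            text = json.dumps(element)  # numbers, booleans, nested JSON
        rows.append((str(index), text, _openjson_type(element)))
    return rows


if __name__ == "__main__":
    for row in openjson_array('["a", 2, [1, 2], null]'):
        print(row)
    # ('0', 'a', 1)
    # ('1', '2', 2)
    # ('2', '[1, 2]', 4)
    # ('3', None, 0)
```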
The following updates are new to Azure Synapse Analytics this month.
You can now use four new database templates in Azure Synapse. Learn more about Automotive, Genomics, Manufacturing, and Pharmaceuticals templates from the blog post or the database templates article. These templates are currently in public preview and are available within the Synapse Studio gallery.
Improvements to the Synapse Machine Learning library v0.9.5 (previously called MMLSpark). This release simplifies the creation of massively scalable machine learning pipelines with Apache Spark. To learn more, read the blog post about the new capabilities in this release or see the full release notes.
- The Azure Synapse Analytics security overview - A whitepaper that covers the five layers of security. The security layers include authentication, access control, data protection, network security, and threat protection. Understand each security feature in detail to implement an industry-standard security baseline and protect your data in the cloud.
- TLS 1.2 is now required for newly created Synapse workspaces. Login attempts to a newly created Synapse workspace from connections using TLS versions lower than 1.2 will fail. To learn more about how TLS 1.2 provides enhanced security, see this article or the blog post.
- Data quality validation rules using Assert transformation - You can now easily add data quality, data validation, and schema validation to your Synapse ETL jobs by using the Assert transformation in Synapse data flows. To learn more, see the Assert transformation in mapping data flow article or the blog post.
- Native data flow connector for Dynamics - Synapse data flows can now read and write data directly to Dynamics through the new data flow Dynamics connector. To learn how to create datasets in data flows to read, transform, aggregate, and join data, see this article or the blog post. You can then write the data back into Dynamics by using the built-in Synapse Spark compute.
- IntelliSense and auto-complete added to pipeline expressions - IntelliSense makes creating and editing expressions easy. To learn more, see how to check your expression syntax, find functions, and add code to your pipelines.
- COPY schema discovery for complex data ingestion. To learn more, see the blog post or how GitHub used this functionality in Introducing Automatic Schema Discovery with auto table creation for complex datatypes.
- Serverless SQL pools now support the HASHBYTES function. HASHBYTES is a T-SQL function that hashes values. To learn how to use hash values in distributing data, see this article or the blog post.
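Hash-based distribution assigns each row to a bucket derived from a hash of its key, so equal keys always land in the same bucket. The Python sketch below illustrates the idea with `hashlib.sha256`, which produces the same digest as `HASHBYTES('SHA2_256', ...)` for the same input bytes. The bucketing rule (first 8 bytes of the digest, modulo the bucket count) and the helper names `hash_value` and `bucket_for` are assumptions for illustration, not how Synapse distributes data internally. Encoding matters: T-SQL `nvarchar` values are UTF-16LE, so the sketch hashes them that way, and it treats `varchar` input as ASCII.

```python
import hashlib


def hash_value(value: str, nvarchar: bool = False) -> bytes:
    """SHA-256 digest of `value`, encoded as T-SQL would store it.

    nvarchar text is UTF-16LE; varchar text is treated as ASCII here
    (an assumption that only holds for ASCII-only input).
    """
    encoding = "utf-16-le" if nvarchar else "ascii"
    return hashlib.sha256(value.encode(encoding)).digest()


def bucket_for(value: str, buckets: int, nvarchar: bool = False) -> int:
    """Map a value to one of `buckets` buckets (hypothetical scheme)."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    digest = hash_value(value, nvarchar)
    # Read the first 8 bytes as an unsigned integer, then take the remainder.
    return int.from_bytes(digest[:8], "big") % buckets


if __name__ == "__main__":
    # Standard SHA-256 test vector for "abc".
    print(hash_value("abc").hex())
    # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
    for key in ["customer-1", "customer-2", "customer-3"]:
        print(key, bucket_for(key, 4))
```

Because the same key always produces the same digest, rows that share a key are assigned to the same bucket, which is the property hash distribution relies on.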
The following updates are new to Azure Synapse Analytics this month.
- Accelerate Spark workloads with NVIDIA GPU acceleration blog article
- Mount remote storage to a Synapse Spark pool blog article
- Natively read & write data in ADLS with Pandas blog article
- Dynamic allocation of executors for Spark blog article
- The Synapse Machine Learning library blog article
- Getting started with state-of-the-art pre-built intelligent models blog article
- Building responsible AI systems with the Synapse ML library blog article
- PREDICT is now GA for Synapse Dedicated SQL pools blog article
- Simple & scalable scoring with PREDICT and MLFlow for Apache Spark for Synapse blog article
- Retail AI solutions blog article
- User-Assigned managed identities now supported in Synapse Pipelines in preview blog article
- Browse ADLS Gen2 folders in an Azure Synapse Analytics workspace in preview blog article
- Pipeline Fail activity blog article
- Mapping Data Flow gets new native connectors blog article
- More notebook export formats: HTML, Python, and LaTeX blog
- Three new chart types in notebook view: box plot, histogram, and pivot table blog
- Reconnect to lost notebook session blog
- Synapse Link for Dataverse blog article
- Custom partitions for Synapse Link for Azure Cosmos DB in preview blog article
- Map data tool (Public Preview), a no-code guided ETL experience blog article
- Quick reuse of Spark cluster blog article
- External Call transformation blog article
- Flowlets (Public Preview) blog article
The following updates are new to Azure Synapse Analytics this month.
- Introducing Lake databases (formerly known as Spark databases) blog article
- Lake database designer now available in preview blog article
- Database Templates and Database Designer blog article
- Delta Lake support for serverless SQL is generally available blog article
- Query multiple file paths using OPENROWSET in serverless SQL blog article
- Serverless SQL queries can now return up to 200 GB of results blog article
- Handling invalid rows with OPENROWSET in serverless SQL blog article
- Accelerate Spark workloads with NVIDIA GPU acceleration blog article
- Mount remote storage to a Synapse Spark pool blog article
- Natively read & write data in ADLS with Pandas blog article
- Dynamic allocation of executors for Spark blog article
- The Synapse Machine Learning library blog article
- Getting started with state-of-the-art pre-built intelligent models blog article
- Building responsible AI systems with the Synapse ML library blog article
- PREDICT is now GA for Synapse Dedicated SQL pools blog article
- Simple & scalable scoring with PREDICT and MLFlow for Apache Spark for Synapse blog article
- Retail AI solutions blog article
- User-Assigned managed identities now supported in Synapse Pipelines in preview blog article
- Browse ADLS Gen2 folders in an Azure Synapse Analytics workspace in preview blog article
- Synapse Link for Dataverse blog article
- Custom partitions for Synapse Link for Azure Cosmos DB in preview blog article
The following updates are new to Azure Synapse Analytics this month.
- Manage your cost with Azure Synapse pre-purchase plans blog article
- Move your Azure Synapse workspace across Azure regions blog article
- Spark performance optimizations blog
- All Synapse RBAC roles are now generally available for use in production blog article
- Apply User-Assigned Managed Identities for Double Encryption blog article
- Synapse Administrators now have elevated access to dedicated SQL pools blog article
- Use Stringify in data flows to easily transform complex data types to strings blog article
- Control Spark session time-to-live (TTL) in data flows blog article
- Deploy Synapse workspaces using GitHub Actions blog article
- More control creating Git branches in Synapse Studio blog article