Data citizens including both technical and business users rely on data lineage for root cause analysis, impact analysis, data quality tracing, and other data governance applications. In the current data landscape, where data is fluidly moving across locations (on-premises to and across clouds) and across data platforms and applications, it is increasingly important to map the lineage of data. That’s why we’re introducing dynamic lineage extraction currently in preview.
Conventional systems map lineage by parsing data transformation scripts, otherwise called static code analysis. This works well in simple scenarios. For example, when a SQL script is used to produce a target table Customer_Sales by joining two tables called Customer and Sales, static code analysis can map data lineage. However, in many real use cases, the data processing workloads are quite complicated. The scripts could be wrapped in a stored procedure that is parametrized and uses dynamic SQL. There could be a decision tree with an if then else statement executing different scripts at runtime. Or simply, data transactions could have failed to commit at runtime.
In all these examples, dynamic analysis is required to track lineage effectively. Even more importantly, static lineage analysis does not associate data and processes with runtime metadata, limiting customer applications significantly. For instance, dynamic lineage encoding by whom and when a stored procedure was run, and from what application and which server, will enable customers to govern privacy, comply with regulations, increase time-to-insight, and better understand their overall data and processes.
Dynamic data lineage—Azure SQL Databases
Today, we are announcing the preview release of dynamic lineage extraction from Azure SQL Databases in Azure Purview. Azure SQL Database is one of the most widely used relational database systems in enterprises. Stored procedures are commonly used to perform data transformations and aggregations on SQL tables for downstream applications. With this release, the Azure Purview Data Map can be further enriched with dynamic lineage metadata such as run status, impacted number of rows, the client from which the stored procedure is run, user info, and other operational details from actual runs of SQL stored procedures in Azure SQL Databases.
Limited lineage metadata from static code analysis*
The actual implementation involves Azure Purview Data Map tapping into the instrumentation framework of the SQL engine, and extracting runtime logs to aggregate dynamic lineage. The runtime logs also provide actual queries executed in the SQL engine for data manipulation, using Azure Purview can map data lineage and gather additional detailed provenance information. Azure Purview scanners run several times a day to keep up the freshness of dynamic lineage and provenance from Azure SQL Databases.
To learn more about Azure Purview dynamic data lineage from Azure SQL Databases, check out the video:
Get started with Azure Purview today
The native integration with Azure SQL Databases for dynamic lineage and provenance extraction is the first of its kind and Azure Purview is leading the way. Follow the steps below to get started.
- Quickly and easily create an Azure Purview account to try the generally available features.
- Read quick start documentation on how to connect an Azure SQL Database to an Azure Purview account for dynamic data lineage.
Leave a Reply