1. Intro
2. Problem Statement
3. Dirty Data
4. Build or Buy
5. Design Decisions
6. Microsoft Enterprise Data Warehouse
7. Demo
8. Summary
Description:
Explore a fast-paced 27-minute video presentation by Databricks Technical Leads and Champions Darren Fuller and Sandy May on productionizing data quality pipelines for enterprise customers. Learn about their vision of empowering business decisions on data remediation and enabling self-healing data pipelines through a library of data quality rule templates, a reporting data model, and Power BI reports. Discover how the Lakehouse pattern emphasizes data quality at the Lake layer, using tools such as Delta Lake for schema protection and column checking. Watch quick-fire demos showing how Apache Spark can apply rules to data at staging or curation points. Gain insights into simple and complex rule applications, including net sales calculations, value validations, statistical distribution validations, and complex pattern matching. Get a glimpse of future work on data compliance for PII data, involving rule generation from regex patterns and machine-learning-based transfer learning.
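To make the idea of a rule-template library more concrete, here is a minimal sketch of three of the rule types the description mentions: a value validation, a pattern match, and a net sales consistency check. It is not the presenters' library. It uses plain Python rather than Apache Spark so that it runs without a cluster, and every name in it (`Rule`, `in_range`, `matches`, `net_sales_consistent`, `evaluate`, the column names, and the definition of net sales as gross sales minus discount) is an assumption made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A named check; `check` returns True when a row passes (hypothetical type)."""
    name: str
    check: Callable[[dict], bool]


def in_range(column, low, high):
    # Value validation template: the column must be present and within [low, high].
    return Rule(
        f"{column}_in_range",
        lambda row: row[column] is not None and low <= row[column] <= high,
    )


def matches(column, pattern):
    # Pattern-matching template: the whole string value must match the regex.
    compiled = re.compile(pattern)
    return Rule(
        f"{column}_matches",
        lambda row: isinstance(row[column], str)
        and compiled.fullmatch(row[column]) is not None,
    )


def net_sales_consistent(tolerance=0.01):
    # Assumed definition for illustration: net_sales = gross_sales - discount.
    return Rule(
        "net_sales_consistent",
        lambda row: abs(row["gross_sales"] - row["discount"] - row["net_sales"])
        <= tolerance,
    )


def evaluate(rows, rules):
    """Count failing rows per rule, the kind of figure a quality report might show."""
    failures = {rule.name: 0 for rule in rules}
    for row in rows:
        for rule in rules:
            if not rule.check(row):
                failures[rule.name] += 1
    return failures


# Made-up sample rows, not data from the presentation.
rows = [
    {"sku": "AB-1234", "gross_sales": 100.0, "discount": 10.0, "net_sales": 90.0},
    {"sku": "bad sku", "gross_sales": 50.0, "discount": 5.0, "net_sales": 40.0},
    {"sku": "CD-0001", "gross_sales": -5.0, "discount": 0.0, "net_sales": -5.0},
]
rules = [
    in_range("gross_sales", 0.0, 1_000_000.0),
    matches("sku", r"[A-Z]{2}-\d{4}"),
    net_sales_consistent(),
]
report = evaluate(rows, rules)
print(report)
```

In a Spark pipeline the same templates would more likely be expressed as column expressions evaluated over a DataFrame at a staging or curation point; the per-row Python functions here only illustrate how parameterized templates can produce named rules and failure counts.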

Building Data Quality Pipelines with Apache Spark and Delta Lake

Databricks