1. Intro
2. About Zillow
3. Why Monitor Data Quality?
4. Challenges We Faced
5. 5 Pillars for Data Quality Platform
6. Platform Architecture
7. Self-Service Onboarding - Goals
8. Self-Service Onboarding - Data Discovery
9. Self-Service Onboarding - Rule-based
10. Self-Service Onboarding - Example
11. Self-Service Onboarding - Metrics
12. Self-Service Onboarding - Monitoring
13. Behind the Scenes
14. Validation Libraries
15. Pipeline Integration (before)
16. Pipeline Integration (after)
17. Validation Results
18. Future Direction
19. Key Takeaways
Description:
Explore a comprehensive 54-minute conference talk from Databricks on building a centralized platform for data quality management. Learn how Zillow tackled the challenge of ensuring data quality across thousands of datasets and pipelines. Discover the five pillars of their data quality platform and its architecture. Gain insights into self-service onboarding processes, including data discovery, rule-based approaches, and monitoring. Understand how validation libraries and pipeline integration work to flag data quality issues early. Examine the platform's capabilities in defining and viewing data quality expectations, performing validations using Spark, and dynamically generating pipelines. See how data quality metrics are exposed alongside datasets to provide a comprehensive health picture over time. Conclude with future directions and key takeaways for implementing a robust data quality management system in complex data organizations.

Democratizing Data Quality Through a Centralized Platform at Zillow

Databricks