Explore a 54-minute conference talk from Databricks on building a centralized platform for data quality management. Learn how Zillow tackled the challenge of ensuring data quality across thousands of datasets and pipelines. Discover the five pillars of its data quality platform and the architecture behind it. Gain insights into the self-service onboarding process, including data discovery, rule-based approaches, and monitoring. Understand how validation libraries and pipeline integration flag data quality issues early. Examine how the platform lets users define and view data quality expectations, runs validations with Spark, and dynamically generates pipelines. See how data quality metrics are exposed alongside datasets to show each dataset's health over time. Conclude with future directions and key takeaways for implementing robust data quality management in complex data organizations.
Democratizing Data Quality Through a Centralized Platform at Zillow