Explore how Zillow's data engineering team redesigned its data pipeline architecture with Apache Spark in this 27-minute conference talk. Learn about the challenge of balancing development speed with pipeline maintainability in a rapidly evolving organization. Discover how Zillow identified and paid down technical debt, strengthened data quality enforcement, consolidated shared pipeline functionality, and implemented complex business logic at scale. Gain insight into the design of a new end-to-end pipeline architecture that improves robustness, maintainability, and scalability while reducing code complexity. Understand the pain points of pipeline development, maintenance, and scaling, and weigh the pros and cons of common ETL patterns. Delve into Zillow's approach to building more scalable and robust data pipelines with Apache Spark, including the establishment of processing layers, the development of a Pipeler Library, config-driven orchestration, separation of data processing from business logic, and early data validation.
Designing the Next Generation of Data Pipelines with Apache Spark - Zillow's Approach
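To make the patterns mentioned in the description more concrete, here is a minimal PySpark sketch of a config-driven pipeline with early data validation and business logic kept separate from generic processing steps. It is an illustration only, not Zillow's Pipeler Library: the config keys, schema, paths, and helper names (PIPELINE_CONFIG, PROPERTY_SCHEMA, run_pipeline, STEP_REGISTRY) are all hypothetical.

```python
# Minimal sketch of a config-driven Spark pipeline with early validation.
# All names and paths below are hypothetical illustrations of the patterns
# described in the talk summary, not Zillow's actual implementation.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Hypothetical pipeline config: which steps run, in what order, and where data lives.
PIPELINE_CONFIG = {
    "source_path": "s3://example-bucket/raw/listings/",
    "output_path": "s3://example-bucket/curated/listings/",
    "steps": ["validate", "deduplicate", "enrich"],
}

# Expected input schema, applied at read time so malformed data surfaces early.
PROPERTY_SCHEMA = StructType([
    StructField("listing_id", StringType(), nullable=False),
    StructField("zip_code", StringType(), nullable=True),
    StructField("price", DoubleType(), nullable=True),
])


def validate(df: DataFrame) -> DataFrame:
    """Early data validation: drop rows missing the required key, fail fast on empty input."""
    cleaned = df.filter(F.col("listing_id").isNotNull())
    if cleaned.rdd.isEmpty():
        raise ValueError("Validation failed: no rows with a listing_id")
    return cleaned


def deduplicate(df: DataFrame) -> DataFrame:
    """Shared processing-layer step: keep one row per listing."""
    return df.dropDuplicates(["listing_id"])


def enrich(df: DataFrame) -> DataFrame:
    """Business-logic step, kept separate from the generic processing steps above."""
    return df.withColumn(
        "price_band",
        F.when(F.col("price") >= 1_000_000, "high").otherwise("standard"),
    )


# Registry lets the config, rather than hard-coded calls, decide what runs.
STEP_REGISTRY = {"validate": validate, "deduplicate": deduplicate, "enrich": enrich}


def run_pipeline(spark: SparkSession, config: dict) -> None:
    """Config-driven orchestration: read, apply configured steps in order, write."""
    df = spark.read.schema(PROPERTY_SCHEMA).parquet(config["source_path"])
    for step_name in config["steps"]:
        df = STEP_REGISTRY[step_name](df)
    df.write.mode("overwrite").parquet(config["output_path"])


if __name__ == "__main__":
    spark = SparkSession.builder.appName("config-driven-pipeline-sketch").getOrCreate()
    run_pipeline(spark, PIPELINE_CONFIG)
```

The key design idea this sketch tries to capture is that adding or reordering steps becomes a config change rather than a code change, while validation at the start of the pipeline keeps bad records from propagating into downstream business logic.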