Introduction

Agenda

About GetYourGuide

Introduction to GetYourGuide

Legacy Pipelines

Introducing Rivulus

Extraction Layer

Debezium

Schema Service

Converter

Abject

Data Landscape

Dependency Management

Transformation Layer Components

Special Syntax Elements

Importance of Testing

Dependencies

Benefits

Next Steps

Questions

How it works

ReadWrite

Small File Problem

Data Warehouse

Question

Description:

This 43-minute conference talk explains how GetYourGuide replaced an error-prone legacy ETL system with a pipeline built on Debezium, Kafka, Spark, and Airflow. The new pipeline tolerates schema changes and refreshes the data lake several times a day. The talk walks through the architecture and the implementation steps for a Change Data Capture layer that streams database changes directly to Kafka, describes how Databricks reduced operational time, and shows what fresher data means for business users. Topics include the extraction layer, the schema service, the data landscape, dependency management, the transformation layer components and their special syntax elements, and the importance of testing. The closing Q&A covers how the pipeline works, reads and writes, the small file problem, and data warehouse integration.

Modern ETL Pipelines with Change Data Capture - Building Resilient Data Streams

Databricks

#Data Science #Data Engineering #Big Data #Apache Spark #Programming #Cloud Computing #Apache Airflow #Business #Business Intelligence #Data Lakes #Data Streaming #ETL Pipelines #Computer Science #Information Technology #Data Management #Data Integration #Debezium