1. Introduction
2. Agenda
3. About GetYourGuide
4. Introduction to GetYourGuide
5. Legacy Pipelines
6. Introducing Riverless
7. Extraction Layer
8. DB Zoom
9. Schema Service
10. Converter
11. Abject
12. Data Landscape
13. Dependency Management
14. Transformation Layer Components
15. Special Syntax Elements
16. Importance of Testing
17. Dependencies
18. Benefits
19. Next Steps
20. Questions
21. How it works
22. ReadWrite
23. Small File Problem
24. Data Warehouse
25. Question
Description:
Explore the development of a modern ETL pipeline using Debezium, Kafka, Spark, and Airflow in this 43-minute conference talk. Learn how GetYourGuide transformed their error-prone legacy system into a robust, schema-change-resilient pipeline capable of multiple daily data lake refreshes. Discover the architecture and implementation steps for building a Change Data Capture layer that streams database changes directly to Kafka. Gain insights into reducing operational time with Databricks and understand the benefits of fresh data for business users. The talk covers the extraction layer, schema service, data landscape, dependency management, transformation layer components, and the importance of testing. Explore special syntax elements, the small file problem, and data warehouse integration. Conclude with a Q&A session addressing how the new pipeline works and its read-write capabilities.
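The Change Data Capture layer described above streams row-level database changes to Kafka via Debezium. Each Debezium change event arrives as an envelope with an operation code (`c`/`u`/`d`/`r`), `before`/`after` row images, and source metadata. As a minimal sketch (not the talk's actual implementation; the `bookings` table and its columns are hypothetical), a consumer might flatten such an envelope into a data-lake row like this:

```python
import json

def flatten_change_event(raw: str) -> dict:
    """Flatten a Debezium-style change event into a single row.

    Debezium envelopes carry `op` (c/u/d/r), `before`/`after` row
    images, and a `ts_ms` timestamp; delete events have a null
    `after` image, so key columns are taken from `before` instead.
    """
    payload = json.loads(raw)["payload"]
    op = payload["op"]
    row = dict(payload["after"] if op != "d" else payload["before"])
    row["_op"] = op          # keep the operation for downstream merges
    row["_ts_ms"] = payload["ts_ms"]
    return row

# Hypothetical update event for a `bookings` table.
raw = json.dumps({
    "payload": {
        "op": "u",
        "before": {"id": 7, "status": "pending"},
        "after": {"id": 7, "status": "confirmed"},
        "ts_ms": 1700000000000,
        "source": {"table": "bookings"},
    }
})
print(flatten_change_event(raw))
# → {'id': 7, 'status': 'confirmed', '_op': 'u', '_ts_ms': 1700000000000}
```

Carrying the operation code and timestamp alongside the row is what lets a downstream Spark job merge the stream into the data lake and keep deletes visible, rather than silently dropping them.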

Modern ETL Pipelines with Change Data Capture - Building Resilient Data Streams

Databricks