Syllabus:
1. Intro
2. Welcome
3. Introductions
4. Agenda
5. Data Quality Cone of Anxiety
6. How do we address bad data
7. What is data observability
8. Freshness
9. Distribution
10. Volume
11. Schema
12. Data Lineage
13. Data Reliability Lifecycle
14. Lake vs Warehouse
15. Metadata
16. Storage
17. Query logs
18. Query engine
19. Questions
20. Describe Detail
21. Architecture for observability
22. Measuring update times
23. Loading data in CSV or JSON
24. Update cadence
25. Feature engineering
26. Lambda function
27. Delay between updates
28. Model Parameters
29. Training Labels
30. Questions and Answers
31. Summary
32. Upcoming events
33. Data Quality Fundamentals
34. Monte Carlo
Description:
Explore architecting for data quality in the lakehouse with Delta Lake and PySpark in this comprehensive tech talk. Learn how to combat data downtime by adopting DevOps and software engineering best practices. Discover techniques for identifying, resolving, and preventing data issues across the data lakehouse. Gain insights into optimizing data reliability across metadata, storage, and query engine tiers. Build your own data observability monitors using PySpark and understand the role of tools like Delta Lake in scaling this design. Dive into topics such as the Data Quality Cone of Anxiety, data observability principles, and the Data Reliability Lifecycle. Examine the differences between data lakes and warehouses, and explore practical examples of measuring update times, loading data, and feature engineering. Access accompanying exercises and Jupyter notebooks to apply your newfound knowledge in real-world scenarios.
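The talk builds monitors like these step by step; as a rough sketch of the freshness and volume checks named in the outline above, the PySpark snippet below reads a Delta table and compares its last update time and row count against illustrative thresholds. The table path, the `updated_at` column, and the SLA values are assumptions for the example, not details taken from the session.

```python
from datetime import datetime

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal sketch, assuming the delta-spark package is installed and the
# table has an `updated_at` timestamp column. The path, column name, and
# thresholds below are illustrative placeholders.
spark = (
    SparkSession.builder
    .appName("lakehouse-observability-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

TABLE_PATH = "/delta/events"      # hypothetical Delta table location
MAX_STALENESS_HOURS = 6           # hypothetical freshness SLA
MIN_EXPECTED_ROWS = 1_000         # hypothetical volume floor

df = spark.read.format("delta").load(TABLE_PATH)

# Freshness: hours since the most recent record landed.
last_update = df.agg(F.max("updated_at").alias("last_update")).first()["last_update"]
staleness_hours = (datetime.now() - last_update).total_seconds() / 3600

# Volume: total row count compared against an expected floor.
row_count = df.count()

print(f"freshness ok: {staleness_hours <= MAX_STALENESS_HOURS} "
      f"(last update {last_update}, {staleness_hours:.1f}h ago)")
print(f"volume ok:    {row_count >= MIN_EXPECTED_ROWS} ({row_count} rows)")
```

The same pattern extends to the other pillars covered in the talk, for example comparing `df.schema` against an expected schema or profiling column distributions before alerting.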

Architecting for Data Quality in the Lakehouse with Delta Lake and PySpark

Databricks