1. Intro
2. Welcome
3. Why Delta Lake
4. Making Streaming First Class
5. Why isn't Delta built into Spark?
6. Delta roadmap
7. Why Delta
8. Challenges
9. Automatic Schema Migration
10. Why is Delta important
11. Multi-version concurrency control
12. Listing files
13. Troubleshooting slow queries
14. Auto optimize
15. Vacuum vs Optimize
16. GDPR/COPPA Compliance
17. Z-Ordering vs Partitioning
18. Delta Lake Roadmap
Description:
Explore the intricacies of Delta Lake in this informative interview with Michael Armbrust, the original creator of Spark SQL and a key figure in Apache Spark development. Dive into the world of reliable data lakes as Armbrust explains how Delta Lake brings ACID transactions, scalable metadata handling, and unified streaming and batch data processing to existing data lakes. Learn about the compatibility with Apache Spark APIs and discover the reasons behind Delta Lake's creation. Gain insights into making streaming first-class, automatic schema migration, multi-version concurrency control, and troubleshooting slow queries. Understand the differences between vacuum and optimize operations, explore GDPR and COPPA compliance, and compare Z-Ordering with partitioning. Get a glimpse of the Delta Lake roadmap and its future developments in this 26-minute episode of Data Brew, a series that offers straight-talking discussions on Data + AI evolution.

Demystifying Delta Lake - Data Reliability for Data Lakes

Databricks