1. Intro
2. Welcome
3. Why Delta Lake
4. Making Streaming First Class
5. Why isn't Delta built into Spark?
6. Delta roadmap
7. Why Delta
8. Challenges
9. Automatic Schema Migration
10. Why is Delta important
11. Multi-version concurrency control
12. Listing files
13. Troubleshooting slow queries
14. Auto optimize
15. Vacuum vs Optimize
16. GDPR/COPPA Compliance
17. Z-Ordering vs Partitioning
18. Delta Lake Roadmap
Description:
Explore the intricacies of Delta Lake in this informative interview with Michael Armbrust, the original creator of Spark SQL and a key figure in Apache Spark development. Dive into the world of reliable data lakes as Armbrust explains how Delta Lake brings ACID transactions, scalable metadata handling, and unified streaming and batch data processing to existing data lakes. Learn about the compatibility with Apache Spark APIs and discover the reasons behind Delta Lake's creation. Gain insights into making streaming first-class, automatic schema migration, multi-version concurrency control, and troubleshooting slow queries. Understand the differences between vacuum and optimize operations, explore GDPR and COPPA compliance, and compare Z-Ordering with partitioning. Get a glimpse of the Delta Lake roadmap and its future developments in this 26-minute episode of Data Brew, a series that offers straight-talking discussions on Data + AI evolution.

Demystifying Delta Lake - Data Reliability for Data Lakes

Databricks