1. Intro
2. Databricks
3. Deep Dive into the New Features of Apache Spark 3.0
4. A Delta Lake 0.7.0 + Spark 3.0 AMA
5. Spark Catalyst Optimizer
6. Adaptive Query Execution (AQE)
7. Apache Spark™ 3.0 AQE Fundamentals
8. Starting with Broadcast Hash Joins
9. Dynamically Switching Join Strategies: Apache Spark 3.0 AQE Fundamentals
10. Dynamically Coalescing Shuffle Partitions: Apache Spark 3.0 AQE Fundamentals
11. Dynamically Optimize Skew Joins
12. TPC-DS performance gains from AQE
13. Dynamic Partition Pruning: Before Optimiza
14. How to Use Join Hints? Broadcast Hash Join
15. Extensibility and Ecosystem
16. Data Source V2
17. But what happens with DML under the covers? What really happens to the file system when you run delete, update, and merge?
18. Time Travel: The transaction log and additive files - data versioning
19. Control Table History Retention
20. Enable DataSourceV2 and Catalog API Integration
21. Data Quality Framework: Improved SQL DDL and DMLs and ACID Transactions are just the start
22. Lakehouse Paradigm: Improved performance, DW-like capabilities, on low-cost cloud object stores
23. Try out Spark 3.0 + Delta Lake now!
Description:
Explore how Apache Spark 3.0 and Delta Lake enhance data lake reliability in this 58-minute webinar from Databricks. Learn about Apache Spark's role in big data processing, the evolution of data lake architectures, and Delta Lake's capabilities for ensuring reliable data. Discover how unifying batch and streaming simplifies architectures. Dive into Spark 3.0's new features, including the Adaptive Query Execution framework for improved query performance, Dynamic Partition Pruning for faster processing in star schema designs, and accelerator-aware scheduling for GPU optimization. Examine new Pandas UDF types and function APIs, as well as enhanced monitoring capabilities. Gain insights into Delta Lake 0.7.0, the Spark Catalyst Optimizer, and the Lakehouse paradigm. Understand how ACID transactions, schema enforcement, and time travel contribute to data reliability. Learn about data quality frameworks, improved SQL capabilities, and the integration of DataSourceV2 and the Catalog API.

How Apache Spark 3.0 and Delta Lake Enhance Data Lake Reliability

Databricks