1. Introduction
2. Who Is Danny
3. Free Download
4. Databricks
5. Download the Book
6. Adaptive Query Execution
7. Apache Spark 3.0
8. Performance
9. Spark Catalyst Optimizer
10. Logical and Physical Planning
11. AQE Fundamentals
12. Broadcast Hash Joins
13. Why Not Always Broadcast Join?
14. Dynamically Switch Join Strategies
15. Flipping the Switch
16. Off Script Partitioning
17. Coalescence
18. Table Size
19. Coalescing
20. Traditional Data Warehousing Problem
21. Split Partitioning
22. Q&A
23. Dynamic Partition Pruning
24. Dynamic Partition Pruning Before Optimization
25. Filter Scan
26. Results
27. Pseudo Rush
28. Building Ecosystem
29. Data Lake Reliability
30. Catalog API
31. SQL Statement Support
32. Partial Writes
33. Delete
34. DELETE FROM events
35. History Retention
36. Data Source V2 Catalog API
37. Data Quality Framework
38. Improved Performance
39. More About Delta
Description:
Discover the latest advancements in big data processing in this Seattle Spark + AI Meetup video. Learn about the performance improvements in Apache Spark 3.0, including Adaptive Query Execution (AQE), Dynamic Partition Pruning (DPP), and the handling of skewed queries. See the query performance gains of the new AQE framework, with examples from a 3 TB TPC-DS benchmark, and understand how DPP speeds up queries on star schema designs by pruning fact-table partitions based on filters applied to the dimension tables. Topics include the Spark Catalyst Optimizer, logical and physical planning, broadcast hash joins, coalescing, the traditional data warehousing problem, and split partitioning.

Explore how Delta Lake enhances data lake reliability with ACID transactions, schema enforcement, and time travel. The data lake reliability features covered include the Catalog APIs, SQL statement support, and protection against partial writes, along with the Data Quality Framework and the performance improvements in Delta Lake. This presentation covers essential aspects of Apache Spark 3.0 and Delta Lake for big data professionals and enthusiasts.
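One of the AQE topics listed above is coalescing: after a shuffle stage finishes, Spark knows the actual size of each post-shuffle partition and can merge runs of small, adjacent partitions so fewer, better-sized tasks run downstream. The sketch below is a simplified, hypothetical greedy model of that idea in plain Python, not Spark's actual implementation. The function name `coalesce_partitions` and the 64 MB target are assumptions for illustration; in Spark 3.0 the real behavior is controlled by settings such as `spark.sql.adaptive.coalescePartitions.enabled` and `spark.sql.adaptive.advisoryPartitionSizeInBytes`, and it handles more cases than this model does.

```python
from typing import List


def coalesce_partitions(sizes: List[int], target_size: int) -> List[List[int]]:
    """Simplified model of AQE partition coalescing (illustrative only).

    Groups contiguous post-shuffle partition indices so that each group's
    total size stays at or below target_size. A single partition larger
    than target_size still forms its own group; it is never split here.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for idx, size in enumerate(sizes):
        # Close the current group if adding this partition would overshoot the target.
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    # Flush the last open group.
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical runtime sizes (MB) of seven shuffle partitions.
    sizes_mb = [10, 5, 70, 3, 2, 60, 1]
    print(coalesce_partitions(sizes_mb, target_size=64))
    # → [[0, 1], [2], [3, 4], [5, 6]]
```

In this example, seven shuffle partitions collapse into four tasks. Only adjacent partitions are merged, which keeps each task reading a contiguous range of shuffle output.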

How Apache Spark 3.0 and Delta Lake Enhance Data Lake Reliability

Databricks