Introduction

Who is Danny

Free Download

Databricks

Download the book

Adaptive Query Execution

Apache Spark 3.0

Performance

Spark Catalyst Optimizer

Logical and Physical Planning

AQE Fundamentals

Broadcast Hash Joins

Why not always broadcast join

Dynamically switch join strategies

Flipping the switch

Off script partitioning

Coalescence

Table Size

Coalescing

Traditional Data Warehousing Problem

Split Partitioning

Q&A

Dynamic Partition Pruning

Dynamic Partition Pruning Before Optimization

Filter Scan

Results

Pseudo Rush

Building Ecosystem

Data Lake Reliability

Catalog API

SQL Statement Support

Partial Writes

Delete

Delete from Events

History Retention

Data Source v2 Catalog API

Data Quality Framework

Improved Performance

More About Delta

Description:

Discover the latest advancements in big data processing in this Seattle Spark + AI Meetup video. Learn about the performance improvements in Apache Spark 3.0, including Adaptive Query Execution (AQE), Dynamic Partition Pruning (DPP), and the handling of skewed queries. Explore how Delta Lake improves data lake reliability with ACID transactions, schema enforcement, and time travel. See the query performance gains delivered by the new AQE framework, with examples from a 3 TB TPC-DS benchmark, and understand how DPP speeds up star schema queries by pruning partitions. Topics include the Spark Catalyst Optimizer, logical and physical planning, broadcast hash joins, and coalescing. Examine the traditional data warehousing problem and learn about split partitioning. Discover Delta Lake's data reliability features, including the Catalog API, SQL statement support, and protection against partial writes, and explore its data quality framework and improved performance. This presentation covers essential aspects of Apache Spark 3.0 and Delta Lake for big data professionals and enthusiasts.
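Two of the AQE ideas summarized above, coalescing small shuffle partitions and switching to a broadcast hash join once runtime statistics show that one side of a join is small, can be illustrated with a minimal pure-Python sketch. This is not Spark's implementation: the function names are hypothetical, the 64 MB target only mirrors the default of `spark.sql.adaptive.advisoryPartitionSizeInBytes`, the 10 MB threshold only mirrors the default of `spark.sql.autoBroadcastJoinThreshold`, and real Spark also weighs join type, minimum partition counts, and other factors that are ignored here.

```python
from typing import List, Tuple

MB = 1024 * 1024


def coalesce_partitions(sizes: List[int], target: int = 64 * MB) -> List[Tuple[int, int]]:
    """Hypothetical sketch of AQE-style coalescing.

    Merges contiguous shuffle partitions (by observed byte size) into groups that
    stay at or under `target`; a single partition larger than `target` gets its
    own group. Returns half-open [start, end) index ranges.
    """
    groups: List[Tuple[int, int]] = []
    start, current = 0, 0
    for i, size in enumerate(sizes):
        # Close the current group if adding this partition would exceed the target.
        if i > start and current + size > target:
            groups.append((start, i))
            start, current = i, 0
        current += size
    if sizes:
        groups.append((start, len(sizes)))
    return groups


def choose_join_strategy(left_bytes: int, right_bytes: int,
                         broadcast_threshold: int = 10 * MB) -> str:
    """Hypothetical sketch: broadcast the smaller side if its runtime size fits
    under the threshold, otherwise fall back to a sort-merge join.
    (Ignores join type and which side Spark is allowed to build.)"""
    if min(left_bytes, right_bytes) <= broadcast_threshold:
        return "broadcast_hash_join"
    return "sort_merge_join"


if __name__ == "__main__":
    observed = [10 * MB, 10 * MB, 10 * MB, 70 * MB, 5 * MB, 5 * MB]
    print(coalesce_partitions(observed))           # six partitions become three groups
    print(choose_join_strategy(2 * MB, 500 * MB))  # small side fits under 10 MB
```

The point of doing this at runtime rather than at planning time is that the sizes fed into both functions are measured after a shuffle stage completes, so the decisions can correct estimates the static optimizer got wrong.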

How Apache Spark 3.0 and Delta Lake Enhance Data Lake Reliability

Databricks

#Data Science #Big Data #Apache Spark #Time Travel #Delta Lake