
Intro

Agenda

Adaptive Query Execution

Optimizations Overview

Partition Coalescing

Dynamic Join Strategy Selection

Importing AQE

Sales Table

Dynamically collapsing shuffle partitions

Demo of collapsing shuffle partitions

Demo of dynamically optimizing the query

Performance result

Dynamically collapsing shuffle partitions

Dynamically switching join strategies

Description:

This 46-minute Databricks conference talk covers Adaptive Query Execution (AQE), a framework introduced in Spark 3.0 that re-optimizes and adjusts query plans based on statistics collected at runtime. Static planning in Spark SQL can go wrong when table statistics are outdated or cardinality estimates are inaccurate; AQE addresses this by revisiting the plan once real data sizes are known. The talk explains statistics-guided optimizations such as partition coalescing and dynamic join strategy selection, shows their effect on practical query examples, and presents the performance gains AQE achieves on the TPC-DS benchmark.
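The two optimizations named above can be illustrated with a simplified sketch. This is a hypothetical model written for illustration, not Spark's actual implementation: the function names `coalesce_partitions` and `choose_join_strategy` are invented, partition sizes are plain integers, and join-type restrictions on which side may be broadcast are ignored. Coalescing merges contiguous small shuffle partitions until a target size is reached; join selection switches to a broadcast hash join when one side turns out, at runtime, to be small enough.

```python
# Hypothetical, simplified model of two AQE ideas; not Spark's source code.


def coalesce_partitions(sizes, target_size):
    """Group contiguous shuffle partitions so each group's total size stays
    at or below target_size. A single partition larger than the target
    still forms its own group, since partitions are never split here."""
    groups = []
    current, current_size = [], 0
    for idx, size in enumerate(sizes):
        # Close the current group if adding this partition would overflow it.
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    if current:
        groups.append(current)
    return groups


def choose_join_strategy(left_bytes, right_bytes, broadcast_threshold):
    """Pick a broadcast hash join when the smaller side's runtime size
    fits under the threshold; otherwise keep the sort-merge join."""
    if min(left_bytes, right_bytes) <= broadcast_threshold:
        return "broadcast_hash_join"
    return "sort_merge_join"


if __name__ == "__main__":
    MB = 1024 * 1024
    # Five shuffle partitions (sizes in MB) coalesced toward a 50 MB target.
    print(coalesce_partitions([10, 20, 30, 5, 70], 50))
    # A side estimated as large at planning time measures 8 MB at runtime.
    print(choose_join_strategy(8 * MB, 1024 * MB, 10 * MB))
```

In real Spark, these behaviors are controlled through SQL configuration such as `spark.sql.adaptive.enabled`, `spark.sql.adaptive.coalescePartitions.enabled`, and `spark.sql.autoBroadcastJoinThreshold`; the sketch only mirrors the decision logic, not how Spark applies it to physical plans.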

Adaptive Query Execution: Speeding Up Spark SQL at Runtime

Databricks


