1. Intro
2. Denis Diderot and the Diderot effect
3. The Diderot effect in data processing systems
4. The Diderot effect in Spark: Project Tungsten (2015)
5. The Diderot effect, revised for 2021
6. What's your oldest Spark application?
7. Abstractions can leak in performance tuning
8. Choosing the right partition size is difficult
9. Adaptive query execution: coalescing
10. Sidebar: some basics on joins
11. Adaptive query execution: partition pruning
12. Enabling adaptive query execution
13. Accelerating Spark with NVIDIA GPUs
14. Case study: predicting customer churn
15. What's next?
Description:
This 25-minute talk from Databricks covers strategies for modernizing Apache Spark applications to take full advantage of Spark 3.0 and later. It identifies common sources of technical debt in mature Spark applications and how to address them, explains when to replace manual configuration with Adaptive Query Execution, and shows how to optimize queries for columnar processing and GPU execution. The talk draws on concrete examples from customer churn modeling, recent experience modernizing Spark applications, and lessons learned from maintaining Spark extensions across multiple versions. Topics include the Diderot effect in data processing systems, Project Tungsten, adaptive query execution techniques, and accelerating Spark with NVIDIA GPUs, including how to incorporate accelerated ML training directly into Spark applications.

Modernizing Apache Spark 3.0 Applications - Best Practices and Optimization Techniques

Databricks