1. Intro
2. What is performance tuning?
3. Why automate performance tuning?
4. Perf tuning is an iterative process
5. Common issues: lack of parallelism
6. Common issues: shuffle spill
7. Improvements based on node metrics
8. Cost-speed trade-off
9. Recap: manual perf tuning
10. Open source tuning tools
11. Motivations
12. Architecture (tech)
13. Architecture (algo)
14. Heuristics example
15. Evaluator
16. Experiment manager
17. Data Mechanics platform
18. Common issues: data skew
19. Impact of automated tuning
Description:
Discover how to streamline and automate performance tuning for Apache Spark in this 41-minute conference talk by Jean-Yves Stephan of Data Mechanics. Learn about the challenges of keeping data pipelines efficient and stable in production, including selecting appropriate infrastructure, configuring Spark correctly, and ensuring scalability as data volumes grow. Explore the key metrics and parameters used in manual tuning, then delve into automation options ranging from open-source tools to managed services. Gain insights into common issues such as lack of parallelism, shuffle spill, and data skew, and see how node metrics can guide improvements. The talk also covers the iterative nature of performance tuning, cost-speed trade-offs, and the architecture and algorithms behind automated tuning tools. By the end, you'll understand how to optimize your Spark applications and meet SLAs efficiently, even as you scale to hundreds or thousands of jobs.
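One common manual fix for lack of parallelism mentioned in the outline is adjusting the shuffle-partition count. As a minimal sketch of the idea (the function name and the ~128 MiB-per-partition target are illustrative assumptions, not the talk's actual algorithm), a sizing heuristic might look like:

```python
import math

def recommend_shuffle_partitions(shuffle_bytes: int,
                                 target_partition_bytes: int = 128 * 1024**2,
                                 minimum: int = 200) -> int:
    """Suggest a value for spark.sql.shuffle.partitions.

    Aims for roughly target_partition_bytes of shuffle data per
    partition, never going below Spark's default of 200.
    """
    return max(minimum, math.ceil(shuffle_bytes / target_partition_bytes))

# A 64 GiB shuffle at ~128 MiB per partition suggests 512 partitions,
# which would then be set via spark.conf.set("spark.sql.shuffle.partitions", ...).
print(recommend_shuffle_partitions(64 * 1024**3))  # -> 512
```

Automated tuners iterate on heuristics like this across runs, which is the feedback loop the talk's "Evaluator" and "Experiment manager" sections describe.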

How to Automate Performance Tuning for Apache Spark

Databricks