1. Intro
2. Our Setup
3. Configuring Cluster Test change with
4. Cache/Persist
5. Join Optimization
6. Filter Trick
7. Salting - Reduce Skew
8. Things to remember
9. Fair Scheduling
10. Serialization
11. Enable GC Logging
12. ParallelGC (default)
13. Takeaways
Description:
Dive into best practices for fine-tuning and enhancing Apache Spark job performance in this 25-minute video from Databricks. Explore real-world problem-solving techniques and learn how to optimize resources by adjusting parameters such as garbage collector selection, serialization, the number of workers and executors, data partitioning, and Java heap settings. Analyze execution DAGs in the Spark UI to identify bottlenecks, optimize joins, and manage partition sizes. Discover strategies for handling data skew, using scheduling pools, and implementing the fair scheduler. Gain insights into Spark SQL rollup best practices and learn which approaches to avoid for improved performance.
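As a rough illustration of the "Salting - Reduce Skew" topic, the sketch below shows the general idea in plain Python rather than Spark: when one key dominates a data set, appending a random suffix (a "salt") turns it into several distinct keys that a hash partitioner can spread across partitions. This is not code from the video. The partition count, the number of salt buckets, the sample keys, and the helper names `partition_of`, `partition_sizes`, and `salt` are all assumptions made for the example, and `zlib.crc32` merely stands in for a real partitioner.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed partition count, for illustration only
SALT_BUCKETS = 8    # assumed number of distinct salt values


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic stand-in for a hash partitioner (not Spark's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys) -> Counter:
    """Count how many records land in each partition."""
    return Counter(partition_of(k) for k in keys)


def salt(key: str, rng: random.Random, buckets: int = SALT_BUCKETS) -> str:
    """Append a random suffix so one hot key becomes several distinct keys."""
    return f"{key}_{rng.randrange(buckets)}"


# Skewed input (hypothetical data): one hot key makes up 90% of the records.
rng = random.Random(0)
keys = ["hot"] * 9000 + [f"user{i}" for i in range(1000)]

plain = partition_sizes(keys)
salted = partition_sizes(salt(k, rng) for k in keys)

print("largest partition without salting:", max(plain.values()))
print("largest partition with salting:   ", max(salted.values()))
```

Without salting, every record for the hot key lands in the same partition, so one task does most of the work. With salting, the same records are split across several partitions. In an actual join, the other side would also need matching salted keys, and the suffix has to be removed or aggregated away afterwards, which is the cost of the technique.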

Fine-Tuning and Enhancing Performance of Apache Spark Jobs

Databricks