1. Introduction
2. About Daniel
3. Agenda
4. Software Hierarchy
5. Demo
6. Hardware
7. Baseline
8. CPU Utilization
9. Ganglia reports
10. lazy loading
11. code
12. data skipping
13. optimizations
14. output
15. shuffle partitions
16. workload
17. shuffle partition example
18. shuffle partition summary
19. input partition summary
20. what does this do
21. output partitions
22. workload example
23. Partitions
24. Balance
25. Persistence
26. DBIO Cache
27. Join Optimization
28. Broadcast Join
29. Skew Joins
30. Group Bys
31. The Beast
Description:
A conference talk on practical optimization techniques for Apache Spark Core. It shows how to shape partitions and jobs so that Spark's optimizations can take effect, skew is eliminated, and cluster utilization is maximized. The talk covers Spark partition shaping methods and several optimization strategies, including join optimizations, aggregate optimizations, salting, and multi-dimensional parallelism. It opens with the software hierarchy, hardware considerations, and a practical demonstration, then moves through lazy loading, data skipping, and shuffle partition management, followed by input and output partitions, workload balancing, and persistence strategies. Later sections address DBIO Cache, join optimization, broadcast joins, and skew joins. Running time: 1 hour 32 minutes.

Apache Spark Core - Practical Optimization Techniques - Partition Shaping and Job Optimization

Databricks