1
Intro
2
Autoscaling on cloud
3
Upscale easy, downscale difficult
4
How are nodes used?
5
Factors affecting node downscaling
6
Terminology: Any cluster generally comprises the following entities: • Resource Manager
7
Current resource allocation strategy
8
Example revisited with new allocation strategy
9
Downscale issues with Min Executors
10
Min executors distribution without packing
11
Min executors distribution with packing
12
How is shuffle data produced / consumed?
13
External Shuffle Service
14
ESS at Qubole
15
Recap
16
Shuffle Cleanup • Shuffle data is deleted by the ESS at the end of the application
17
Issues with long running applications
18
Shuffle reuse in Spark
19
Downscaling a Node
20
Spark - Disaggregation of Compute and Storage • Mount an NFS endpoint on all nodes of the cluster • Change the shuffle manager in Spark to one that can read/write shuffle data from the NFS mount point
21
Summary and Future Work
Description:
Explore the challenges and solutions for downscaling Apache Spark clusters in this 36-minute conference talk by Prakhar Jain from Databricks. Dive into the complexities of removing nodes from running Spark-on-YARN clusters when workload decreases, addressing issues like container fragmentation and shuffle data retention. Learn about innovative approaches to improve downscaling, including changes in YARN's container allocation strategy and Spark's task scheduler for better container packing. Discover enhancements to Spark driver and External Shuffle Service (ESS) that enable proactive deletion of consumed shuffle data, facilitating faster node reclamation. Gain insights into terminology, resource allocation strategies, and the impact of minimum executors on downscaling. Examine the production and consumption of shuffle data, the role of ESS, and potential solutions for long-running applications. Conclude with an overview of Spark's compute and storage disaggregation and future directions for cluster downscaling optimization.
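To make the container-packing idea in the description concrete, here is a minimal Python sketch. It is not the talk's implementation and not YARN's scheduler: the function names, the node names, and the fixed `slots_per_node` capacity are all assumptions chosen for illustration. It only shows why filling one node before the next leaves more nodes completely idle, and therefore removable, than spreading the same executors evenly.

```python
from typing import Dict, List


def place_spread(nodes: List[str], n_executors: int) -> Dict[str, int]:
    """Hypothetical round-robin placement: one executor per node in turn."""
    counts = {node: 0 for node in nodes}
    for i in range(n_executors):
        counts[nodes[i % len(nodes)]] += 1
    return counts


def place_packed(nodes: List[str], n_executors: int, slots_per_node: int) -> Dict[str, int]:
    """Hypothetical packed placement: fill each node before using the next."""
    counts = {node: 0 for node in nodes}
    remaining = n_executors
    for node in nodes:
        take = min(slots_per_node, remaining)
        counts[node] = take
        remaining -= take
    if remaining > 0:
        raise ValueError("not enough capacity for all executors")
    return counts


def idle_nodes(counts: Dict[str, int]) -> List[str]:
    """Nodes with no executors; in this toy model these can be downscaled."""
    return [node for node, count in counts.items() if count == 0]


nodes = ["n1", "n2", "n3", "n4"]  # assumed 4-node cluster, 4 slots per node
spread = place_spread(nodes, 4)
packed = place_packed(nodes, 4, slots_per_node=4)
print("spread:", spread, "idle:", idle_nodes(spread))
print("packed:", packed, "idle:", idle_nodes(packed))
```

In this toy model, spreading four executors over four nodes keeps every node busy, while packing them onto one node leaves three nodes with no containers. A real scheduler also has to account for memory, cores, and shuffle data left on a node, which this sketch ignores.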

Downscaling Apache Spark Clusters - Challenges and Solutions

Databricks