1. Intro
2. Challenges of Scaling Spark
3. Tackling Scaling Challenges
4. Typical Spark User Questions
5. Automatic Failure Root Cause Analysis
6. Platform Failure Reason Breakdown
7. Grid Bench - Performance Analysis
8. Tuning Heuristics & Recommendations
9. Scaling Spark History Server
10. A Low-Latency Solution
11. Issues with Spark Shuffle Service
12. Next-gen Spark Shuffle Service
13. Push-Merge Shuffle
14. Fetch Merged Shuffle Data
15. Magnet Shuffle Service Recap
16. Takeaways
Description:
Explore the scaling challenges and solutions for Apache Spark at LinkedIn in this 26-minute conference talk from Databricks. Follow the company's journey as Spark moved from an experiment to the dominant production compute engine while daily applications grew 3x. Learn how LinkedIn tackled major infrastructure scaling bottlenecks, balanced limited compute resources against increasing demand, improved user development productivity, and boosted job efficiency. Discover optimizations made to core Spark components, improvements to cluster resource scheduling, and automated root cause analysis of job failures. The talk also covers Grid Bench for performance analysis, tuning heuristics and recommendations, scaling the Spark History Server, and the next-generation Spark shuffle service with Push-Merge Shuffle. It closes with takeaways on managing large-scale Spark deployments and empowering users in a rapidly growing environment.

Tackling Scaling Challenges of Apache Spark at LinkedIn - Infrastructure Optimization and User Productivity

Databricks