Intro

Challenges of Scaling Spark

Tackling Scaling Challenges

Typical Spark User Questions

Automatic Failure Root Cause Analysis

Platform Failure Reason Breakdown

Grid Bench - Performance Analysis

Tuning Heuristics & Recommendations

Scaling Spark History Server

A Low-Latency Solution

Issues with Spark Shuffle Service

Next-gen Spark shuffle service

Push-Merge Shuffle

Fetch Merged Shuffle Data

Magnet Shuffle Service Recap

Takeaways

Description:

Explore the scaling challenges and solutions for Apache Spark at LinkedIn in this 26-minute conference talk from Databricks. Follow the company's journey as Spark went from an experiment to the dominant production compute engine while daily applications grew 3x. Learn how LinkedIn tackled major infrastructure scaling bottlenecks, balanced limited compute resources against growing demand, improved developer productivity, and boosted job efficiency. Discover optimizations to core Spark components, improvements to cluster resource scheduling, and automated root cause analysis of job failures. Gain insights into solutions such as Grid Bench for performance analysis, tuning heuristics and recommendations, scaling the Spark History Server, and the next-generation Spark shuffle service (Magnet) with Push-Merge Shuffle. Walk away with practical takeaways on managing large-scale Spark deployments and empowering users in a rapidly growing environment.

Tackling Scaling Challenges of Apache Spark at LinkedIn - Infrastructure Optimization and User Productivity

Databricks
#Data Science #Big Data #Apache Spark #Computer Science #Software Engineering #Scaling #Engineering #Failure Analysis #Information Technology #Infrastructure Management