1. Intro
2. What is performance tuning?
3. Why automate performance tuning?
4. Perf tuning is an iterative process
5. Common issues: lack of parallelism
6. Common issues: shuffle spill
7. Improvements based on node metrics
8. Cost-speed trade-off
9. Recap: manual perf tuning
10. Open source tuning tools
11. Motivations
12. Architecture (tech)
13. Architecture (algo)
14. Heuristics example
15. Evaluator
16. Experiment manager
17. Data Mechanics platform
18. Common issues: data skew
19. Impact of automated tuning
Description:
Discover how to streamline and automate performance tuning for Apache Spark in this 41-minute conference talk by Jean-Yves Stephan of Data Mechanics. Learn about the challenges of keeping data pipelines efficient and stable in production, including selecting appropriate infrastructure, configuring Spark correctly, and ensuring scalability as data volumes grow. Explore the key metrics and parameters used in manual tuning, then delve into automation options ranging from open-source tools to managed services. Gain insights into common issues such as lack of parallelism, shuffle spill, and data skew, and see how node metrics can guide improvements. The talk also covers the iterative nature of performance tuning, cost-speed trade-offs, and the architecture and algorithms behind automated tuning tools. By the end, you'll understand how to optimize your Spark applications and meet SLAs efficiently, even as you scale to hundreds or thousands of jobs.
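One common manual fix for lack of parallelism mentioned in the outline is adjusting the shuffle-partition count. As a minimal sketch of the idea (the function name and the ~128 MiB-per-partition target are illustrative assumptions, not the talk's actual algorithm), a sizing heuristic might look like:

```python
import math

def recommend_shuffle_partitions(shuffle_bytes: int,
                                 target_partition_bytes: int = 128 * 1024**2,
                                 minimum: int = 200) -> int:
    """Suggest a value for spark.sql.shuffle.partitions.

    Aims for roughly target_partition_bytes of shuffle data per
    partition, never going below Spark's default of 200.
    """
    return max(minimum, math.ceil(shuffle_bytes / target_partition_bytes))

# A 64 GiB shuffle at ~128 MiB per partition suggests 512 partitions,
# which would then be set via spark.conf.set("spark.sql.shuffle.partitions", ...).
print(recommend_shuffle_partitions(64 * 1024**3))  # -> 512
```

Automated tuners iterate on heuristics like this across runs, which is the feedback loop the talk's "Evaluator" and "Experiment manager" sections describe.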

How to Automate Performance Tuning for Apache Spark

Databricks