1. Intro
2. Kubernetes is a new cluster manager for Spark
3. The Spark on Kubernetes Journey
4. Spark on YARN: architecture & pain points
5. Spark on Kubernetes: architecture & benefits
6. Our background - Ocean for Apache Spark
7. Spot instances
8. How does Spark cope with spot interruptions?
9. Best practice: run driver OD, execs on Spot
10. This is what your cluster may look like
11. Limitation: Avoid cross-AZ data transfer
12. We ran an experiment to measure the impact
13. Experiment results
14. Since Spark 3.1: Graceful Exec Decommissioning
15. Spark 3.1 - Graceful Exec Decommissioning
16. Graceful Exec Decommissioning - Experiment
17. Since Spark 3.2: Executor PVC Reuse
18. What's new in Spark 3.3 for Spark-on-k8s
19. DATA+AI SUMMIT 2022
Description:
This 33-minute Databricks conference talk explains how to run Apache Spark on Kubernetes reliably on spot instances, which can cut compute costs by up to 90%. It offers concrete guidelines and code examples: run Spark executors on spot nodes, mix instance types and sizes to reduce the risk of interruption, and rely on cluster autoscaling. It covers graceful executor decommissioning (since Spark 3.1), which preserves shuffle files when an executor shuts down, and executor PVC reuse on restart (since Spark 3.2), which disaggregates compute from shuffle storage. The talk also traces the evolution of Spark on Kubernetes, including its architecture, its benefits, and how it compares with Spark on YARN, and presents real-world experiments showing the impact of spot instances and of graceful executor decommissioning. It closes with a look at features arriving in newer Spark releases for Kubernetes.
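
The settings named in the description can be sketched as a set of Spark properties. This is a minimal illustration and not the configuration shown in the talk: the node label key and values, the volume name, the storage class, the size, and the mount path are all hypothetical placeholders, and the property names should be checked against the documentation for the Spark version in use.

```python
from typing import Dict, List


def spot_friendly_conf(
    label_key: str = "example.com/capacity-type",  # hypothetical node label key
    on_demand_value: str = "on-demand",            # hypothetical label value
    spot_value: str = "spot",                      # hypothetical label value
    volume_name: str = "spark-local-dir-1",        # "spark-local-dir-" prefix marks local storage
) -> Dict[str, str]:
    """Build Spark properties for a driver on on-demand nodes and executors on spot nodes."""
    pvc = f"spark.kubernetes.executor.volumes.persistentVolumeClaim.{volume_name}"
    return {
        # Keep the driver on on-demand capacity and the executors on spot capacity.
        f"spark.kubernetes.driver.node.selector.{label_key}": on_demand_value,
        f"spark.kubernetes.executor.node.selector.{label_key}": spot_value,
        # Graceful executor decommissioning: migrate blocks off an executor
        # that has been told it is going away.
        "spark.decommission.enabled": "true",
        "spark.storage.decommission.enabled": "true",
        "spark.storage.decommission.shuffleBlocks.enabled": "true",
        "spark.storage.decommission.rddBlocks.enabled": "true",
        # PVC reuse: the driver owns executor PVCs and hands them to replacements.
        "spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
        "spark.kubernetes.driver.reusePersistentVolumeClaim": "true",
        # Dynamically provisioned executor volume (values below are assumptions).
        f"{pvc}.options.claimName": "OnDemand",
        f"{pvc}.options.storageClass": "standard",
        f"{pvc}.options.sizeLimit": "100Gi",
        f"{pvc}.mount.path": "/data",
        f"{pvc}.mount.readOnly": "false",
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Flatten properties into repeated `--conf key=value` arguments for spark-submit."""
    args: List[str] = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" \\\n  ".join(["spark-submit"] + to_submit_args(spot_friendly_conf())))
```

Running the module prints a `spark-submit` prefix that can be adapted once the placeholder label and storage values are replaced with those of the actual cluster.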

How to Make Apache Spark on Kubernetes Run Reliably on Spot Instances

Databricks