Intro

Kubernetes is a new cluster manager for Spark

The Spark on Kubernetes Journey

Spark on YARN: architecture & pain points

Spark on Kubernetes: architecture & benefits

Our background - Ocean for Apache Spark

Spot instances

How does Spark cope with spot interruptions?

Best practice: run driver OD, execs on Spot

This is what your cluster may look like

Limitation: Avoid cross-AZ data transfer

We ran an experiment to measure the impact

Experiment results

Since Spark 3.1: Graceful Exec Decommissioning

Spark 3.1 - Graceful Exec Decommissioning

Graceful Exec Decommissioning - Experiment

Since Spark 3.2: Executor PVC Reuse

What's new in Spark 3.3 for Spark-on-k8s

DATA+AI SUMMIT 2022

Description:

Discover how to optimize Apache Spark on Kubernetes using spot instances in this 33-minute Databricks conference talk. Learn concrete guidelines and code examples for running Spark reliably on spot VMs, which can provide up to 90% cost savings. Explore key topics such as running Spark executors on spot nodes while keeping the driver on an on-demand node, mixing instance types and sizes to reduce interruption risk, and leveraging cluster autoscaling. Gain insights into the graceful executor decommissioning feature available since Spark 3.1, which preserves shuffle files when an executor shuts down, and the executor PVC reuse available since Spark 3.2, which disaggregates compute from shuffle storage when an executor restarts. Understand the evolution of Spark on Kubernetes, including its architecture, benefits, and comparison to Spark on YARN. Examine real-world experiments measuring the impact of spot interruptions and of graceful executor decommissioning. Stay informed about what is new in Spark 3.3 for Spark on Kubernetes.
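As a rough illustration of the features named in the description and chapter list, the sketch below collects Spark configuration properties for graceful executor decommissioning (since Spark 3.1), executor PVC reuse (since Spark 3.2), and driver/executor node placement, and renders them as `spark-submit` flags. It is not taken from the talk. The property names come from the Spark configuration documentation, but the node label key and values, the storage class, the volume size, and the mount path are placeholder assumptions that depend on your cluster, and the helper functions are hypothetical names introduced only for this example.

```python
"""Sketch: Spark-on-Kubernetes settings related to spot-instance resilience."""

# Graceful executor decommissioning (Spark 3.1+): when an executor is about
# to be removed, migrate its shuffle and cached RDD blocks to other executors.
DECOMMISSION_CONF = {
    "spark.decommission.enabled": "true",
    "spark.storage.decommission.enabled": "true",
    "spark.storage.decommission.shuffleBlocks.enabled": "true",
    "spark.storage.decommission.rddBlocks.enabled": "true",
}


def pvc_reuse_conf(volume_name="spark-local-dir-1", storage_class="standard",
                   size_limit="100Gi", mount_path="/data"):
    """Executor PVC reuse (Spark 3.2+).

    The storage class, size, and mount path are placeholders (assumptions).
    A volume name starting with "spark-local-dir-" lets Spark use the volume
    as local (shuffle/spill) storage.
    """
    prefix = f"spark.kubernetes.executor.volumes.persistentVolumeClaim.{volume_name}"
    return {
        "spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
        "spark.kubernetes.driver.reusePersistentVolumeClaim": "true",
        f"{prefix}.options.claimName": "OnDemand",
        f"{prefix}.options.storageClass": storage_class,
        f"{prefix}.options.sizeLimit": size_limit,
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": "false",
    }


def placement_conf(label_key="example.com/capacity-type",
                   driver_value="on-demand", executor_value="spot"):
    """Pin the driver to on-demand nodes and executors to spot nodes.

    The label key and values are hypothetical; use whatever labels your
    nodes actually carry. The separate driver/executor node selector
    properties are available since Spark 3.3.
    """
    return {
        f"spark.kubernetes.driver.node.selector.{label_key}": driver_value,
        f"spark.kubernetes.executor.node.selector.{label_key}": executor_value,
    }


def build_conf():
    """Merge the three groups of settings into one dictionary."""
    conf = {}
    conf.update(DECOMMISSION_CONF)
    conf.update(pvc_reuse_conf())
    conf.update(placement_conf())
    return conf


def to_submit_args(conf):
    """Render a config dictionary as a flat list of spark-submit arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # Print one "--conf key=value" pair per line for copy-pasting.
    rendered = to_submit_args(build_conf())
    for flag, value in zip(rendered[::2], rendered[1::2]):
        print(flag, value)
```

Keeping the settings in plain dictionaries makes it easy to pass the same values to `SparkSession.builder.config(...)` instead of the command line, or to drop one group (for example PVC reuse) when the cluster has no suitable storage class.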

How to Make Apache Spark on Kubernetes Run Reliably on Spot Instances

Databricks

#Data Science #Big Data #Apache Spark #Programming #Cloud Computing #Computer Science #DevOps #Kubernetes #Data Processing #Distributed Computing #Cluster Management #Business #Business Strategy #Cost Optimization