Introduction

Agenda

Motivation

Recap

Original Example

Results

New Challenges

RPMP Architecture

Optimization Features

Workflow

Summary

Performance Evaluation

Examples

Call to Action

Optima Natives

Description:

Explore a 33-minute conference talk on accelerating Apache Spark shuffle operations for cloud-based data analytics using remote persistent memory pools. Examine the challenges of serving growing data-driven AI and analytics workloads when storage and compute are disaggregated. Learn about a proposed fully disaggregated shuffle solution built on persistent memory and RDMA, consisting of a new pluggable shuffle manager and a distributed storage system. See how this approach improves Spark's scalability, performance, and reliability, with experimental results showing up to a 10x speedup over traditional shuffle solutions. The talk, presented by Databricks, also covers the solution's architecture, optimization features, and workflow.

Accelerating Apache Spark Shuffle for Data Analytics on Cloud with Remote Persistent Memory Pools

Databricks

#Data Science #Big Data #Apache Spark #Programming #Cloud Computing #Computer Science #Distributed Systems #Data Analytics #Software Engineering #Scalability #Computer Architecture #Persistent Memory