Dive deep into GPU support in Apache Spark 3.x with this comprehensive 47-minute video from Databricks. Explore accelerator-aware task scheduling, columnar data processing support, fractional scheduling, and stage-level resource scheduling and configuration. Learn about the RAPIDS plugin for Apache Spark 3.x, which enables GPU acceleration without code changes. Understand how the Catalyst optimizer's physical plan is modified for GPU-aware scheduling and how the plugin leverages the RAPIDS libraries. Discover optimizations made to the shuffle plugin, which uses UCX for GPU memory communication. Gain insights into future optimizations involving RDMA and GPUDirect Storage. Examine performance results on industry-standard benchmarks and a real-world production dataset. Topics covered include GPU scheduling, discovery scripts, the assignments API, the UI, stage-level scheduling, SQL columnar processing, Project Hydrogen, Deep Learning Recommendation Machines, and accelerated shuffle results.
Deep Dive into GPU Support in Apache Spark 3.x - Accelerator-Aware Scheduling and RAPIDS Plugin