Explore the design of high-performance, scalable middleware for HPC, AI, and data science on exascale systems and clouds in this comprehensive conference talk. Delve into the challenges of supporting programming models for multi-petaflop and exaflop systems, and learn about the MVAPICH2 project's architecture and features. Discover performance improvements in startup, collectives, and applications using MVAPICH2 and TAU. Examine the benefits of new protocols and designs, including DC transport, cooperative rendezvous, and shared-address-space collectives. Investigate MVAPICH2-GDR's capabilities for HPC, deep learning, and data science, with a focus on CUDA-aware MPI support and on-the-fly compression. Analyze performance benchmarks for distributed TensorFlow, PyTorch, Horovod, and DeepSpeed at scale, as well as the Dask architecture and cuDF merge operations. Conclude with a preview of upcoming features and the funding acknowledgments behind this cutting-edge middleware development.
Designing High-Performance Scalable Middleware for HPC, AI, and Data Science in Exascale Systems and Clouds
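The CUDA-aware MPI support highlighted in the talk lets applications pass GPU device pointers directly to MPI calls, with a CUDA-aware library such as MVAPICH2-GDR handling the data movement internally (e.g., via GPUDirect RDMA or pipelining). The following is a minimal illustrative sketch, not taken from the talk, using only standard MPI and CUDA runtime APIs; buffer size and tags are arbitrary:

/* Minimal CUDA-aware MPI sketch: exchange a GPU-resident buffer
 * between two ranks without staging through host memory.
 * Assumes an MPI library built with CUDA support (e.g., MVAPICH2-GDR).
 * Illustrative build: mpicc cuda_aware.c -lcudart -o cuda_aware
 */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                 /* 1M floats per message */
    float *d_buf;                          /* device (GPU) buffer   */
    cudaMalloc((void **)&d_buf, n * sizeof(float));

    if (rank == 0) {
        cudaMemset(d_buf, 0, n * sizeof(float));
        /* The device pointer goes straight into MPI_Send; a
         * CUDA-aware MPI moves the data itself, so no cudaMemcpy
         * to a host staging buffer is needed here. */
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d floats directly into GPU memory\n", n);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}

Without CUDA-aware support, the same exchange would require explicit cudaMemcpy staging through host buffers on both ranks, which is exactly the overhead the GPU-aware designs discussed in the talk aim to eliminate.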