1. Intro
2. Supporting Programming Models for Multi-Petaflop and Exaflop Systems: Challenges
3. Designing (MPI+X) Programming Models at Exascale
4. Overview of the MVAPICH2 Project
5. MVAPICH2 Release Timeline and Downloads
6. Architecture of MVAPICH2 Software Family for HPC, DL/ML, and Data Science
7. Highlights of MVAPICH2 2.3.6-GA Release
8. Startup Performance on TACC Frontera
9. Performance of Collectives with SHARP on TACC Frontera
10. Performance Engineering Applications using MVAPICH2 and TAU
11. Overview of Some of the MVAPICH2-X Features
12. Impact of DC Transport Protocol on Neuron
13. Cooperative Rendezvous Protocols
14. Benefits of the New Asynchronous Progress Design: Broadwell + InfiniBand
15. Shared Address Space (XPMEM)-based Collectives Design
16. MVAPICH2-GDR 2.3.6
17. Highlights of Some MVAPICH2-GDR Features for HPC, DL, ML, and Data Science
18. MVAPICH2-GDR with CUDA-aware MPI Support
19. Performance with On-the-fly Compression Support in MVAPICH2-GDR
20. Collectives Performance on DGX2-A100 - Small Message
21. MVAPICH2 (MPI)-driven Infrastructure for ML/DL Training
22. Distributed TensorFlow on ORNL Summit (1,536 GPUs)
23. Distributed TensorFlow on TACC Frontera (2048 CPU nodes)
24. PyTorch, Horovod, and DeepSpeed at Scale: Training ResNet-50 on 256 V100 GPUs
25. Benchmark #1: Sum of cupy Array and its Transpose (12)
26. Dask Architecture
27. Benchmark #2: cuDF Merge (TACC Frontera GPU Subsystem)
28. MVAPICH2-GDR Upcoming Features for HPC and DL
29. Funding Acknowledgments
Description:
Explore the design of high-performance scalable middleware for HPC, AI, and Data Science on exascale systems and clouds in this conference talk. Delve into the challenges of supporting programming models for multi-petaflop and exaflop systems, and learn about the MVAPICH2 project's architecture and features. Discover performance improvements in startup, collectives, and applications using MVAPICH2 and TAU. Examine the benefits of new protocols and designs, including the DC transport, cooperative rendezvous protocols, and shared address space (XPMEM) collectives. Investigate MVAPICH2-GDR's capabilities for HPC, deep learning, and data science, with a focus on CUDA-aware MPI support and on-the-fly compression. Analyze performance benchmarks for distributed TensorFlow, PyTorch, Horovod, and DeepSpeed at scale, as well as the Dask architecture and cuDF merge operations. Gain insights into upcoming features and funding acknowledgments for cutting-edge middleware development.

Designing High-Performance Scalable Middleware for HPC, AI, and Data Science in Exascale Systems and Clouds

Linux Foundation