Explore the design of high-performance, scalable middleware for HPC, AI, and data science on exascale systems and clouds in this comprehensive conference talk. Delve into the challenges of supporting programming models for multi-petaflop and exaflop systems, and learn about the MVAPICH2 project's architecture and features. Discover performance improvements in startup, collectives, and applications using MVAPICH2 and TAU. Examine the benefits of new protocols and designs, including DC transport, cooperative rendezvous, and shared-address-space collectives. Investigate MVAPICH2-GDR's capabilities for HPC, deep learning, and data science, with a focus on CUDA-aware MPI support and on-the-fly compression. Analyze performance benchmarks for distributed TensorFlow, PyTorch, Horovod, and DeepSpeed at scale, as well as the Dask architecture and cuDF merge operations. Conclude with a preview of upcoming features and the funding acknowledgments behind this cutting-edge middleware development.
Designing High-Performance Scalable Middleware for HPC, AI, and Data Science in Exascale Systems and Clouds
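The CUDA-aware MPI support highlighted in the talk lets applications pass GPU device pointers directly to MPI calls, with a CUDA-aware library such as MVAPICH2-GDR handling the data movement internally (e.g., via GPUDirect RDMA or pipelining). The following is a minimal illustrative sketch, not taken from the talk, using only standard MPI and CUDA runtime APIs; buffer size and tags are arbitrary:

/* Minimal CUDA-aware MPI sketch: exchange a GPU-resident buffer
 * between two ranks without staging through host memory.
 * Assumes an MPI library built with CUDA support (e.g., MVAPICH2-GDR).
 * Illustrative build: mpicc cuda_aware.c -lcudart -o cuda_aware
 */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                 /* 1M floats per message */
    float *d_buf;                          /* device (GPU) buffer   */
    cudaMalloc((void **)&d_buf, n * sizeof(float));

    if (rank == 0) {
        cudaMemset(d_buf, 0, n * sizeof(float));
        /* The device pointer goes straight into MPI_Send; a
         * CUDA-aware MPI moves the data itself, so no cudaMemcpy
         * to a host staging buffer is needed here. */
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d floats directly into GPU memory\n", n);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}

Without CUDA-aware support, the same exchange would require explicit cudaMemcpy staging through host buffers on both ranks, which is exactly the overhead the GPU-aware designs discussed in the talk aim to eliminate.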