Exploring Distributed Caching for Faster GPU Training with NVMe, GDS, and RDMA - Hope Wang & Bin Fan
Description:
Learn how to optimize GPU training performance through distributed caching in this technical conference talk. Discover strategies for addressing GPU underutilization caused by compute-storage separation while leveraging modern hardware components like NVMe storage and RDMA networks. Explore experimental results from implementing a Kubernetes-native distributed caching layer that enhances PyTorch training efficiency using NVMe storage and high-speed RDMA networks (InfiniBand or specialized NICs). Gain insights into building balanced architectures that maximize GPU utilization and accelerate deep learning workflows through effective data access optimization techniques.
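To make the data-path idea concrete, here is a minimal sketch (not the speakers' implementation) of how a training job typically consumes such a caching layer: the cache is exposed to the pod as a POSIX mount (the path /mnt/cache and the CACHE_MOUNT variable below are assumptions for illustration), so the PyTorch DataLoader reads locally cached, NVMe-backed copies instead of reaching out to remote object storage on every sample.

```python
# Minimal sketch, assuming the distributed cache is mounted at /mnt/cache
# inside the training pod. Not the talk's actual code.
import os
from pathlib import Path

import torch
from torch.utils.data import Dataset, DataLoader


class CachedFileDataset(Dataset):
    """Reads serialized sample tensors from a cache-backed mount point."""

    def __init__(self, root: str):
        # Enumerate sample files once; listing hits the cache's metadata layer.
        self.files = sorted(Path(root).glob("*.pt"))

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int):
        # Hot files are served from local NVMe by the cache layer;
        # misses are fetched from the remote store transparently.
        return torch.load(self.files[idx])


if __name__ == "__main__":
    # CACHE_MOUNT / "/mnt/cache/train" are hypothetical paths for this sketch.
    dataset = CachedFileDataset(os.environ.get("CACHE_MOUNT", "/mnt/cache/train"))
    loader = DataLoader(dataset, batch_size=32, num_workers=8, pin_memory=True)
    for batch in loader:
        pass  # training step would run here
```

The point of the sketch is that the training code stays unchanged; only the mount point (and therefore the data locality) differs, which is what lets the cache keep GPUs fed without modifying the PyTorch workload.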