Building the core fabric of accelerated hybrid AI clusters using Cilium
Description:
Learn how to implement high-performance networking for AI workloads in this 26-minute conference talk, which explores the transition from Docker's overlay network driver to Cilium in a custom container orchestrator called Sokovan. Discover the challenges of using Kubernetes for HPC and AI workloads, and how Sokovan addresses these limitations through improved support for multi-tenancy and multi-node computing clusters. Explore the technical details of integrating Cilium as an inter-container networking driver, examining both its performance benefits and its programmability advantages. Gain insights into the throughput and latency improvements achieved in specific inter-container networking scenarios, and into scaling and load balancing ML inference traffic through an application proxy implementation.