Sokovan - Container Orchestrator for Accelerated AI/ML Workloads and Massive Scale GPU Computing
Description:
Learn about a powerful Python-based container orchestrator in this 28-minute conference talk presented by Jeongkyu Shin and Joongi Kim. Discover how to efficiently manage resource-intensive batch workloads in containerized environments through acceleration-aware, multi-tenant scheduling capabilities. Explore the dual-layer scheduling system, featuring a cluster-level scheduler for customizable job placement strategies and workload control, alongside a node-level scheduler that optimizes container performance through automatic hardware accelerator mapping. Gain insights into how this solution outperforms traditional tools like Slurm for AI workloads, and understand its successful implementation across various industries for GPU-intensive tasks including AI training and services. Master the integration of multiple hardware acceleration technologies that help container-based MLOps platforms maximize the potential of cutting-edge hardware.
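To make the dual-layer scheduling idea concrete, below is a minimal Python sketch under stated assumptions; it is not Sokovan's actual API, and all class, function, and parameter names (Node, Job, cluster_schedule, node_schedule, gpu_numa) are illustrative. A cluster-level pass selects a node for each queued job under a simple placement strategy, and a node-level pass maps the placed job onto specific accelerator devices, preferring devices in the same NUMA domain.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    free_gpus: int
    free_mem_gb: float


@dataclass
class Job:
    name: str
    gpus: int
    mem_gb: float


def cluster_schedule(jobs: List[Job], nodes: List[Node]) -> Dict[str, str]:
    """Cluster-level pass: assign each job to a node using a toy
    placement strategy (largest jobs first, most free GPUs wins)."""
    placements: Dict[str, str] = {}
    for job in sorted(jobs, key=lambda j: j.gpus, reverse=True):
        candidates = [
            n for n in nodes
            if n.free_gpus >= job.gpus and n.free_mem_gb >= job.mem_gb
        ]
        if not candidates:
            continue  # job stays queued until capacity frees up
        target = max(candidates, key=lambda n: n.free_gpus)
        target.free_gpus -= job.gpus
        target.free_mem_gb -= job.mem_gb
        placements[job.name] = target.name
    return placements


def node_schedule(job: Job, gpu_numa: Dict[int, int]) -> List[int]:
    """Node-level pass: map the job's container to concrete GPU devices,
    preferring devices that share a NUMA node to keep traffic local."""
    by_numa: Dict[int, List[int]] = {}
    for dev, numa in gpu_numa.items():
        by_numa.setdefault(numa, []).append(dev)
    # Prefer a single NUMA domain that can satisfy the request outright.
    for devs in by_numa.values():
        if len(devs) >= job.gpus:
            return sorted(devs)[: job.gpus]
    # Otherwise spill across domains.
    flat = [d for devs in by_numa.values() for d in sorted(devs)]
    return flat[: job.gpus]


if __name__ == "__main__":
    nodes = [Node("node-a", free_gpus=8, free_mem_gb=512),
             Node("node-b", free_gpus=4, free_mem_gb=256)]
    jobs = [Job("train-llm", gpus=4, mem_gb=128),
            Job("finetune", gpus=2, mem_gb=64)]
    print(cluster_schedule(jobs, nodes))
    # Hypothetical topology: GPUs 0-3 on NUMA node 0, GPUs 4-7 on NUMA node 1.
    print(node_schedule(jobs[0], {i: 0 if i < 4 else 1 for i in range(8)}))
```

In a production orchestrator the placement strategy would be pluggable (the customizable cluster-level policies mentioned in the talk), and the node-level mapping would consult real device topology rather than a hard-coded dictionary.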