Explore the potential of distributed storage for AI/ML workloads on Kubernetes in this 34-minute conference talk by Diane Feddema and Kyle Bader of Red Hat. Discover the benefits and challenges of containerized machine learning workloads, including portability, declarative configuration, and reduced administrative toil. Learn about the performance trade-offs between local storage and open-source distributed storage, and gain insights into running MLPerf training jobs in Kubernetes environments. Examine how machine learning data formats such as RecordIO and TFRecord can improve performance and add flexibility to model validation. Topics covered include the machine learning lifecycle, object detection and segmentation, the COCO dataset, GPU utilization, Python, PyTorch, and benchmark preparation techniques.
Is There a Place For Distributed Storage For AI/ML on Kubernetes?