Kubernetes Deep Dive: Elevating ML Workload Monitoring to Art - Ziwen Ning & Geeta Gharpure
Description:
This conference talk examines how to monitor ML workloads on Kubernetes. It covers strategies for optimizing AI/ML workloads by combining node health assurance with advanced monitoring techniques, including AWS Neuron's integration for problem detection and the deployment of Neuron Monitor for improved observability. It shows how to diagnose and resolve real-world issues in AI/ML clusters using detection and recovery mechanisms, and how to use tools such as the Kubernetes Node Problem Detector, Prometheus, Grafana, and AWS CloudWatch for performance analytics. The aim is resilient, transparent Kubernetes environments for AI/ML applications.

CNCF [Cloud Native Computing Foundation]