Explore strategies for managing the operational complexity of Kubeflow in this 25-minute conference talk from KubeCon + CloudNativeCon North America 2021. Learn how to deploy, configure, and maintain Kubeflow, a machine learning toolkit for Kubernetes, and how to navigate its many components, including notebooks, service meshes, and pipelines. Hear lessons from experienced practitioners who manage Kubeflow deployments and contribute to upstream development. Topics include deployment options, operators, Day 2 operations, component updates, security measures, integration with Istio, upgrades, external databases, and troubleshooting techniques. Come away with practical knowledge for handling Kubeflow's operational challenges and running machine learning workflows on Kubernetes.
Taming the Beast - Managing the Day 2 Operational Complexity of Kubeflow