Explore a senior principal engineer's insights on Kubernetes failure stories in this 33-minute conference talk from GOTO Berlin 2019. Dive into real-world experience from operating over 100 clusters and the lessons drawn from incidents, failures, and user reports. Learn why Kubernetes remains a sensible choice despite its perceived complexity, and gain practical knowledge of common pitfalls, best practices, and improvements in areas such as ingress errors, CoreDNS OOMKills, and API server issues. Discover the importance of proper resource management, monitoring, and automated testing in keeping Kubernetes environments robust. Understand how sharing failure stories supports continuous improvement and fosters collaboration across organizations in the Kubernetes ecosystem.
Why I Love Kubernetes Failure Stories and You Should Too