Explore the challenges and solutions for running stateful services on DC/OS in this 27-minute conference talk by Nathan Shimek from New Context. Learn about overcoming obstacles such as volume pinning, dynamic provisioning limitations, and fixed resource requirements when deploying mission-critical applications on DC/OS. Gain insights into real-world projects conducted with major container users, addressing key issues like failure domains, production-quality design, and operator safety. Discover strategies for field failure testing, preventing cascading failures, and managing multidisciplinary infrastructure. Delve into crucial aspects of cluster management, platform security, maintenance, and externalizing services. Understand the importance of organizational guardrails and training in successful DC/OS deployments.
Building Multiple Scalable DC/OS Deployments - Lessons for Running Stateful Services