Disaster Recovery: Rebuilding Production in Under 1 Hour Using KOps, ArgoCD and Velero

Explore a real-life disaster recovery scenario in this conference talk, which details how a production environment was rebuilt from scratch in under an hour using KOps, ArgoCD, and Velero. Learn about the operational incident caused by a misconfiguration, the challenges faced when standard backup and recovery methods failed, and the crucial role of GitOps and infrastructure as code in the recovery process. Discover the unexpected issues encountered during the 51-minute cluster recreation, including tool malfunctions and outdated disaster recovery guides. Gain insights into the workarounds employed, the post-incident improvements, and the lessons learned. Understand the importance of disaster recovery planning, the benefits of migrating to GitOps and ArgoCD, and how to streamline deployment processes for faster recovery times.