Explore chaos engineering principles and practices in this comprehensive conference talk. Learn how to make distributed architectures more resilient by intentionally introducing failures. Discover tools and methods for injecting failures into AWS cloud environments, so that weaknesses surface before they turn into outages. Gain insights into microservices, game day exercises, and Netflix's pioneering approach to chaos engineering. Cover prerequisites, application resilience, steady-state behavior, and experiment design. Examine a range of chaos engineering tools, from simple code-based approaches to more advanced platforms such as Chaos Toolkit and Gremlin. Learn techniques for health checks, rate limiting, latency injection, and stress testing. Come away equipped to adopt chaos engineering practices and improve the reliability of complex distributed systems.
Practical Chaos Engineering - Breaking Things on Purpose to Make Them More Resilient Against Failure