1. Introduction: What Can Possibly Go Wrong?
2. Real-Life Scenario: Flood of Traffic
3. Real-Life Scenario: Retry Storm
4. Real-Life Scenario: Plan B Went Poorly
5. Real-Life Scenario: Bad Commit
6. Real-Life Scenario: Lack of Sufficient Ownership
7. Real-Life Scenario: Script Errors
8. Prevention Strategies: Defensive Coding Practices
9. Logging and Error Handling Best Practices
10. Setting Effective Alerts
11. Mitigation Strategies for Alerts
12. Preparing for High Velocity Events
13. Conducting a Self Review
14. Conclusion and Takeaways
Description:
Explore valuable insights on building resilient systems through real-life scenarios and practical strategies in this 20-minute conference talk from Conf42 IM 2024. Dive into various challenges faced by large-scale systems, including traffic floods, retry storms, and problematic commits. Learn essential prevention techniques such as defensive coding practices, effective logging, and error handling. Discover how to set up and mitigate alerts, prepare for high-velocity events, and conduct thorough self-reviews. Gain actionable takeaways to enhance system resilience based on experiences from industry giants Amazon and Meta.

Lessons in Building Resilient Systems - Insights from Amazon and Meta

Conf42