1. Introduction
2. The Secret Lives of SREs
3. Coordinate Multiple Diverse Perspectives
4. Backup Issues
5. Hidden Complexity
6. Outlier Event
7. Sarah
8. Sarah's Knowledge
9. Incident Response
10. Incident Command
11. Speed Bumps
12. Distributed Computing
13. Conclusion
Description:
This 48-minute conference talk from SREcon20 Americas examines incident response and coordination in remote SRE teams. Dr. Laura Maguire presents three years of research on how engineering teams handle service outages, drawing on 62 cases across four organizations. Several findings challenge existing domain models: incident management in practice differs from what Google SRE guidance suggests, and incident command can hinder fast resolution. The talk covers the subtle choreography of cognitive work in fault management, the potential drawbacks of coordination tools, and strategies for adaptive choreography. It also shows how tooling and intra-organizational dependencies raise coordination costs across time and organizational boundaries, increasing complexity for SREs. Other topics include coordinating multiple perspectives, dealing with backup issues, and managing hidden complexity in distributed computing environments.

The Secret Lives of SREs - Controlling the Costs of Coordination across Remote Teams

USENIX