1. Intro
2. History of Slack
3. Reliability Crisis
4. Incident Management Vision
5. Incident Management Plan
6. Incident Management Training
7. Severity Levels
8. Major IC
9. Major IC oncall
10. Major IC responsibility
11. Simultaneous incidents
12. Area Command
13. Long Duration Incidents
14. Pillar Incidents
15. What's Next
16. Ongoing Challenges
17. Recruitment and Training
18. Challenge of Success
Description:
Explore the evolution of incident management at Slack in this 28-minute conference talk from SREcon21. Discover how the company handles dozens of incidents weekly while delivering over 150 million messages per minute at peak. Learn about Slack's journey to make incident management a core capability for their entire engineering team, including their history, reliability crisis, and vision for incident management. Gain insights into their incident management plan, training, severity levels, and the roles of Major Incident Commanders. Understand how Slack manages simultaneous incidents, implements Area Command, and handles long-duration and pillar incidents. Examine ongoing challenges, recruitment and training strategies, and the impact of success on incident management practices.

Evolution of Incident Management at Slack

USENIX