1. Introduction
2. Loud Clock
3. Problems
4. Alerts
5. Is everything wrong
6. I got paged
7. I'm not shy
8. The madness phase
9. SLIs
10. SLOs
11. Dashboards
12. Reliability
13. Availability
14. Value-based conversation
15. Accountability
16. Sharing knowledge
17. Getting developers on board
18. Business side of the house
19. Turning it around
20. What didn't go so well
21. Conclusion
22. Questions
Description:
Explore a 26-minute conference talk from SREcon19 Europe/Middle East/Africa that chronicles one professional's journey as a solo Site Reliability Engineer (SRE). Discover how Brian Murphy transformed his organization's engineering culture after a challenging year in 2015. Learn about the implementation of SRE practices, including the introduction of Service Level Indicators (SLIs), reduction of Mean Time to Acknowledge (MTTA) and Mean Time to Resolve (MTTR), and improvement of release cadence. Gain valuable insights and practical advice on enhancing both personal and organizational performance in the field of SRE. The talk covers various aspects such as dealing with alerts, creating dashboards, focusing on reliability and availability, fostering value-based conversations, promoting accountability, and sharing knowledge. Understand the challenges faced, strategies employed, and lessons learned in turning around a struggling engineering organization through the adoption of SRE principles.

My Life as a Solo SRE

USENIX