
Introduction

Loud Clock

Problems

Alerts

Is everything wrong?

I got paged

I'm not shy

The madness phase

SLIs

SLOs

Dashboards

Reliability

Availability

Value-based conversation

Accountability

Sharing knowledge

Getting developers on board

Business side of the house

Turning it around

What didn't go so well

Conclusion

Questions

Description:

Explore a 26-minute conference talk from SREcon19 Europe/Middle East/Africa that chronicles Brian Murphy's experience as a solo Site Reliability Engineer (SRE). After a challenging year in 2015, he set out to transform his organization's engineering culture. The talk describes how he implemented SRE practices: introducing Service Level Indicators (SLIs), reducing Mean Time to Acknowledge (MTTA) and Mean Time to Resolve (MTTR), and improving release cadence. Topics include dealing with alerts, building dashboards, focusing on reliability and availability, fostering value-based conversations, promoting accountability, and sharing knowledge. Along the way, the talk covers the challenges he faced, the strategies he used, and the lessons he learned in turning around a struggling engineering organization, with practical advice for improving both personal and organizational performance.

My Life as a Solo SRE

USENIX


#Conference Talks #SREcon #Business #Management #Organizational Behavior #Organizational Culture #Computer Science #DevOps #Release Management #Information Technology #Incident Management #Service Level Indicators
