Chapters:
1. Intro
2. Production is increasingly complex.
3. We're adding complexity all the time.
4. Our strategies need to evolve.
5. When we order the alphabet soup...
6. Noisy alerts. Grumpy engineers.
7. Walls of meaningless dashboards.
8. Tools aren't magical.
9. Invest in people, culture, & process.
10. Eliminate (unnecessary) complexity.
11. Our systems are always failing.
12. We need Service Level Indicators.
13. What threshold buckets events?
14. HTTP Code 200? Latency 100ms?
15. Set a target Service Level Objective.
16. Use a window and target percentage.
17. Data-driven business decisions.
18. Failure modes can't be predicted.
19. Support debugging novel cases. In production.
20. Allow forming & testing hypotheses.
21. Can you explain the variance?
22. Observability isn't just the data.
23. Debugging is not a solo activity.
24. Debugging is for everyone.
25. Collaboration is interpersonal.
26. Lean on your team.
27. Fix hero culture. Share knowledge.
28. Use the same platforms & tools.
29. Reward curiosity and teamwork.
30. Risk analysis helps us plan.
31. Quantify risks by frequency & impact.
32. And prioritize completing the work.
33. Don't waste time chrome polishing.
34. Lack of observability is systemic risk.
35. So is lack of collaboration.
36. A dozen engineers build Honeycomb.
37. We make systems humane to run.
38. Yes, we deploy on Fridays.
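Chapters 12 to 16 frame a Service Level Indicator as a threshold that buckets each event as good or bad, and a Service Level Objective as a target percentage over a time window. The Python sketch below illustrates that idea under stated assumptions: the `Event` record, its field names, and the `is_good` and `slo_report` helpers are hypothetical, and the "HTTP 200 and under 100 ms" rule simply reuses the example thresholds from the slide title rather than a recommendation from the talk.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Event:
    """One request as seen by the service (hypothetical schema)."""
    timestamp: float   # seconds on any consistent clock
    status: int        # HTTP status code returned
    latency_ms: float  # time taken to serve the request


# Example threshold reused from the slide title; not a recommendation.
MAX_LATENCY_MS = 100.0


def is_good(event: Event) -> bool:
    """SLI: bucket an event as good if it returned 200 in under MAX_LATENCY_MS."""
    return event.status == 200 and event.latency_ms < MAX_LATENCY_MS


def slo_report(events: Iterable[Event], window_start: float,
               window_end: float, target: float) -> dict:
    """Check events in [window_start, window_end) against a target fraction (0.999 = 99.9%)."""
    in_window = [e for e in events if window_start <= e.timestamp < window_end]
    total = len(in_window)
    good = sum(1 for e in in_window if is_good(e))
    compliance = good / total if total else 1.0  # treat an empty window as compliant
    allowed_bad = (1.0 - target) * total         # error budget, counted in events
    return {
        "total": total,
        "good": good,
        "compliance": compliance,
        "meets_slo": compliance >= target,
        "error_budget_remaining": allowed_bad - (total - good),
    }


events = [
    Event(timestamp=0, status=200, latency_ms=45.0),
    Event(timestamp=1, status=200, latency_ms=180.0),  # too slow: bad
    Event(timestamp=2, status=500, latency_ms=20.0),   # server error: bad
    Event(timestamp=3, status=200, latency_ms=60.0),
    Event(timestamp=10, status=200, latency_ms=10.0),  # outside the window
]
print(slo_report(events, window_start=0, window_end=4, target=0.75))
```

With a 75% target, the four events inside the window (two good) give 50% compliance, so the SLO is missed and the error budget is overspent by one event. Treating an empty window as fully compliant is a design choice made for this sketch; a real system might report "no data" instead.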
Description:
Explore a comprehensive conference talk on cultivating production excellence in complex distributed systems. Learn about essential practices for improving production environments, including fostering stakeholder involvement, enhancing observability through collaboration, implementing Service Level Objectives for measurement, and utilizing risk analysis for prioritizing improvements. Discover strategies to evolve your approach to managing increasingly complex systems, address common challenges like noisy alerts and meaningless dashboards, and shift focus towards investing in people, culture, and processes. Gain insights on setting effective Service Level Indicators, debugging novel cases in production, promoting collaborative debugging, addressing hero culture, and quantifying risks for better planning. Understand how these practices can lead to more humane and efficient system management, even allowing for confident Friday deployments.
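The description's point about quantifying risks for better planning, and chapter 31's "by frequency & impact," can be made concrete with a little arithmetic. One common formulation, assumed here rather than taken from the talk, estimates expected "bad minutes" per year as (time to detect + time to resolve) × fraction of users affected × occurrences per year, then compares each risk with the yearly error budget implied by an SLO target. The `Risk` fields, the helper names, and every number in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years for simplicity


@dataclass
class Risk:
    """A hypothetical failure scenario to rank (field names are illustrative)."""
    name: str
    time_to_detect_min: float   # expected minutes until someone notices
    time_to_resolve_min: float  # expected minutes from detection to mitigation
    impact_fraction: float      # share of users or requests affected, 0..1
    per_year: float             # expected occurrences per year

    def bad_minutes_per_year(self) -> float:
        """Expected impact-weighted minutes of failure per year (frequency x impact)."""
        duration = self.time_to_detect_min + self.time_to_resolve_min
        return duration * self.impact_fraction * self.per_year


def yearly_error_budget_min(target: float) -> float:
    """Minutes per year that an SLO target (e.g. 0.999) allows to be bad."""
    return (1.0 - target) * MINUTES_PER_YEAR


def prioritize(risks: Iterable[Risk], budget_min: float) -> List[Tuple[str, float, bool]]:
    """Sort risks by expected bad minutes, flagging any that alone exceed the budget."""
    ranked = sorted(risks, key=lambda r: r.bad_minutes_per_year(), reverse=True)
    return [(r.name, r.bad_minutes_per_year(), r.bad_minutes_per_year() > budget_min)
            for r in ranked]


risks = [
    Risk("region outage", time_to_detect_min=15, time_to_resolve_min=225,
         impact_fraction=1.0, per_year=1),
    Risk("bad deploy", time_to_detect_min=10, time_to_resolve_min=50,
         impact_fraction=0.75, per_year=12),
    Risk("certificate expiry", time_to_detect_min=60, time_to_resolve_min=30,
         impact_fraction=1.0, per_year=0.5),
]
budget = yearly_error_budget_min(0.999)
print(f"99.9% budget: {budget:.1f} min/year")
for name, minutes, over_budget in prioritize(risks, budget):
    print(f"{name}: {minutes:.0f} bad min/year, over budget: {over_budget}")
```

With these made-up numbers, the frequent partial-impact bad deploy (540 bad minutes a year) outranks the rarer full outage (240) and alone would exhaust a 99.9% yearly budget of about 525.6 minutes, which is the kind of result that argues for prioritizing that fix first.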

Cultivating Production Excellence

NDC Conferences