Intro

And the problem space is complex.

Write workload, trailing year

Read workload, trailing year

Service Level Objectives (SLO)

Data storage engine and analytics flow

SLOs are user flows

Service-Level Objectives

Functional and visual testing.

Design for feature flag deployment.

Automated integration & human review.

Green button merge.

Auto-updates, rollbacks, & pins.

Observe behavior in prod.

Non-trivial savings.

Three case studies of failure

Shepherd: ingest API service

Honeycomb Ingest Outage

Now what?

Kafka: data bus

Our month of Kafka pain

Unexpected constraints

Take care of your people

Optimize for safety

Retriever: query service

Making progress carefully

Takeaways

Acknowledge hidden risks

Make experimentation routine!

Understand & control production.

Description:

Explore the challenges of modern distributed systems engineering and the role of observability in this 58-minute conference talk by Charity Majors, CTO of Honeycomb.io. Examine how modern development practices are changing the way systems are operated, and why traditional debugging tools fall short as systems grow more complex. Learn which disasters await distributed systems engineers and how instrumentation and observability can help mitigate those risks. Gain insight into adapting both tooling and organizational culture to keep pace with evolving technologies. Follow three case studies of failure: the ingest API service, the Kafka data bus, and the query service. Understand why it matters to acknowledge hidden risks, make experimentation routine, and understand and control production. The talk draws on Majors' experience in systems engineering and database management at companies including Facebook, Parse, and Linden Lab.

Observability and the Future of Complex Systems

ChariotSolutions

#Computer Science #DevOps #Observability #Distributed Systems #Continuous Deployment #Service-Level Objectives