1. Intro
2. Agenda
3. Principles
4. Context
5. Operations at Zalando
6. Alerting
7. Dashboards
8. Observability
9. SLOs
10. Incident process
11. WORMs
12. Summary
13. Outro
Description:
Explore Zalando's approach to reliability engineering in this comprehensive conference talk from GOTO Amsterdam 2024. Dive into best practices for achieving high reliability at scale, from stand-alone applications to company-wide systems. Learn about instrumentation, monitoring, alerting, tracing, and incident management techniques. Discover how Zalando manages reliability across 3000+ applications and 2000+ engineers to serve 50M+ customers in 23 countries. Gain insights on effective technologies and processes like WORM Cascades and Risk Management for steering reliability at the enterprise level. Follow along as Heinrich Hartmann, Head of Reliability Engineering at Zalando, shares valuable lessons on balancing technological and human factors in building robust, scalable systems.

A Field Guide to Reliability Engineering at Zalando - From Small to Large Scale

GOTO Conferences