Distributed Systems in Production Jeff Hodges 2014-04
Why you should listen to me
Why you shouldn't listen to me
Scale-invariant
Building and running Distributed Systems
Quick foundation
What Makes Distributed Systems Different
• Garbage collection spiral on a single machine causes requests to time out
• A process is overloaded, so too many clients get stuck trying to connect to it, so it gets slower
• Socket write succeeds lo…
Partial Failure
"It's slow" is the hardest problem you'll ever debug
Metrics are the only way to get your job done.
On profiling
Deploys should change a metric
Logs are liars
Avoid coordination
If your problem fits in memory, it's probably trivial
Back-pressure
• Dropping new messages on the floor
• Returning documented overload errors until the system clears
• Timeouts and exponential back-offs
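As a rough illustration of two of the back-pressure tactics listed above, the sketch below pairs a bounded inbox that rejects new messages with an explicit overload error and a sender that retries with capped exponential back-off. It is not taken from the talk: the names `BoundedInbox`, `OverloadedError`, `backoff_delays`, and `send_with_backoff`, along with every default value, are hypothetical and chosen only for this example. Request timeouts are left out to keep it short.

```python
import queue
import random


class OverloadedError(Exception):
    """Hypothetical documented error returned while the inbox is full."""


class BoundedInbox:
    """Fixed-capacity inbox that refuses new messages instead of growing."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)
        self.dropped = 0  # count of rejected messages, useful as a metric

    def offer(self, message):
        try:
            self._q.put_nowait(message)
        except queue.Full:
            # The new message is dropped and the caller is told why.
            self.dropped += 1
            raise OverloadedError("inbox full, retry later") from None

    def take(self):
        return self._q.get_nowait()


def backoff_delays(base=0.1, factor=2.0, cap=5.0, attempts=5,
                   jitter=random.random):
    """Yield capped exponential delays, each scaled by a jitter in [0, 1)."""
    for attempt in range(attempts):
        ceiling = min(cap, base * factor ** attempt)
        yield ceiling * jitter()


def send_with_backoff(inbox, message, sleep, **backoff_kwargs):
    """Try to enqueue; on overload, wait and retry. Returns True on success."""
    for delay in backoff_delays(**backoff_kwargs):
        try:
            inbox.offer(message)
            return True
        except OverloadedError:
            sleep(delay)
    return False
```

A caller would typically pass `time.sleep` as the `sleep` argument. The randomized jitter is an assumption of this sketch, added so that many clients backing off at once do not retry in lockstep.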
Create partial availability
Search
Who to Follow in the monorail
Consider a private messaging database
Separating deploy from release
Roll out infrastructure with feature flags
Slow, dark rollouts
Multiple versions are the norm
Exploit data-locality
Extract services
Stricter boundaries means even less cheating
Pulling out a service makes deploys easier
Avoids human coordination costs that libraries require.
SOA through standardization
On-call rotations
The Notorious E.O.C.
Increasing the size of my thought leadership
Robust distributed systems cost more than undistributed systems.
Robust open source distributed systems are less common
Collaboration is politics
Description:
Explore tactics and strategies for productionizing distributed systems in this comprehensive conference talk. Delve into the challenges and solutions for building and running distributed systems at scale, covering topics like partial failure, metrics, profiling, and deployment strategies. Learn about the importance of back-pressure, partial availability, and data locality in system design. Discover techniques for extracting services, implementing service-oriented architecture, and managing on-call rotations. Gain insights into the future of distributed systems and the costs associated with robust implementations. Understand the political aspects of collaboration in distributed systems development and the scarcity of robust open-source solutions.
Distributed Systems in Production: Tactics and Strategy - Lecture 32