Introduction

Agenda

Storytime

Data Evolution

Scaling

Cloud Computing

Why Scale Horizontally

What Does It Mean To Run A Distributed System

A Note On Distributed Computing

Summary

Shared Nothing Architecture

Unreliable Message Delivery

Why Are We Fenced Off

Building Observability

What We Can Know

The CAP Theorem

Replication Lag

Consistency is a Spectrum

Availability is Not Binary

Partition Tolerance

Hardware

Hardware Failure

Cables

Sharks

Kevlar

Network Partitions

Resource Isolation

Process Suspension

Network Glitch

People do bad things

Why does this matter

Practical reality

The correctness result

Mitigation strategies

Consensus Algorithms

The Woods Theorem

Building Mental Models

Incident Analysis

Blameless Discussions

Mental Models

Human Failure

Alert Fatigue

User Mindsets

Designing Systems for Humans

HugOps

Description:

Explore the complexities of distributed systems in this 33-minute conference talk from LISA19. Delve into the history of distributed computing, debunk common myths about the CAP theorem, and understand why network partitions are inevitable. Examine popular consensus algorithms and their role in mitigating risks associated with distributed operations. Learn how to design systems that account for human factors, enhancing adaptability and reducing the impact of programmatic uncertainty. Gain insights into data evolution, scaling challenges, cloud computing, and the concept of shared nothing architecture. Investigate the intricacies of unreliable message delivery, building observability, and the practical realities of hardware failures. Discover strategies for incident analysis, blameless discussions, and designing systems that prioritize human interaction and understanding.

Why Are Distributed Systems So Hard?

USENIX

#Conference Talks #LISA (Large Installation System Administration) Conference #Computer Science #Distributed Systems #Consensus Algorithms #CAP Theorem #DevOps #Observability