Intro

Replicated State Machine (RSM)

Fault Tolerance for High Availability

Slowdowns Hurt Availability

Slowdowns Take Different Forms

Defining Slowdown Tolerance

Multi-Paxos is Not 1-Slowdown-Tolerant

Copilot: First 1-Slowdown-Tolerant Protocol

Ordering: Use Two Logs

Ordering: Combine Logs with Dependencies

Ordering: Dependency Cycles

Ordering: A Tricky Case

Ordering: Same on All Replicas

Copilot Protocol: Dependencies?

Optimizations

Evaluation

Copilot and Fast-View-Change Tolera

Gradual Slowdown

Performance Without Slow Replicas

Conclusion

Description:

This 20-minute talk from OSDI '20 presents Copilot, the first 1-slowdown-tolerant consensus protocol for replicated state machines: it maintains normal latency despite the slowdown of any single replica. The talk explains how Copilot uses two distinguished replicas, dependencies, deduplication, and fast takeovers to tolerate slowdowns, and describes two optimizations, ping-pong batching and null dependency elimination, that improve its efficiency. The evaluation compares Copilot against Multi-Paxos and EPaxos and shows that only Copilot maintains low latency when a replica slows down. Covering the protocol's design, implementation, and evaluation, the talk is aimed at viewers interested in distributed systems, consensus algorithms, and high-availability architectures.

Tolerating Slowdowns in Replicated State Machines Using Copilots

USENIX

#Conference Talks #OSDI (Operating Systems Design and Implementation) #Computer Science #Distributed Systems #Software Engineering #Fault Tolerance #Consensus Algorithms #Software Architecture #High Availability
