Chapters:

Intro

Runtime checker (aka detector/monitor)

Importance of runtime checker

Current checking practice

Complex internals of modern software

Common to exhibit gray failures

A real-world gray failure

Failure root cause

Ideal runtime checkers

A new approach

Panorama: capture in-situ observability

Convert a program into an in-situ observer

Identify observation boundary and identities

Extract evidence

Example of analysis

Detecting real-world gray failures

Timeline of detecting failure case f1

Latency overhead to observers

Program reduction approach

Why do reduction?

Identify long-running regions

Select checking target candidates

Reduce long-running methods

Encapsulate checkers

Insert watchdog hooks

Prevent side effects

Watchdog generation

Failure detection evaluation setup

Detecting real-world failures

Silent semantic violations

Real-world failure study

Oathkeeper: detect silent semantic violations

How to express semantics?

Oathkeeper workflow

Emitting semantic event traces

General semantic rule templates

Extracted semantic rules

Runtime overhead

Conclusions

Description:

Explore three systematic techniques for automatically generating effective, customized runtime checkers for large distributed systems in this 40-minute Strange Loop Conference talk. Learn about Panorama's approach to capturing in-situ observability, a program reduction method for identifying long-running regions and inserting watchdog hooks, and Oathkeeper's strategy for detecting silent semantic violations. Discover how these techniques can help detect and localize unexpected, subtle failures in complex production environments, improving the reliability and availability of modern distributed systems. Gain insights from real-world failure studies and performance evaluations presented by Ryan Huang, an Assistant Professor at Johns Hopkins University specializing in computer systems research.

Automatic Generation of Runtime Checkers for Production Distributed Systems

Strange Loop Conference


#Conference Talks #Strange Loop Conference #Computer Science #Distributed Systems #Business #Performance Improvement