Explore three systematic techniques for automatically generating effective, customized runtime checkers for large distributed systems in this 40-minute Strange Loop Conference talk. Learn about Panorama's approach to capturing in-situ observability, a program reduction method that identifies long-running regions and inserts watchdog hooks, and Oathkeeper's strategy for detecting silent semantic violations. Discover how these techniques help detect and localize unexpected, subtle failures in complex production environments, improving the reliability and availability of modern distributed systems. Gain insights from real-world failure studies and performance evaluations presented by Ryan Huang, an Assistant Professor at Johns Hopkins University specializing in computer systems research.
Automatic Generation of Runtime Checkers for Production Distributed Systems