Explore a comprehensive talk from the Strange Loop Conference on building a scalable anomaly detection system without using machine learning. Dive into Netflix's approach to detecting and pinpointing failures in their complex cloud architecture, composed of thousands of services and hundreds of thousands of VMs and containers. Learn how Zuul, Netflix's front door for all cloud traffic, is leveraged to stream real-time events and identify broken paths in their microservices maze. Discover how stream processing, anomaly detection algorithms, and a rules engine combine into an efficient system capable of handling millions of requests across thousands of nodes. Gain insights into the benefits of using "old-fashioned math" over machine learning in certain scenarios, and understand the implementation of dynamic and adaptive thresholds. Examine the anomaly detection algorithm in depth, including median estimation, median absolute deviation (MAD), and recovery detection. Explore the impact assessment process, data visualization techniques, and the use of Spinnaker events for more accurate problem identification. Understand how this system provides real-time alerting, reduces operational burden, and improves accuracy in detecting service issues within Netflix's complex microservices architecture.
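To make the idea of an adaptive threshold built from a median and MAD more concrete, here is a minimal Python sketch. It is not the implementation presented in the talk: the class name `MadDetector`, its parameters (`window_size`, `k`, `recover_after`, `min_spread`), and the event labels are all hypothetical, and it uses an exact median over a small sliding window rather than the streaming median estimation the talk refers to.

```python
from collections import deque
from statistics import median


class MadDetector:
    """Illustrative detector: threshold = median + k * MAD over a sliding window.

    All names and defaults here are assumptions made for this sketch.
    """

    def __init__(self, window_size=30, k=3.0, recover_after=3, min_spread=1.0):
        self.window = deque(maxlen=window_size)  # recent "normal" values
        self.k = k                               # how many MADs above the median count as anomalous
        self.recover_after = recover_after       # normal points needed to declare recovery
        self.min_spread = min_spread             # floor so a flat baseline does not give a zero-width band
        self.alerting = False
        self._normal_streak = 0

    def threshold(self):
        m = median(self.window)
        # MAD: median of absolute deviations from the window median
        spread = median(abs(x - m) for x in self.window)
        return m + self.k * max(spread, self.min_spread)

    def observe(self, value):
        # Fill the window before making any decisions
        if len(self.window) < self.window.maxlen:
            self.window.append(value)
            return "warming_up"

        if value > self.threshold():
            # Anomalous points are kept out of the baseline window
            self._normal_streak = 0
            event = "ongoing" if self.alerting else "alert"
            self.alerting = True
            return event

        # Normal point: let the baseline adapt to it
        self.window.append(value)
        if self.alerting:
            self._normal_streak += 1
            if self._normal_streak >= self.recover_after:
                self.alerting = False
                self._normal_streak = 0
                return "recovered"
            return "ongoing"
        return "ok"


if __name__ == "__main__":
    detector = MadDetector(window_size=5, k=3.0, recover_after=2)
    for errors_per_second in [10, 11, 9, 10, 12, 50, 48, 11, 10, 10]:
        print(errors_per_second, detector.observe(errors_per_second))
```

Because the threshold is recomputed from recent normal values, it follows the baseline as traffic changes instead of relying on a fixed limit. Keeping anomalous points out of the window is one possible design choice that stops an ongoing incident from inflating the baseline, and requiring several consecutive normal points before reporting recovery avoids flapping between states.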
Scalable Anomaly Detection - With Zero Machine Learning