Background: A Quick Introduction to the LinkedIn Stack
LinkedIn Stack Under the Hood
Finding a Needle in a Haystack
Alert Correlation: A framework that automates the alert correlation process to identify unhealthy microservices
Alert Correlation: Slack Recommendations
A Real Issue
A Spike
Correlation does not mean Causation
Problem Statement: Finding the "right" needle in a needlestack
Modified Z-Score for Outlier Detection
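For reference, the standard modified Z-score (Iglewicz and Hoaglin) replaces the mean and standard deviation of the ordinary Z-score with the median $\tilde{x}$ and the median absolute deviation (MAD), so a single large spike cannot inflate the scale it is measured against. The 3.5 cutoff below is the commonly recommended threshold; the slides do not confirm which constant or threshold LinkedIn actually used.

$$
M_i = \frac{0.6745\,(x_i - \tilde{x})}{\mathrm{MAD}}, \qquad x_i \text{ is flagged as an outlier if } \lvert M_i \rvert > 3.5
$$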
MAD (Median Absolute Deviation)
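As a general statistical definition (not taken from the slides), MAD is the median of the absolute distances from the median. For normally distributed data it relates to the standard deviation $\sigma$ as shown below, which is where the 0.6745 factor in the modified Z-score comes from: it rescales the score so it is roughly comparable to an ordinary Z-score.

$$
\mathrm{MAD} = \operatorname{median}_i\bigl(\lvert x_i - \tilde{x} \rvert\bigr), \qquad \mathrm{MAD} \approx 0.6745\,\sigma \quad (0.6745 = \Phi^{-1}(0.75))
$$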
A Simple Example
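A minimal sketch of MAD-based spike detection, assuming a plain list of per-minute error counts for one service. The data, the function names `modified_z_scores` and `find_outliers`, and the choice to report no outliers when MAD is zero are all hypothetical illustrations, not LinkedIn's implementation.

```python
from statistics import median


def modified_z_scores(values):
    """Return the modified Z-score of each value (Iglewicz-Hoaglin form)."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        # Assumption of this sketch: MAD is zero when more than half the
        # points equal the median, so the score is undefined. Return zeros
        # (i.e. no outliers) instead of dividing by zero.
        return [0.0 for _ in values]
    return [0.6745 * (x - med) / mad for x in values]


def find_outliers(values, threshold=3.5):
    """Return the indices whose |modified Z-score| exceeds the threshold."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]


# Hypothetical per-minute error counts for one microservice.
errors_per_minute = [4, 5, 3, 6, 4, 5, 4, 48, 5, 4]
print(find_outliers(errors_per_minute))  # → [7]
```

Here the median is 4.5 and MAD is 0.5, so the value 48 scores about 58.7 and is flagged, while the mild dips and bumps (3 and 6) score about ±2.0 and stay under the threshold. Because the median and MAD ignore the spike itself, the baseline does not shift the way a mean and standard deviation would.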
Spike Detection Challenges
Results: Spike vs Real
Description:
Explore spike detection techniques for alert correlation in large-scale microservices architectures through this SREcon21 conference talk. Delve into LinkedIn's approach to identifying root causes during production outages amidst thousands of interconnected services. Learn about the challenges of distinguishing genuine issues from false positives in a complex alert landscape. Discover how LinkedIn implemented anomaly detection using Modified Z-Score and Median Absolute Deviation (MAD) to streamline their alert correlation system. Gain insights into practical applications, challenges faced, and results achieved in reducing false escalations and minimizing issue resolution time. Understand the nuances of correlation versus causation in the context of microservices monitoring and troubleshooting.