1. Intro
2. Datadog
3. Symptoms
4. Investigation
5. Deletion call, 4d before: Audit logs for the namespace
6. Spinnaker deploys (v1)
7. Helm 3 deploys (v2)
8. Big difference
9. What happened?
10. Namespace Controller logs Virtual
11. Events so far
12. Metrics-server setup
13. Metrics-server deployment
14. Full chain of events
15. Key take-away: Apiservice extensions are great but can impact your cluster
16. Context
17. Runtime is down?
18. CNI status
19. Containerd goroutine dump: Blocked goroutines?
20. Seems CNI related
21. What about Delete?
22. CNI plugin
23. The root cause
24. What we know
25. Apiserver requests
26. Illustration
27. What about label filters?
28. Informers instead of List: How do informers work?
29. Back to the incident
30. Nodegroup controller?
31. How did it work?
32. What we learned
33. Conclusion
Description:
Explore complex Kubernetes debugging scenarios in this conference talk from KubeCon + CloudNativeCon Europe 2021. Delve into real-world examples from Datadog's journey of migrating workloads to Kubernetes, including an intriguing case where an OOM-killer invocation triggered namespace deletion. Learn about the intricate interactions between Kubernetes components, investigate symptoms, and uncover surprising root causes. Gain insights into managing large-scale clusters, understanding metrics-server setups, and dealing with CNI-related issues. Discover key takeaways about apiservice extensions, runtime troubleshooting, and the importance of informers in Kubernetes operations. Enhance your ability to diagnose and resolve unexpected challenges in Kubernetes environments.

How the OOM-Killer Deleted My Namespace, and Other Kubernetes Tales

CNCF [Cloud Native Computing Foundation]