1. Intro
2. Metrics service errors during rollouts
3. Applications involved
4. DNS setup
5. Too many queries at startup?
6. Networking issues?
7. Let's test with network optimized instances
8. What about bigger instances?
9. VPC Flow Logs
10. Zoom on ingress flows to old IP
11. What about egress?
12. Routing on nodes
13. Stable state
14. What about traffic to old IP?
15. Let's simulate
16. Reverse Path filtering
17. 2 questions
18. RPC setup
19. DNS propagation time during rollouts
20. Reconnection differences
21. Lessons Learned
Description:
Dive into a complex Kubernetes incident investigation in this conference talk. Follow the journey of troubleshooting mysterious service errors during rolling updates, initially suspected to be DNS-related. Explore the debugging steps, from analyzing application behavior and DNS setup to investigating networking issues and VPC flow logs. Uncover the intricacies of ingress and egress flows, routing on nodes, and the impact of reverse path filtering. Examine the RPC setup, DNS propagation time during rollouts, and reconnection differences. Learn valuable lessons from this in-depth exploration of a challenging issue that ultimately led to a simple three-line code removal solution.

Debugging Complex Kubernetes Incidents - When It's Not DNS

CNCF [Cloud Native Computing Foundation]