
Intro

Metrics service errors during rollouts

Applications involved

DNS setup

Too many queries at startup?

Networking issues?

Let's test with network optimized instances

What about bigger instances?

VPC Flow Logs

Zoom on ingress flows to old IP

What about egress?

Routing on nodes

Stable state

What about traffic to old IP?

Let's simulate

Reverse Path filtering

2 questions

RPC setup

DNS propagation time during rollouts

Reconnection differences

Lessons Learned

Description:

Follow a complex Kubernetes incident investigation in this conference talk. The speakers troubleshoot mysterious service errors that appear during rolling updates and were initially suspected to be DNS-related. The debugging steps range from analyzing application behavior and the DNS setup to investigating networking issues and VPC Flow Logs. Along the way, the talk covers ingress and egress flows, routing on nodes, and the impact of reverse path filtering, then examines the RPC setup, DNS propagation time during rollouts, and reconnection differences. It closes with the lessons learned from a challenging issue that was ultimately resolved by removing three lines of code.

Debugging Complex Kubernetes Incidents - When It's Not DNS

CNCF [Cloud Native Computing Foundation]


#Computer Science #DevOps #Kubernetes #Programming #Web Development #Routing #Ingress #Cloud Computing #Cloud Networking #VPC Flow Logs
