devopsinterview.com

Interactive Prep

Trending Interview Questions

Reveal quick answers, troubleshooting commands, and interactive flow diagrams instantly.

How would you troubleshoot CrashLoopBackOff?

01
Inspect container exit codes: 137 (128 + SIGKILL) usually means the container was OOMKilled, 1 is a generic application error, and other codes above 128 mean the process was terminated by a signal.
02
Check the previous container's logs: Run kubectl logs <pod> --previous to read stdout/stderr from the crashed container instance.
03
Review the event timeline: Run kubectl describe pod <pod> to see whether failing liveness probes are triggering restarts.
04
Verify resources and env: Ensure ConfigMaps, Secrets, and environment variables exist, are mounted, and are formatted correctly.
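
As a rough illustration of the exit-code step, the sketch below classifies a code using the common "128 + signal number" convention. The function name explain_exit_code is hypothetical, and the messages are only hints: a 137 still needs to be confirmed against the Reason field in kubectl describe pod.

```shell
#!/bin/sh
# Classify a container exit code such as the one shown under "Last State"
# in `kubectl describe pod`. Codes above 128 conventionally mean the
# process was terminated by signal number (code - 128).
# explain_exit_code is a hypothetical helper, not part of kubectl.
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exited cleanly"
  elif [ "$code" -gt 128 ] && [ "$code" -lt 256 ]; then
    sig=$((code - 128))
    case "$sig" in
      9)  echo "killed by SIGKILL (check for Reason: OOMKilled)" ;;
      11) echo "crashed with SIGSEGV" ;;
      15) echo "terminated by SIGTERM" ;;
      *)  echo "terminated by signal $sig" ;;
    esac
  else
    echo "application error (exit code $code)"
  fi
}

explain_exit_code 137
explain_exit_code 1
```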

SRE Debug Shell -- devopsinterview.com

# 1. Check pod events & status
kubectl get pods
kubectl describe pod <pod-name>

# 2. Inspect logs of the crashed container instance
kubectl logs <pod-name> --previous --tail=50

# 3. Check the pod's current CPU/memory usage (requires metrics-server)
kubectl top pod <pod-name>

Pod Created ➔ Running ➔ Liveness Probe Fails ➔ CrashLoopBackOff ➔ Restart Delay
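
The restart delay in this flow grows exponentially. The sketch below prints the delay sequence, assuming the defaults described in the Kubernetes documentation (10s initial delay, doubling each time, capped at 300s). The function name backoff_delays is hypothetical, and exact timing on a real cluster depends on the kubelet version and configuration.

```shell
#!/bin/sh
# Print the first N back-off delays, assuming a 10s initial delay that
# doubles after every crash and is capped at 300s (5 minutes).
# backoff_delays is a hypothetical helper for illustration only.
backoff_delays() {
  steps=$1
  delay=10
  cap=300
  i=1
  while [ "$i" -le "$steps" ]; do
    echo "back-off step $i: wait ${delay}s"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    i=$((i + 1))
  done
}

backoff_delays 7
```

Under these assumptions the cap is reached at the sixth step, which is why a pod stuck in CrashLoopBackOff can appear idle for minutes between attempts.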

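Difference between StatefulSet and Deployment?

01
Persistent Network Identity: Each StatefulSet pod gets a stable DNS hostname through its headless Service (e.g. web-0.db), while Deployment pods get names with random hash suffixes.
02
Storage Binding: A StatefulSet pairs each pod ordinal with its own PVC (e.g. data-web-0) via volumeClaimTemplates. Deployment replicas all reference the same PVC, if one is declared.
03
Scale/Rollout Ordering: By default a StatefulSet creates pods sequentially (0 -> 1 -> 2) and terminates or updates them in reverse ordinal order. Deployments replace pods in batches bounded by maxSurge and maxUnavailable.
04
Ideal workloads: Use StatefulSets for clustered stateful systems (e.g. Postgres, Kafka). Use Deployments for stateless web services.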
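
To make the naming rules concrete, the sketch below derives each ordinal's pod name, DNS name, and PVC name. The function name statefulset_identities and all argument values are hypothetical, and the DNS suffix assumes the default cluster domain cluster.local.

```shell
#!/bin/sh
# Print "<pod> <dns-name> <pvc>" for every ordinal of a StatefulSet.
#   pod name: <statefulset>-<ordinal>
#   DNS name: <pod>.<headless-service>.<namespace>.svc.cluster.local
#   PVC name: <volumeClaimTemplate>-<pod>
# statefulset_identities is a hypothetical helper; cluster.local is the
# assumed (default) cluster domain.
statefulset_identities() {
  sts=$1
  svc=$2
  ns=$3
  claim=$4
  replicas=$5
  i=0
  while [ "$i" -lt "$replicas" ]; do
    pod="${sts}-${i}"
    echo "${pod} ${pod}.${svc}.${ns}.svc.cluster.local ${claim}-${pod}"
    i=$((i + 1))
  done
}

# Example values only: StatefulSet "web", headless Service "db",
# namespace "default", claim template "data", 3 replicas.
statefulset_identities web db default data 3
```

Because these names depend only on the ordinal, a rescheduled web-1 comes back with the same hostname and reattaches to data-web-1.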

SRE Debug Shell -- devopsinterview.com

# 1. Inspect StatefulSet Pods (Note the sequence 0, 1, 2)
kubectl get statefulset db-node
kubectl get pods -l app=db-node

# 2. View persistent volume claims mapped per ordinal
kubectl get pvc -l app=db-node

StatefulSet DB ➔ Ordinal db-0 (pvc-0) ➔ Ordinal db-1 (pvc-1)

Deployment App ➔ Stateless app-hash1 ➔ Stateless app-hash2

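How does Terraform state locking work?

01
Avoid Collisions: Terraform locks the state for any operation that could write it (e.g. apply, destroy), so a concurrent run fails to acquire the lock instead of corrupting state.
02
Backend Lock Support: Locking depends on the backend. Supported backends include AWS S3 (via a DynamoDB table), Consul, Azure Blob Storage, and HCP Terraform (formerly Terraform Cloud).
03
Lock Schema: With S3 and DynamoDB, Terraform writes an item keyed by LockID that records who holds the lock and for which operation, then deletes it when the operation finishes.
04
Emergency Force Unlock: If an apply crashes mid-run and leaves a stale lock, confirm that no other run is active, then release it manually with terraform force-unlock <Lock-ID>.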

SRE Debug Shell -- devopsinterview.com

# 1. Initialize backend state with locking
terraform init

# 2. Apply infrastructure changes (acquires state lock automatically)
terraform apply

# 3. Emergency manual lock release; the lock ID is shown in the lock error message (use with caution)
terraform force-unlock 98b6a1f0-0b61-46e2-8921-987818e3810a

Apply Started ➔ Acquire LockID in DynamoDB ➔ Lock Held (Blocks Others) ➔ Write State Update ➔ Release LockID
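
The acquire, hold, and release semantics in this flow can be imitated locally. The sketch below is a toy stand-in, not Terraform's implementation: it uses an atomic mkdir in place of the DynamoDB conditional write. The lock path and the function names acquire_lock and release_lock are hypothetical.

```shell
#!/bin/sh
# Toy model of a state lock. NOT how Terraform is implemented: a local
# directory stands in for the backend's lock record.
LOCK_DIR="${TMPDIR:-/tmp}/demo-tfstate.lock"   # hypothetical lock location

acquire_lock() {
  # mkdir either creates the directory or fails because it already
  # exists, so only one caller can win the lock.
  if mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "$1" > "$LOCK_DIR/info"   # record who holds the lock
    return 0
  fi
  echo "lock already held by: $(cat "$LOCK_DIR/info")" >&2
  return 1
}

release_lock() {
  rm -rf "$LOCK_DIR"
}

release_lock                                   # clean slate for the demo
acquire_lock "alice: apply" && echo "alice acquired the lock"
acquire_lock "bob: apply"   || echo "bob was blocked"
release_lock                                   # alice finishes her run
acquire_lock "bob: apply"   && echo "bob acquired the lock"
release_lock
```

If the holder crashed before calling release_lock, the directory would stay behind and block everyone, which is the situation terraform force-unlock exists to resolve.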

Explain Kubernetes networking flow.

01
Flat network topology: Every pod gets a unique cluster-routable IP, assigned by the CNI plugin. Pods reach other pods directly, without NAT.
02
Service routing: A ClusterIP Service is a virtual IP. kube-proxy programs iptables or IPVS rules that load-balance its traffic across the backing pods.
03
Ingress: An Ingress controller (e.g. NGINX) receives external HTTP/S requests and routes them to backend Services by matching host and path rules.
04
DNS resolution: CoreDNS resolves Service names such as auth-service.production.svc.cluster.local to ClusterIPs.
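
To show why a short name like auth-service only resolves from inside its own namespace, the sketch below lists the fully qualified names a pod's resolver would typically try. It assumes the default cluster domain cluster.local and the usual three-entry search path; real pods may have extra search domains inherited from the node. The function name search_candidates is hypothetical.

```shell
#!/bin/sh
# List the names tried for a short Service name, given the namespace of
# the pod doing the lookup. Assumes the default search path:
#   <namespace>.svc.cluster.local svc.cluster.local cluster.local
# search_candidates is a hypothetical helper for illustration only.
search_candidates() {
  name=$1
  ns=$2
  for suffix in "${ns}.svc.cluster.local" "svc.cluster.local" "cluster.local"; do
    echo "${name}.${suffix}"
  done
}

# A pod in namespace "production" looking up "auth-service":
search_candidates auth-service production
```

Only the first candidate names a real Service here, so a pod in any other namespace has to use at least auth-service.production to reach it.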

SRE Debug Shell -- devopsinterview.com

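# 1. Check that the CNI agent pods are running (label shown is for Calico; it differs per CNI plugin)
kubectl get pods -n kube-system -l k8s-app=calico-node

# 2. Inspect the NAT rules kube-proxy programs for Services (run on a node, iptables mode)
sudo iptables -t nat -L KUBE-SERVICES -n -v

# 3. Run a DNS query from inside a pod (the image must include nslookup)
kubectl exec <pod-name> -- nslookup auth-service.production.svc.cluster.local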

Client Request ➔ Ingress Gateway ➔ Service ClusterIP ➔ Kube-Proxy Rules ➔ Pod IP (CNI)

How would you debug high CPU in EKS?

01
Isolate the hotspot: Use kubectl top nodes and kubectl top pods -A to identify which node and pod are consuming the CPU.
02
Inspect node processes: Connect to the host node via SSM or SSH, then run htop or ps aux --sort=-%cpu to find the busiest processes.
03
Check for throttling: Compare CPU usage against CPU limits in Prometheus/cAdvisor charts. A container is throttled when it exhausts its CPU quota within a scheduling period, which typically means the limit is too low for the workload.
04
Profile the application: Capture a runtime CPU or thread profile (e.g. pprof for Go) to diagnose hot code paths and lock contention.

SRE Debug Shell -- devopsinterview.com

# 1. Show resource usage for nodes and pods
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=cpu

# 2. Inspect node capacity details
kubectl describe node <eks-worker-node-name>

# 3. Check CPU throttling counters for a container (cgroup v1 path; on cgroup v2 use /sys/fs/cgroup/cpu.stat)
kubectl exec <pod-name> -- cat /sys/fs/cgroup/cpu/cpu.stat
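
The raw cpu.stat counters are easier to judge as a ratio. The sketch below reads cpu.stat content on stdin and reports the share of scheduling periods in which the cgroup was throttled. The function name throttle_ratio is hypothetical and the sample numbers are invented for illustration. It relies only on the nr_periods and nr_throttled fields, which appear in both cgroup v1 and v2.

```shell
#!/bin/sh
# Read cpu.stat content on stdin and print the percentage of CFS periods
# in which the cgroup was throttled.
# throttle_ratio is a hypothetical helper for illustration only.
throttle_ratio() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0)
        printf "%.1f%% of periods throttled\n", 100 * throttled / periods
      else
        print "no CFS periods recorded (no CPU limit set?)"
    }'
}

# Sample input with made-up values in the cgroup v1 layout.
printf 'nr_periods 2000\nnr_throttled 500\nthrottled_time 123456789\n' | throttle_ratio
```

Against a live pod, the same function could be fed with kubectl exec <pod-name> -- cat /sys/fs/cgroup/cpu/cpu.stat | throttle_ratio, adjusting the path for cgroup v2.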

CPU Spike Alert ➔ Identify Pod via kubectl top ➔ SSH/SSM to Node Host ➔ Check cgroup limit stats ➔ Profile App Threads

Interview Preparation Roadmap

Follow a structured timeline of milestones to progress from foundational concepts to advanced SRE system design.

Phase 01

Foundation & OS Internals

Develop a core understanding of the operating system, shell scripting, container runtimes, and standard networking protocols.

Linux Processes
systemd & Init
Docker Engine
Bash Scripting
TCP/IP & DNS

Phase 02

CI/CD & Infrastructure Automation

Master declarative configuration management, secure infrastructure-as-code provisioning, and repeatable build pipelines.

Terraform Modules
State Management
GitHub Actions
Secret Management
Ansible Playbooks

Phase 03

Kubernetes & Cloud Infrastructure

Build scalable container orchestration systems. Understand scheduling rules, networking patterns, ingress, storage mapping, and IAM integrations.

Pods & Deployments
Services & Ingress
StatefulSets
Kubernetes RBAC
EKS / GKE Clusters

Phase 04

Observability & SRE Operations

Learn to instrument microservices, track metrics, aggregate application logs, and troubleshoot production incidents.

Prometheus Metrics
Grafana Querying
Aggregated Logging
Alertmanager Rules
SLI / SLO Metrics

Phase 05

Advanced System Design

Architect large-scale, fault-tolerant distributed backends. Handle network partitions, database replication bottlenecks, caching systems, and failover routing.

High Availability
Load Balancing
Distributed Caching
Database Replication
DR Failover

Crack DevOps Interviews with
Real Production Scenarios

My Practice Workspace

Featured Learning Tracks

Kubernetes

AWS

Docker

Terraform

Linux

CI/CD Pipelines

Observability

System Design

Production Incident Scenarios

Built for Real DevOps Interviews

Real Production Scenarios

Troubleshooting-Focused

Senior SRE Preparation

Hands-on Code Auditing

System Design + DevOps

Accelerate Your DevOps Career Path

Trending Interview Questions

How would you troubleshoot CrashLoopBackOff?

Difference between StatefulSet and Deployment?

How does Terraform state locking work?

Explain Kubernetes networking flow.

How would you debug high CPU in EKS?
