How do you troubleshoot high memory usage causing OOMKilled events in production?

Question

Accepted Answer

When a container exceeds its memory limit, the kernel OOM killer terminates it, and Kubernetes records the container's termination reason as OOMKilled. Steps to resolve:

Identify: Run kubectl describe pod <pod> and look for Reason: OOMKilled under Last State.

Profile: Use kubectl top pod or Prometheus/Grafana to see how memory usage actually behaves over time, for example whether it grows steadily or spikes under load.

Fix: Either increase the memory limit if the app genuinely needs more memory, or find and fix the memory leak in the application code.

Prevent: Set up a PrometheusRule or Datadog alert that fires before a pod reaches its limit.
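The Identify step can also be scripted when there are many pods to check. The following is a minimal sketch, not a definitive tool: it assumes kubectl is on the PATH and configured for the cluster, and the helper names find_oom_killed and get_pod are hypothetical, chosen only for illustration. It reads the same lastState.terminated.reason field that kubectl describe pod shows under Last State.

```python
import json
import subprocess


def find_oom_killed(pod: dict) -> list:
    """Return one record per container whose last termination reason was OOMKilled.

    `pod` is the parsed JSON of a single Pod object, as printed by
    `kubectl get pod <pod> -o json`.
    """
    hits = []
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for cs in statuses:
        terminated = cs.get("lastState", {}).get("terminated")
        if terminated and terminated.get("reason") == "OOMKilled":
            hits.append({
                "container": cs.get("name"),
                "restarts": cs.get("restartCount", 0),
                "exit_code": terminated.get("exitCode"),
                "finished_at": terminated.get("finishedAt"),
            })
    return hits


def get_pod(name: str, namespace: str = "default") -> dict:
    """Fetch a Pod as a dict (hypothetical helper; shells out to kubectl)."""
    result = subprocess.run(
        ["kubectl", "get", "pod", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Usage would look like find_oom_killed(get_pod("<pod>")), which returns an empty list when no container in the pod was last terminated for exceeding its memory limit. Keeping the parsing separate from the kubectl call makes it possible to run the check against saved JSON as well.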
