Kubernetes Interview Questions

Master Kubernetes with these real-world interview questions and answers.

Beginner Questions

Core concepts, syntax, and foundational command-line knowledge.

Easy Associate Level Kubernetes

What is a ConfigMap and when would you use it over an environment variable?

A ConfigMap stores non-sensitive configuration data as key-value pairs. It decouples your configuration from your container image.

Use ConfigMaps over hardcoded env vars when:

Config needs to differ between environments (dev/staging/prod)
Multiple pods share the same configuration
You need to mount config as a file (e.g., nginx.conf, prometheus.yml)

For sensitive data like passwords, use a Secret instead of a ConfigMap.
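
A minimal sketch of both consumption styles, assuming a hypothetical ConfigMap named app-config and a placeholder image. One key is injected as an environment variable, and the whole ConfigMap is also mounted as files:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config            # hypothetical name
data:
  LOG_LEVEL: "info"
  app.properties: |
    cache.ttl=300
---
apiVersion: v1
kind: Pod
metadata:
  name: config-demo           # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.25         # placeholder image
    env:
    - name: LOG_LEVEL         # single key exposed as an env var
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: LOG_LEVEL
    volumeMounts:
    - name: config
      mountPath: /etc/app     # each key appears as a file in this directory
      readOnly: true
  volumes:
  - name: config
    configMap:
      name: app-config
```

Mounted files are refreshed after the ConfigMap changes (unless mounted with subPath), while environment variables keep their old value until the pod is restarted.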

Intermediate Questions

Infrastructure management, deployment strategies, and delivery flows.

Medium Senior Level Kubernetes

What is the difference between a PersistentVolume and a PersistentVolumeClaim in Kubernetes?

PersistentVolumes (PV) and PersistentVolumeClaims (PVC) are Kubernetes abstractions for managing storage.

PersistentVolume (PV)

A PersistentVolume is a storage resource in the cluster provisioned by an administrator or dynamically created via StorageClass. It exists independently of any pod lifecycle.

Key properties:

Capacity: Size of the storage
Access Modes: ReadWriteOnce, ReadOnlyMany, ReadWriteMany, ReadWriteOncePod
Reclaim Policy: Retain or Delete (Recycle is deprecated in favor of dynamic provisioning)
StorageClass: Defines the provisioner

PersistentVolumeClaim (PVC)

A PVC is a request for storage by a user/application. It consumes PV resources similar to how pods consume node resources.

Example PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard

Binding Process

Admin creates PV (or StorageClass enables dynamic provisioning)
Developer creates PVC with storage requirements
Kubernetes binds PVC to a matching PV
Pod references the PVC as a volume
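
The last step can be sketched as follows, assuming the my-pvc claim from the example above; the pod name, image, and mount path are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo                 # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.25            # placeholder image
    volumeMounts:
    - name: data
      mountPath: /var/lib/data   # assumed mount path
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-pvc          # must be in the same namespace as the pod
```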

Dynamic Provisioning

With StorageClass, PVs are created automatically when a PVC is submitted, eliminating manual PV creation. This is the preferred approach in cloud environments (EBS, GCP PD, Azure Disk).

Medium Senior Level Kubernetes

What are Taints and Tolerations in Kubernetes and how do they control pod scheduling?

Taints and Tolerations are Kubernetes mechanisms that control which pods can be scheduled on which nodes.

What are Taints?

A taint is applied to a node and repels pods that do not have a matching toleration. Taints have three effects:

NoSchedule: Pod will not be scheduled on the node
PreferNoSchedule: Kubernetes tries to avoid scheduling the pod on the node
NoExecute: New pods are not scheduled on the node, and pods already running there are evicted unless they tolerate the taint

Example taint command:

kubectl taint nodes node1 key=value:NoSchedule

What are Tolerations?

Tolerations are applied to pods and allow the scheduler to place pods on nodes with matching taints.

Example toleration in a pod spec:

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"

Common Use Cases

Dedicated nodes: Taint GPU nodes so only GPU workloads run on them
Node maintenance: Taint nodes before draining to prevent new pod scheduling
Special hardware: Reserve nodes with SSDs or high memory for specific workloads
Multi-tenancy: Isolate team workloads on specific nodes

Key Difference from Node Affinity

Node Affinity attracts pods to nodes, while Taints repel pods from nodes. They complement each other for fine-grained scheduling control.
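
A sketch of the two mechanisms combined for dedicated GPU nodes. It assumes the nodes were tainted with dedicated=gpu:NoSchedule and labeled hardware=gpu; both the taint and the label are hypothetical, as is the image name:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job                  # hypothetical name
spec:
  tolerations:                   # lets the pod onto the tainted nodes
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:                # forces the pod onto the labeled nodes
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: hardware
            operator: In
            values: ["gpu"]
  containers:
  - name: trainer
    image: my-gpu-app:1.0        # placeholder image
```

The toleration alone only permits scheduling on the GPU nodes; the affinity rule is what keeps the pod from landing on an untainted general-purpose node.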

Medium Senior Level Kubernetes

Explain the difference between a Liveness probe, Readiness probe, and Startup probe.

Liveness Probe: Checks if the container is alive. If it fails, Kubernetes restarts the container. Use this to recover from deadlocks.

Readiness Probe: Checks if the container is ready to serve traffic. If it fails, the pod is removed from Service endpoints (no traffic sent). Use this during slow startup or when temporarily overloaded.

Startup Probe: Only runs at startup. Liveness and readiness checks are held off until it succeeds, which gives slow-starting containers enough time to initialize and prevents the liveness probe from killing a container that is simply starting up slowly.
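
A container spec fragment showing the three probes together. The endpoints /healthz and /ready, the port, and the image are assumptions for illustration:

```yaml
containers:
- name: app
  image: myapp:1.0             # placeholder image
  ports:
  - containerPort: 8080
  startupProbe:
    httpGet:
      path: /healthz           # assumed health endpoint
      port: 8080
    periodSeconds: 10
    failureThreshold: 30       # allows up to 30 x 10s = 300s to start
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
    failureThreshold: 3        # restart after 3 consecutive failures
  readinessProbe:
    httpGet:
      path: /ready             # assumed readiness endpoint
      port: 8080
    periodSeconds: 5
```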

Medium Senior Level Kubernetes

What is a Kubernetes Ingress and how does it differ from a Service?

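A Service exposes a set of pods internally or as a simple LoadBalancer. An Ingress is a Layer-7 (HTTP/HTTPS) routing rule that sits in front of multiple services and routes traffic based on hostname or path. An Ingress resource does nothing on its own: an Ingress controller (such as ingress-nginx or Traefik) must be running in the cluster to act on it.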

Example: Route api.example.com to the api-service and example.com to the frontend-service using a single load balancer IP. This is far more cost-effective than having a separate LoadBalancer service for each microservice.
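
The routing described above could look roughly like this. The ingress class nginx and the service port 80 are assumptions; the service names come from the example:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress        # hypothetical name
spec:
  ingressClassName: nginx      # assumes an ingress-nginx controller is installed
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80       # assumed service port
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80       # assumed service port
```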

Advanced Questions

Enterprise orchestration, deep architectural concepts, and scaling issues.

Hard Lead / Architect Level Kubernetes

Explain Kubernetes RBAC and how you would give a service account read-only access to pods.

RBAC (Role-Based Access Control) is the authorization mechanism in Kubernetes. It uses three objects:

Role/ClusterRole: Defines what actions are allowed on which resources.
ServiceAccount: An identity for pods or external tools.
RoleBinding/ClusterRoleBinding: Links subjects (ServiceAccounts, users, or groups) to a Role or ClusterRole.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
subjects:
- kind: ServiceAccount
  name: my-service-account
  namespace: default  # required for ServiceAccount subjects; assumed namespace, adjust as needed
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Real Production Scenarios

Real-world architecture, system migration, and design challenges.

Medium Senior Level Kubernetes

What is a Kubernetes Operator and when should you build one?

A Kubernetes Operator is a method of packaging, deploying, and managing a Kubernetes application using custom resources and controllers that encode operational domain knowledge.

The Operator Pattern

Operators extend Kubernetes to automate the management of complex stateful applications. They use Custom Resource Definitions (CRDs) to define new resource types and a controller to watch those resources and reconcile the actual state with the desired state.

How Operators Work

Define a CRD (e.g., PostgresCluster)
User creates a CR (Custom Resource) instance
The Operator’s controller detects the new CR
Controller takes actions to create/configure the application
Controller continuously monitors and reconciles state
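
The first two steps can be sketched as a CRD plus one custom resource. The group example.com, the version v1alpha1, and the spec fields are hypothetical; a real operator defines its own schema:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresclusters.example.com   # must be <plural>.<group>
spec:
  group: example.com                   # hypothetical API group
  scope: Namespaced
  names:
    kind: PostgresCluster
    plural: postgresclusters
    singular: postgrescluster
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              replicas:
                type: integer
                minimum: 1
              version:
                type: string
---
apiVersion: example.com/v1alpha1
kind: PostgresCluster
metadata:
  name: orders-db                      # hypothetical instance
spec:
  replicas: 3
  version: "16"
```

The CRD only teaches the API server to store and validate PostgresCluster objects. Nothing happens to orders-db until a controller watching that resource type is deployed.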

Real-World Operator Examples

Prometheus Operator: Manages Prometheus, Alertmanager, and related monitoring components
cert-manager: Automates TLS certificate provisioning and renewal
Strimzi: Manages Apache Kafka clusters on Kubernetes
CloudNativePG: Manages PostgreSQL clusters
ArgoCD: GitOps continuous delivery tool with its own CRDs

When to Build an Operator

Build an Operator when:

Your application has complex operational knowledge (e.g., database failover, backup/restore)
You need to manage stateful workloads with domain-specific logic
You want to automate Day-2 operations (upgrades, scaling, recovery)
Standard Kubernetes primitives are insufficient

When NOT to Build an Operator

Stateless applications that Deployments handle well
Simple configuration management (use ConfigMaps/Helm)
When an existing operator already solves your problem

Operator Development Tools

Operator SDK: From Red Hat, supports Go, Ansible, and Helm operators
Kubebuilder: Kubernetes SIG framework for building operators in Go
Metacontroller: Simplifies operator development with webhooks

Medium Senior Level Kubernetes

What is etcd and what role does it play in a Kubernetes cluster?

etcd is a distributed, reliable key-value store that serves as Kubernetes’ primary datastore for all cluster state and configuration data.

Role in Kubernetes

etcd is the single source of truth for a Kubernetes cluster. Every object you create (pods, services, configmaps, secrets, etc.) is stored in etcd. The API server reads and writes to etcd for all cluster state.

Key Characteristics

Distributed Consensus

etcd uses the Raft consensus algorithm to ensure data consistency across multiple etcd instances. A cluster typically runs 3 or 5 etcd nodes to achieve fault tolerance.

Watch Mechanism

Only the API server talks to etcd directly. It builds on etcd’s watch API to offer its own watch endpoints, which Kubernetes controllers use to get notified of changes. For example, the scheduler watches for unscheduled pods and the controller manager watches for deployment changes.

Strong Consistency

etcd provides linearizable reads and writes: once a write is acknowledged, every subsequent read from any client reflects it.

What’s Stored in etcd

All Kubernetes objects (Pods, Deployments, Services, etc.)
Cluster configuration
RBAC policies
Secrets (encrypted at rest if configured)
Node information

etcd in Production

Backup Strategy

# Create etcd snapshot
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

High Availability

Run an odd number of nodes (3, 5, 7)
A cluster of 3 tolerates 1 failure
A cluster of 5 tolerates 2 failures
Use dedicated SSDs for low latency

Why etcd Performance Matters

etcd latency directly impacts API server response time. Slow etcd = slow kubectl, slow deployments, and cluster instability. Always monitor etcd disk I/O and latency metrics.

Medium Senior Level Kubernetes

What is the difference between a Kubernetes Job and a CronJob?

Kubernetes Jobs and CronJobs are workload resources for running tasks to completion rather than running continuously like Deployments.

Kubernetes Job

A Job creates one or more pods and ensures a specified number of them successfully terminate. Once the required completions are reached, the Job is complete.

Use Cases

Database migrations
Batch data processing
One-time setup tasks
Report generation

Example Job

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: myapp:latest
        command: ["python", "manage.py", "migrate"]

Job Patterns

Non-parallel: Single pod runs to completion
Parallel with fixed count: Multiple pods, each does a portion
Parallel with work queue: Pods process items from a queue
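
A sketch of the fixed-count pattern using an Indexed Job, where each pod receives its own index to decide which portion of the work to handle. The job name and command are hypothetical:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: shard-processor          # hypothetical name
spec:
  completions: 5                 # five work items in total
  parallelism: 2                 # at most two pods run at once
  completionMode: Indexed        # each pod gets an index from 0 to 4
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36
        # JOB_COMPLETION_INDEX is set by Kubernetes in Indexed mode
        command: ["sh", "-c", "echo processing shard $JOB_COMPLETION_INDEX"]
```

For the work-queue pattern, completions is left unset and only parallelism is specified; the pods coordinate through the queue and the Job finishes once a pod exits successfully and the rest have terminated.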

CronJob

A CronJob creates Jobs on a repeating schedule using standard Unix cron syntax.

Use Cases

Nightly database backups
Hourly report generation
Periodic cleanup tasks
Scheduled ETL pipelines

Example CronJob

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"  # 2 AM every day
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: backup-tool:latest
            command: ["/backup.sh"]

Key Differences

Feature	Job	CronJob
Trigger	Manual/one-time	Scheduled (cron)
Recurrence	Runs once	Repeats on schedule
Use case	Ad-hoc tasks	Recurring tasks
Creates	Pods directly	Jobs (which create pods)

Medium Senior Level Kubernetes

What are Init Containers in Kubernetes and what problems do they solve?

Init Containers are specialized containers that run and complete before the main application containers start in a pod.

How Init Containers Work

Init containers run sequentially: each must complete successfully before the next one starts, and all must succeed before the app containers start. If an init container fails, the kubelet restarts it until it succeeds, unless the pod’s restartPolicy is Never, in which case the whole pod is marked as failed.

Problems They Solve

1. Dependency Waiting

Wait for a service to be ready before the app starts:

initContainers:
- name: wait-for-db
  image: busybox
  command: ['sh', '-c', 'until nc -z db-service 5432; do sleep 2; done']

2. Pre-initialization Tasks

Clone a Git repository into a shared volume
Download configuration files from a remote source
Run database migrations before the app starts

3. Security Isolation

Run privileged setup tasks in an init container while the main container runs with minimal privileges.

4. Delay App Start

Wait for custom resources or CRDs to be registered before the app that uses them starts.

Init vs Sidecar Containers

Feature	Init Container	Sidecar Container
Lifecycle	Runs once and exits	Runs alongside main
Purpose	Setup/preparation	Supporting services
Execution	Sequential, before main	Parallel with main

Example

spec:
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'until nslookup myservice; do sleep 2; done']
  containers:
  - name: myapp
    image: myapp:latest

Medium Senior Level Kubernetes

What is a DaemonSet in Kubernetes and when would you use it?

A DaemonSet ensures that a copy of a pod runs on all (or specific) nodes in a Kubernetes cluster. When nodes are added to the cluster, the DaemonSet automatically schedules a pod on them.

How DaemonSets Work

Unlike Deployments which control a specific number of replicas, DaemonSets ensure one pod per matching node. When a node is removed, the pod is garbage collected.

Common Use Cases

Log collection agents: Fluentd, Filebeat – collect logs from every node
Monitoring agents: Prometheus Node Exporter, Datadog Agent – collect node metrics
Network plugins: CNI plugins like Calico, Flannel run as DaemonSets
Storage drivers: Ceph, GlusterFS storage daemons
Security agents: Falco, Sysdig for runtime security monitoring

Example DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      tolerations:
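      # control-plane replaced the deprecated node-role.kubernetes.io/master taint
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule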
      containers:
      - name: node-exporter
        image: prom/node-exporter:latest
        ports:
        - containerPort: 9100

DaemonSet vs Deployment

Feature	DaemonSet	Deployment
Replicas	1 per node	Fixed count
Scaling	Auto with nodes	Manual/HPA
Use case	Node-level services	Stateless apps

Node Selection

Use nodeSelector or nodeAffinity to restrict a DaemonSet to specific nodes (e.g., only GPU nodes, only Linux nodes).
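
As a fragment of the DaemonSet spec, a nodeSelector restriction might look like this. kubernetes.io/os is a standard node label, while hardware: gpu is a hypothetical label you would apply yourself:

```yaml
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux   # well-known label set by the kubelet
        hardware: gpu             # hypothetical custom label
```

A node must carry every listed label to receive a pod from the DaemonSet.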

Medium Senior Level Kubernetes

What is Helm and how does it simplify Kubernetes application deployment?

Helm is the package manager for Kubernetes, making it easy to define, install, and upgrade complex Kubernetes applications.

Core Concepts

Charts

A Helm chart is a collection of files that describe Kubernetes resources. It contains:

templates/: Kubernetes manifests with Go template syntax
values.yaml: Default configuration values
Chart.yaml: Chart metadata (name, version, description)
charts/: Dependencies

Releases

When a chart is installed, a release is created. Multiple releases of the same chart can run in the same cluster with different configurations.

Repositories

Charts are stored in and shared via Helm repositories (e.g., Bitnami) or OCI registries, and can be discovered through Artifact Hub, a search index over many repositories.

Common Commands

# Add a repository
helm repo add bitnami https://charts.bitnami.com/bitnami

# Search charts
helm search repo nginx

# Install a chart
helm install my-nginx bitnami/nginx -f custom-values.yaml

# Upgrade a release
helm upgrade my-nginx bitnami/nginx --set replicaCount=3

# Rollback
helm rollback my-nginx 1

# List releases
helm list -A

Benefits

Templating: Reuse manifests with different values per environment
Version management: Track chart versions and rollback easily
Dependency management: Bundle related charts together
Release lifecycle: Install, upgrade, rollback, uninstall with single commands

Helm 3 vs Helm 2

Helm 3 removed Tiller (the server-side component), making it more secure by using Kubernetes RBAC directly and storing release state as Kubernetes Secrets.

Medium Senior Level Kubernetes

What is the purpose of a PodDisruptionBudget (PDB) in Kubernetes?

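A PodDisruptionBudget limits how many pods of an application (selected by label) can be unavailable simultaneously during voluntary disruptions such as node drains, cluster upgrades, or cluster autoscaler scale-downs. It does not protect against involuntary disruptions like node crashes.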

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api

Without a PDB, a cluster upgrade could drain multiple nodes simultaneously and take down your entire service. With minAvailable: 2, the eviction API refuses any voluntary eviction that would leave fewer than 2 matching pods available.

Hard Lead / Architect Level Kubernetes

Explain Kubernetes network policies and how you would isolate a production namespace.

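By default, all pods in a Kubernetes cluster can communicate with each other freely. NetworkPolicies are namespace-scoped firewall rules that control which pods can talk to which. They are enforced by the network plugin, so the cluster needs a CNI that supports NetworkPolicy (for example Calico or Cilium); otherwise the policies have no effect.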

To enforce full isolation on a namespace, start by denying all ingress and egress, then selectively allow only what’s needed:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Then add specific allow rules for your database, monitoring agents, and DNS (port 53).
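
One such allow rule, sketched for DNS. It assumes the cluster DNS pods run in the kube-system namespace, which is common but not universal (setups such as NodeLocal DNSCache need different rules):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns                # hypothetical name
  namespace: production
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          # label added automatically to every namespace
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

NetworkPolicies are additive, so this rule opens DNS on top of deny-all without weakening any other restriction.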

Medium Senior Level Kubernetes

How does the Kubernetes Horizontal Pod Autoscaler (HPA) work?

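HPA automatically scales the number of pod replicas based on observed metrics. CPU utilization is the most common target, but it also supports memory through the resource metrics API (typically served by metrics-server) and application-specific signals through the custom and external metrics APIs.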

kubectl autoscale deployment my-app --min=2 --max=10 --cpu-percent=60

The HPA controller checks metrics every 15 seconds (default) and adjusts replicas to maintain the target. For custom metrics, you can integrate tools like KEDA (Kubernetes Event-Driven Autoscaling) which can scale based on Kafka lag, SQS queue depth, and more.
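
The kubectl autoscale command above corresponds roughly to this declarative manifest, assuming a Deployment named my-app:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # percent of the pods' CPU requests
```

Utilization is measured against CPU requests, so the target containers must declare them. The controller computes desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric); for example, 4 replicas averaging 90% against a 60% target scale to ceil(4 × 90 / 60) = 6.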

Hard Lead / Architect Level Kubernetes

How do you manage secrets securely in Kubernetes? What are the alternatives to plain Kubernetes Secrets?

Kubernetes Secrets are base64-encoded, not encrypted by default. For production, consider these approaches:

Encryption at Rest: Enable EncryptionConfiguration to encrypt secrets in etcd.
External Secrets Operator: Syncs secrets from AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault into Kubernetes Secrets automatically.
HashiCorp Vault Agent Injector: Injects secrets directly into Pod filesystems without storing them in Kubernetes at all.
Sealed Secrets: Encrypts secrets client-side so they are safe to commit to Git.
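
For the first option, a minimal sketch of the file passed to kube-apiserver through --encryption-provider-config. The key name is hypothetical and the key value is a placeholder that must be replaced with a real base64-encoded 32-byte key:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:                      # first provider is used for new writes
      keys:
      - name: key1               # hypothetical key name
        secret: <BASE64-ENCODED-32-BYTE-KEY>   # placeholder, not a real key
  - identity: {}                 # fallback so existing plaintext secrets stay readable
```

Secrets written before the change remain unencrypted until they are rewritten, for example by reading and replacing every Secret once the API server has restarted with the new configuration.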

Medium Senior Level Kubernetes

How do services in different namespaces communicate in Kubernetes?

All services in a Kubernetes cluster are reachable via DNS using the Fully Qualified Domain Name (FQDN):

<service-name>.<namespace>.svc.cluster.local

For example, a service named postgres in the production namespace is reachable at postgres.production.svc.cluster.local from any pod in any namespace. If NetworkPolicies are in place, you must explicitly allow cross-namespace traffic.

Medium Senior Level Kubernetes

What is the difference between a StatefulSet and a Deployment?

Use a Deployment for stateless workloads (web servers, APIs) where any Pod is interchangeable. Use a StatefulSet for stateful workloads like databases that need:

Stable, predictable network identities (e.g., web-0, web-1 for a StatefulSet named web)
Ordered, graceful deployment and scaling
Stable persistent storage linked to each pod individually

Common examples: Kafka, ZooKeeper, Cassandra, PostgreSQL replicas.
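
A sketch of how those three properties show up in a manifest. The name db, the image, and the mount path are hypothetical:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None                # headless Service gives each pod a DNS record
  selector:
    app: db
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db                # ties the pods to the headless Service
  replicas: 3                    # pods are created in order: db-0, db-1, db-2
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: my-db:1.0         # placeholder image
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/data   # assumed data directory
  volumeClaimTemplates:          # one PVC per pod: data-db-0, data-db-1, ...
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

Each pod is reachable at a stable name such as db-0.db.<namespace>.svc.cluster.local, and if db-0 is rescheduled it reattaches to the same data-db-0 claim.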

Easy Associate Level Kubernetes

What is the difference between a Pod and a Deployment in Kubernetes?

A Pod is the smallest deployable unit in Kubernetes — it wraps one or more containers that share the same network and storage. However, Pods on their own are ephemeral.

A Deployment is a higher-level abstraction that manages Pods. It ensures a specified number of Pod replicas are running at all times, handles rolling updates, and allows rollbacks. You almost never create bare Pods in production; you use Deployments instead.

kubectl create deployment nginx --image=nginx:1.25 --replicas=3

Medium Senior Level Kubernetes

How do you perform a zero-downtime rolling update in Kubernetes?

Kubernetes Deployments support RollingUpdate strategy by default. The key is configuring maxSurge and maxUnavailable correctly alongside working readiness probes.

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0

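With maxUnavailable: 0, Kubernetes will never take down an old Pod until the new one is healthy (as determined by its readiness probe), so serving capacity never drops during the rollout. For truly zero downtime, the application must also shut down gracefully: handle SIGTERM and finish in-flight requests before exiting.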
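
A fuller sketch that places the strategy in a complete Deployment with a readiness probe. The preStop sleep is one common way, not the only one, to give load balancers time to stop sending traffic before the container receives SIGTERM; the name, image, and timings are assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # hypothetical name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: web
        image: nginx:1.25        # placeholder image
        readinessProbe:          # gates traffic to new pods
          httpGet:
            path: /
            port: 80
          periodSeconds: 5
        lifecycle:
          preStop:               # delay shutdown while endpoints are updated
            exec:
              command: ["sleep", "10"]
```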

Easy Associate Level Kubernetes

Explain the role of ‘Sidecar’ containers in Kubernetes pod architecture.

A sidecar container is a secondary container that runs along with the main application container within the same pod. It is used to extend and enhance the functionality of the main container, such as by providing logging, monitoring, or proxy services.

Medium Senior Level Kubernetes

What is a ‘StatefulSet’ and when should you use it over a ‘Deployment’ in Kubernetes?

A StatefulSet is used for stateful applications that require unique, persistent identities and stable network identifiers. Unlike Deployments, which are for stateless pods, StatefulSets manage pods that are not interchangeable and have sticky identities.

Hard Lead / Architect Level Kubernetes

How do you implement Zero-Downtime deployments with Kubernetes Service objects?

Discuss RollingUpdate strategies, readiness probes, and the role of Service selectors in traffic routing during a rollout.

Troubleshooting Scenarios

Live system debugging, incident diagnostics, and latency resolution.

Hard Lead / Architect Level Kubernetes

How do you troubleshoot high memory usage causing OOMKilled events in production?

When a container exceeds its memory limit, the kernel OOM killer terminates it and Kubernetes logs OOMKilled. Steps to resolve:

Identify: kubectl describe pod <pod> — look for Reason: OOMKilled in Last State.
Profile: Use kubectl top pod or Prometheus/Grafana to understand actual memory usage patterns.
Fix: Either increase limits if the app genuinely needs more memory, or find and fix the memory leak in the application code.
Prevent: Set up PrometheusRule or Datadog alerts to notify before a pod hits its limit.
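
The prevention step might be sketched as a PrometheusRule like the one below. It assumes the Prometheus Operator is installed and that both cAdvisor and kube-state-metrics metrics are scraped; the rule name, namespace, 90% threshold, and 5-minute window are illustrative choices:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-near-limit        # hypothetical name
  namespace: monitoring          # assumed namespace
spec:
  groups:
  - name: memory.rules
    rules:
    - alert: ContainerMemoryNearLimit
      # working set (cAdvisor) divided by the memory limit (kube-state-metrics)
      expr: |
        max by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
          /
        max by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
          > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "{{ $labels.pod }}/{{ $labels.container }} is above 90% of its memory limit"
```

Depending on how the Prometheus Operator is configured, the rule may also need labels that match its ruleSelector before it is picked up.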

Easy Associate Level Kubernetes

What are resource requests and limits in Kubernetes, and why are they important?

Requests tell the Kubernetes scheduler how much CPU/memory to reserve for a pod when scheduling it onto a node. Limits are the hard caps — the container is throttled (CPU) or killed (memory) if it exceeds them.

resources:
  requests:
    memory: "128Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    cpu: "500m"

Always set both. Without requests, the scheduler cannot make good placement decisions. Without limits, a runaway container can starve other workloads on the same node (the “noisy neighbor” problem).

Hard Lead / Architect Level Kubernetes

How do you debug a pod stuck in CrashLoopBackOff?

CrashLoopBackOff means the container starts but repeatedly crashes. Use this systematic approach:

Check logs: kubectl logs <pod> --previous to see the crash output.
Describe the pod: kubectl describe pod <pod> to inspect Events, resource limits, and probe failures.
Check OOM: If you see OOMKilled, the container exceeded its memory limit.
Shell override: Override the entrypoint to keep the container alive for inspection: command: ["sleep", "3600"]
