Debug Like an Operator, Then Choose Your Next Path

Clusters fail in predictable categories: scheduling, image pull, crash loop, configuration, networking. A short debugging order and a handful of commands will carry you through most beginner incidents on kind, k3s, or minikube—and the same order works in production.

In short

Use get → describe → logs → events. Fix ImagePullBackOff, CrashLoopBackOff, and probe failures with intent. Then deepen with GitOps, observability, and structured certification practice.

The debugging order (memorize this)

  1. kubectl get pods -n <ns> — phase and restarts at a glance.
  2. kubectl describe pod <name> -n <ns> — events, node, probes, volumes.
  3. kubectl logs <name> -n <ns> [--previous] — stdout/stderr; --previous for crashed containers.
  4. kubectl get events -n <ns> --sort-by='.lastTimestamp' — timeline when describe is noisy.
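
The four steps above can be wrapped in one small shell function so the order becomes muscle memory. This is only a sketch: triage_pod is a hypothetical helper name, not a kubectl feature, and the --tail=50 limit is an arbitrary choice.

```shell
# triage_pod NAMESPACE POD: run get -> describe -> logs -> events in order.
# Hypothetical helper; ns and pod are plain (global) shell variables.
triage_pod() {
  ns="$1"; pod="$2"
  echo "== 1. get =="
  kubectl get pods -n "$ns" -o wide
  echo "== 2. describe =="
  kubectl describe pod "$pod" -n "$ns"
  echo "== 3. logs =="
  kubectl logs "$pod" -n "$ns" --tail=50
  # Logs of the previous container instance. This fails when the container
  # has never restarted, so the error is deliberately ignored.
  kubectl logs "$pod" -n "$ns" --previous --tail=50 2>/dev/null || true
  echo "== 4. events =="
  kubectl get events -n "$ns" --sort-by='.lastTimestamp'
}
```

Call it as triage_pod <ns> <name> and read the output top to bottom; most beginner incidents show their cause by step 2 or 3.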

For Deployments, add kubectl describe deployment and kubectl rollout status. For Services, add kubectl get endpoints and verify that the Service selector matches the Pod labels.
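
As a rough sketch, the Deployment and Service checks could look like the two functions below. The names check_rollout and check_service are hypothetical, and the 60-second timeout is an assumption chosen so a stuck rollout does not block the terminal.

```shell
# check_rollout NAMESPACE DEPLOYMENT (hypothetical helper)
check_rollout() {
  ns="$1"; deploy="$2"
  kubectl describe deployment "$deploy" -n "$ns"
  # --timeout makes the command return instead of waiting indefinitely.
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=60s
}

# check_service NAMESPACE SERVICE (hypothetical helper)
check_service() {
  ns="$1"; svc="$2"
  # Print the Service selector so it can be compared with the Pod labels.
  kubectl get service "$svc" -n "$ns" -o jsonpath='{.spec.selector}'
  echo
  # An empty ENDPOINTS column means no Ready Pod matches the selector.
  kubectl get endpoints "$svc" -n "$ns"
  kubectl get pods -n "$ns" --show-labels
}
```

If the selector printed by check_service does not appear among the labels in the last listing, the Service has nothing to route to.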

Common Pod states and what they mean

| Symptom | Likely cause | What to do |
|---|---|---|
| Pending | No node capacity, taints, PVC not bound | describe pod → Events; get nodes |
| ImagePullBackOff | Wrong name/tag, private registry auth | Fix image; create secret docker-registry if private |
| CrashLoopBackOff | App exits on start, bad command, missing config | logs --previous; run image locally with same command |
| CreateContainerConfigError | Missing Secret/ConfigMap key | describe pod; verify referenced objects exist |
| Running but not Ready | Readiness probe failing | Check probe path/port; test inside Pod with kubectl exec |
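
One way to internalize the table is to encode it as a lookup. The sketch below maps the STATUS column of kubectl get pods to a first move. next_step is a hypothetical helper, <name> and <ns> are placeholders, and ErrImagePull is added here as an assumption because it normally appears just before ImagePullBackOff.

```shell
# next_step STATUS: print a first debugging move for a Pod STATUS value.
# Hypothetical helper; the suggestions paraphrase the table above.
next_step() {
  case "$1" in
    Pending)
      echo "kubectl describe pod <name> -n <ns>; kubectl get nodes" ;;
    ImagePullBackOff|ErrImagePull)
      echo "kubectl describe pod <name> -n <ns>; fix the image name/tag or registry credentials" ;;
    CrashLoopBackOff)
      echo "kubectl logs <name> -n <ns> --previous" ;;
    CreateContainerConfigError)
      echo "kubectl describe pod <name> -n <ns>; kubectl get configmap,secret -n <ns>" ;;
    Running)
      # Running but not Ready: the readiness probe is the usual suspect.
      echo "kubectl describe pod <name> -n <ns>; check the readiness probe path and port" ;;
    *)
      echo "kubectl describe pod <name> -n <ns>" ;;
  esac
}
```

For example, next_step CrashLoopBackOff prints the logs --previous command, which is where a crashing container's last words are found.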

Interactive debugging

kubectl exec -it deploy/web -n learn-dev -- sh
# inside the container: wget -qO- http://127.0.0.1/   (or curl, if the image ships it or you can install it, e.g. with apk)

kubectl run tmp-curl --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s http://web.learn-dev.svc.cluster.local/

DNS names follow <service>.<namespace>.svc.cluster.local. If curl from another Pod fails, suspect the Service (selector, port, endpoints), cluster DNS, or a NetworkPolicy, not “the internet is down.”
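
The naming rule and the throwaway curl Pod combine into a small probe. This is a sketch under assumptions: svc_fqdn and probe_service are hypothetical helper names, the cluster domain is the default cluster.local, the Service listens on port 80 as in the example above, and the 5-second curl timeout is arbitrary.

```shell
# svc_fqdn SERVICE NAMESPACE: build the in-cluster DNS name.
# Assumes the default cluster domain, cluster.local.
svc_fqdn() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# probe_service SERVICE NAMESPACE: curl the Service from a temporary Pod.
probe_service() {
  svc="$1"; ns="$2"
  url="http://$(svc_fqdn "$svc" "$ns")/"
  # -i (without -t) attaches without a TTY so this also works in scripts;
  # --rm deletes the Pod when curl exits.
  kubectl run tmp-curl --rm -i --image=curlimages/curl --restart=Never -- \
    curl -s --max-time 5 "$url"
}
```

If probe_service web learn-dev times out while port-forward works, kubectl get endpoints and kubectl get networkpolicy -n learn-dev are reasonable next checks.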

When port-forward works but Ingress does not

Ingress needs an ingress controller (nginx, traefik, etc.). Local clusters often do not install one by default. For learning, port-forward and minikube tunnel (when documented for your driver) are enough. Treat Ingress as a follow-on topic after Services make sense.
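
A quick way to tell whether a controller is likely present is to look for IngressClass objects, which controller installs usually create. The sketch below treats an empty list as "probably no controller" and falls back to port-forward. Both function names are hypothetical, and the heuristic is an assumption, not a guarantee.

```shell
# has_ingress_controller: succeed if at least one IngressClass exists.
# Heuristic only: most controller installs register an IngressClass.
has_ingress_controller() {
  [ -n "$(kubectl get ingressclass -o name 2>/dev/null)" ]
}

# forward_service NAMESPACE SERVICE LOCAL_PORT SERVICE_PORT
# Reach the Service from the laptop without any Ingress.
forward_service() {
  ns="$1"; svc="$2"; local_port="$3"; svc_port="$4"
  kubectl port-forward "service/$svc" "$local_port:$svc_port" -n "$ns"
}
```

A typical use is has_ingress_controller || forward_service learn-dev web 8080 80, after which the app answers on http://127.0.0.1:8080/.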

Observability: the next layer

  • Metrics: kubectl top pods (requires metrics-server, which many local clusters do not install by default); see metrics-server in depth.
  • Dashboards: Prometheus/Grafana in a later lab—not required on day one.
  • Tracing: OpenTelemetry when you operate microservices at scale.
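
For the metrics bullet, a minimal sketch is to check that the metrics API is registered before calling kubectl top, so a missing metrics-server produces a clear message instead of a confusing error. The helper names metrics_available and top_pods are hypothetical, and sorting by memory is an arbitrary choice.

```shell
# metrics_available: succeed if the metrics API is registered.
# metrics-server registers the v1beta1.metrics.k8s.io APIService.
metrics_available() {
  kubectl get apiservice v1beta1.metrics.k8s.io >/dev/null 2>&1
}

# top_pods NAMESPACE: show Pod usage, heaviest memory consumers first.
top_pods() {
  ns="$1"
  if metrics_available; then
    kubectl top pods -n "$ns" --sort-by=memory
  else
    echo "metrics API not registered; install metrics-server first" >&2
    return 1
  fi
}
```

Note that a registered APIService can still be unavailable for a short time after installation, so an error right after installing metrics-server may simply mean it is still starting.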

The architecture post’s incident mental model—API → nodes → schedule → container → network—still applies; metrics tell you where in that chain to look.

Where to go after this series

Series recap

  1. Local lab — kind, k3s, or minikube.
  2. YAML anatomy — apiVersion, kind, metadata, spec, labels.
  3. First workloads — Deployment and Service.
  4. Day-one practices — namespaces, labels, resources, security.
  5. This post — debug and roadmap.

You now have a loop: declare in YAML → apply → observe → fix → commit. That loop is the job—whether the cluster lives on your laptop or in three regions.
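
As a rough illustration of that loop, the function below applies a manifest, waits for the rollout, and commits only when the rollout succeeds. Everything specific here is an assumption: apply_and_watch is a hypothetical helper, the manifest is assumed to live in a Git working tree, and the 120-second timeout and commit message format are arbitrary.

```shell
# apply_and_watch NAMESPACE MANIFEST DEPLOYMENT (hypothetical helper)
# apply -> observe -> commit; stops at the first failing step so you can
# debug and fix before anything is committed.
apply_and_watch() {
  ns="$1"; manifest="$2"; deploy="$3"
  kubectl apply -f "$manifest" -n "$ns" &&
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=120s &&
  git add "$manifest" &&
  git commit -m "Apply $manifest to $ns"
}
```

When the function returns non-zero, that is the cue for the get → describe → logs → events order; once the fix is in the YAML, run it again.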
