Debug Like an Operator, Then Choose Your Next Path

Clusters fail in predictable categories: scheduling, image pull, crash loop, configuration, networking. A short debugging order and a handful of commands will carry you through most beginner incidents on kind, k3s, or minikube—and the same order works in production.

In short

Use get → describe → logs → events. Fix ImagePullBackOff, CrashLoopBackOff, and probe failures with intent. Then deepen with GitOps, observability, and structured certification practice.

The debugging order (memorize this)

kubectl get pods -n <ns> — phase and restarts at a glance.
kubectl describe pod <name> -n <ns> — events, node, probes, volumes.
kubectl logs <name> -n <ns> [--previous] — stdout/stderr; --previous for crashed containers.
kubectl get events -n <ns> --sort-by='.lastTimestamp' — timeline when describe is noisy.

For Deployments add kubectl describe deployment and kubectl rollout status. For Services add kubectl get endpoints and verify selectors.
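
The four steps above can be wrapped into one read-only helper. This is a minimal sketch, not a standard tool: the function name triage_pod, its argument order, and the 50-line log tail are assumptions made for illustration.

```shell
# triage_pod POD [NAMESPACE]
# Hypothetical helper: runs get, describe, logs, events in that order.
# The body runs in a subshell ( ... ) so its variables do not leak.
triage_pod() (
  pod="$1"
  ns="${2:-default}"

  echo "== 1. get: phase and restarts =="
  kubectl get pods -n "$ns" -o wide

  echo "== 2. describe: events, node, probes, volumes =="
  kubectl describe pod "$pod" -n "$ns"

  echo "== 3. logs: current container, then the previous one =="
  kubectl logs "$pod" -n "$ns" --tail=50
  # --previous fails when the container has never restarted; that is expected.
  kubectl logs "$pod" -n "$ns" --previous --tail=50 2>/dev/null \
    || echo "(no previous container logs)"

  echo "== 4. events: namespace timeline sorted by last timestamp =="
  kubectl get events -n "$ns" --sort-by='.lastTimestamp'
)
```

Call it as triage_pod <name> <ns>. Every command only reads state, so it should be safe to run repeatedly while you investigate.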

Common Pod states and what they mean

Symptom	Likely cause	What to do
`Pending`	No node capacity, taints, PVC not bound	`describe pod` → Events; `get nodes`
`ImagePullBackOff`	Wrong name/tag, private registry auth	Fix image; if private, `create secret docker-registry` and reference it in `imagePullSecrets`
`CrashLoopBackOff`	App exits on start, bad command, missing config	`logs --previous`; run image locally with same command
`CreateContainerConfigError`	Referenced Secret/ConfigMap, or a key in it, is missing	`describe pod`; verify referenced objects exist
Running but not Ready	Readiness probe failing	Check probe path/port; test inside Pod with `kubectl exec`
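
The table can also be read as a lookup from the STATUS column to the first command worth running. The sketch below is a hypothetical helper (next_step is not a kubectl feature), and the mapping is a condensed, non-exhaustive copy of the table; ErrImagePull is included because it is the status that normally appears just before ImagePullBackOff.

```shell
# next_step STATUS
# Hypothetical helper: prints a suggested first command for a Pod status.
next_step() {
  case "$1" in
    Pending)
      echo "kubectl describe pod <name> -n <ns>   # read Events, then: kubectl get nodes" ;;
    ImagePullBackOff|ErrImagePull)
      echo "kubectl describe pod <name> -n <ns>   # check image name, tag, and pull secret" ;;
    CrashLoopBackOff)
      echo "kubectl logs <name> -n <ns> --previous" ;;
    CreateContainerConfigError)
      echo "kubectl describe pod <name> -n <ns>   # find the missing Secret or ConfigMap" ;;
    Running)
      echo "kubectl describe pod <name> -n <ns>   # if not Ready, check the readiness probe" ;;
    *)
      echo "kubectl describe pod <name> -n <ns>" ;;
  esac
}
```

For example, next_step CrashLoopBackOff prints the logs --previous command; any unknown status falls back to describe pod, which is the safest default.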

Interactive debugging

kubectl exec -it deploy/web -n learn-dev -- sh
# inside the container: wget -qO- http://127.0.0.1/   (or curl, whichever the image ships)

kubectl run tmp-curl --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s http://web.learn-dev.svc.cluster.local/

DNS names follow <service>.<namespace>.svc.cluster.local. If curl from another Pod fails, the problem is inside the cluster (a Service selector that matches no Pods, cluster DNS, or a NetworkPolicy), not “the internet is down.”
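
The naming rule and the in-cluster check can be combined. A minimal sketch, assuming the default cluster domain cluster.local and a Service listening on port 80; svc_fqdn and check_service are hypothetical helper names, and the 5-second curl timeout is an arbitrary choice.

```shell
# svc_fqdn SERVICE NAMESPACE
# Prints the cluster-internal DNS name (assumes the domain "cluster.local").
svc_fqdn() {
  echo "$1.$2.svc.cluster.local"
}

# check_service SERVICE NAMESPACE
# Hypothetical helper: lists the Service's endpoints, then curls the Service
# from a throwaway Pod that is deleted when the command exits.
check_service() (
  svc="$1"
  ns="$2"
  # An empty ENDPOINTS column usually means the selector matches no Ready Pods.
  kubectl get endpoints "$svc" -n "$ns"
  kubectl run tmp-curl --rm -it --image=curlimages/curl --restart=Never -- \
    curl -s --max-time 5 "http://$(svc_fqdn "$svc" "$ns")/"
)
```

Running check_service web learn-dev checks endpoints first because an empty list explains a failed curl without any further network debugging.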

When port-forward works but Ingress does not

Ingress needs an ingress controller (nginx, traefik, etc.). kind and minikube do not install one by default; k3s ships Traefik. For learning, port-forward and minikube tunnel (when documented for your driver) are enough. Treat Ingress as a follow-on topic after Services make sense.
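
One quick way to tell which situation you are in is to look for an IngressClass before blaming the Ingress manifest. The sketch below is an assumption-laden heuristic: has_ingress_class and reach_service are hypothetical helpers, and a controller can exist without registering an IngressClass, so treat a negative result as a hint rather than proof.

```shell
# has_ingress_class
# Succeeds if at least one IngressClass is registered in the cluster.
has_ingress_class() {
  [ -n "$(kubectl get ingressclass -o name 2>/dev/null)" ]
}

# reach_service SERVICE NAMESPACE LOCAL_PORT SERVICE_PORT
# Hypothetical helper: falls back to port-forward when no IngressClass exists.
reach_service() (
  if has_ingress_class; then
    echo "IngressClass found; inspect rules with: kubectl get ingress -n $2"
  else
    echo "No IngressClass found; forwarding localhost:$3 to svc/$1:$4"
    kubectl port-forward "svc/$1" "$3:$4" -n "$2"
  fi
)
```

For example, reach_service web learn-dev 8080 80 would forward localhost:8080 to the web Service on a cluster with no controller installed; port-forward blocks until you press Ctrl+C.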

Observability: the next layer

Metrics: kubectl top pods (needs the Metrics API, which metrics-server provides; kind and minikube do not enable it by default). See metrics-server in depth.
Dashboards: Prometheus/Grafana in a later lab—not required on day one.
Tracing: OpenTelemetry when you operate microservices at scale.

The architecture post’s incident mental model—API → nodes → schedule → container → network—still applies; metrics tell you where in that chain to look.
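
Before relying on kubectl top, it helps to confirm the Metrics API is registered. A minimal sketch: top_pods is a hypothetical helper, and it assumes metrics-server registers the APIService named v1beta1.metrics.k8s.io, which is its usual name.

```shell
# top_pods [NAMESPACE]
# Hypothetical helper: shows per-Pod CPU and memory when the Metrics API is
# registered, otherwise prints a hint instead of a raw kubectl error.
top_pods() (
  ns="${1:-default}"
  if kubectl get apiservice v1beta1.metrics.k8s.io >/dev/null 2>&1; then
    kubectl top pods -n "$ns" --sort-by=memory
  else
    echo "Metrics API not available; install metrics-server (on minikube: minikube addons enable metrics-server)"
  fi
)
```

Sorting by memory is an arbitrary choice here; --sort-by=cpu is the other option kubectl top accepts.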

Where to go after this series

Production playbook: Kubernetes troubleshooting playbook — symptom-by-symptom fixes for on-call.
GitOps: Git as the control plane — store manifests in Git, automate sync.
Platform context: DevOps life and business value, cloud platform evolution.
Structured practice: Kubernetes official tutorials; CKA/CKAD-style tasks (multi-object YAML under time pressure).
Production topics: Ingress, PersistentVolumes, StatefulSets, Helm/Kustomize, network policies, pod disruption budgets.

Series recap

Local lab — kind, k3s, or minikube.
YAML anatomy — apiVersion, kind, metadata, spec, labels.
First workloads — Deployment and Service.
Day-one practices — namespaces, labels, resources, security.
This post — debug and roadmap.

You now have a loop: declare in YAML → apply → observe → fix → commit. That loop is the job—whether the cluster lives on your laptop or in three regions.
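
As one possible shape for that loop, the sketch below applies a manifest, waits for the rollout, and commits only if the rollout succeeds. The helper name ship, the 120-second timeout, and the commit message are assumptions for illustration; it also assumes the manifest lives in a Git working tree.

```shell
# ship MANIFEST DEPLOYMENT NAMESPACE
# Hypothetical helper: apply, observe the rollout, then commit on success.
ship() (
  manifest="$1"
  deploy="$2"
  ns="$3"

  kubectl apply -f "$manifest" -n "$ns" || exit 1

  # Observe: block until the rollout finishes or the timeout expires.
  if kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=120s; then
    git add "$manifest" && git commit -m "Apply $manifest"
  else
    echo "Rollout did not complete; debug before committing" >&2
    exit 1
  fi
)
```

A failed rollout leaves the manifest uncommitted, which is the cue to go back through get, describe, logs, and events.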

