CRI and CSI: How Kubernetes Plugs In Runtimes and Storage

Two acronyms sound alike and both end in “interface,” but they solve different problems. CRI (Container Runtime Interface) is how the kubelet starts and manages containers on a node. CSI (Container Storage Interface) is how the cluster provisions, attaches, and mounts volumes from external storage systems. Confusing them is common; understanding both is how you debug “Pod stuck in ContainerCreating” versus “volume failed to attach.”

In short

CRI = kubelet ↔ container runtime (containerd, CRI-O) over gRPC: pull image, create sandbox, start/stop containers. CSI = control plane + kubelet ↔ storage vendor via sidecar controllers: create volume, publish to node, mount into Pod. Add CNI for networking. Each interface keeps Kubernetes core vendor-neutral.

First: disambiguate the names

| Interface | Full name | Connects | Typical question it answers |
| --- | --- | --- | --- |
| CRI | Container Runtime Interface | kubelet → container runtime | Why won’t this container start? Wrong image? OOM at runtime? |
| CSI | Container Storage Interface | cluster ↔ storage backend | Why is my PVC Pending? Why is mount failing? |
| CNI (related) | Container Network Interface | kubelet → network plugin | Why has the Pod no IP? Why can’t Pods talk? |

This post goes deep on CRI and CSI. For cluster layout and where the kubelet sits, see Kubernetes architecture in simple terms. For the Docker-era stack that CRI replaced on the node, see Docker — the hidden side.

Why Kubernetes split the node into interfaces

Early Kubernetes embedded Docker deeply: the kubelet talked to Docker Engine, and Docker handled images, containers, and (via plugins) some storage and network concerns. That coupling slowed innovation—every runtime improvement required changes in Kubernetes core, and every Kubernetes release risked breaking Docker assumptions.

The fix is the same design pattern used elsewhere in the ecosystem: thin core, thick plugins, stable contracts expressed as gRPC (or CNI configuration) APIs:

  • Kubernetes owns orchestration: desired state, scheduling, controllers, API.
  • Vendors own execution details: how to run a container, how to attach an EBS volume, how to assign a VPC IP.

You still operate “one cluster,” but under the hood the kubelet is a client of pluggable backends. That is why a managed EKS cluster can run containerd on nodes while your laptop lab uses CRI-O, and why the same Kubernetes version can drive NetApp, Ceph, or AWS EBS through different CSI drivers without recompiling the apiserver.

CRI — Container Runtime Interface

Role in the architecture

On every worker node, the kubelet is the agent that reconciles Pod specs assigned to that node. It does not fork containers itself. It calls a container runtime through CRI—a gRPC API defined in the Kubernetes project (protobuf services such as RuntimeService and ImageService).

Common runtimes today:

  • containerd — Industry default on many distributions; Docker Desktop and most cloud node images use it under the hood.
  • CRI-O — Lightweight runtime aimed at Kubernetes (Open Container Initiative–aligned), common on OpenShift-flavored stacks.
  • others — e.g. cri-dockerd shim when legacy Docker Engine must remain the endpoint (migration path, not the long-term default).

The runtime, in turn, uses lower-level tools—typically runc (or crun, kata, etc.)—to create Linux namespaces, cgroups, and root filesystems from OCI images. Think of CRI as the “Kubernetes-shaped” API; OCI is the “Linux-shaped” execution spec.

Sandbox + containers model

CRI distinguishes a Pod sandbox from the containers in that Pod:

  1. RunPodSandbox — Creates the pod-level environment: network namespace (wired by CNI), shared IPC/UTS settings, metadata labels, log directory. The “pause” container (registry.k8s.io/pause) usually keeps this sandbox alive.
  2. CreateContainer / StartContainer — For each container in the Pod spec (app, sidecar, init), the kubelet asks the runtime to create and start processes inside that sandbox.
  3. StopPodSandbox / RemovePodSandbox — Tear down when the Pod is deleted or evicted.

Init containers run sequentially via the same API; regular containers run according to Pod policy after inits succeed. When you see Init:0/2 in kubectl get pod, the kubelet is still driving CRI through that init chain.
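The call order described above can be sketched in a few lines. This is a simplified model, not kubelet code: the function name cri_call_sequence and the dict-shaped Pod are assumptions made for illustration, and image pulls, status polling, and restarts are left out.

```python
from typing import Dict, List


def cri_call_sequence(pod: Dict[str, List[str]]) -> List[str]:
    """Return the CRI calls, in order, needed to start one Pod (simplified)."""
    # The sandbox comes first: it owns the network namespace and shared settings.
    calls = ["RunPodSandbox"]
    # Init containers run one at a time, in spec order, before any app container.
    for name in pod.get("initContainers", []):
        calls += [f"CreateContainer({name})", f"StartContainer({name})"]
    # Regular containers are created and started only after the inits succeed.
    for name in pod.get("containers", []):
        calls += [f"CreateContainer({name})", f"StartContainer({name})"]
    return calls


if __name__ == "__main__":
    example = {"initContainers": ["migrate"], "containers": ["app", "sidecar"]}
    for call in cri_call_sequence(example):
        print(call)
```

A Pod that shows Init:0/1 would, in this model, still be somewhere between RunPodSandbox and the first StartContainer for an app container.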

Image pulls and credentials

ImageService handles PullImage, ListImages, and status. The kubelet resolves the image reference from the Pod spec (including imagePullSecrets converted to registry auth), then asks the runtime to pull layers. Failures surface as ErrImagePull or ImagePullBackOff—still CRI/runtime domain, not CSI.

# On a node — see which socket the kubelet uses (path varies by distro)
ps aux | grep kubelet | grep container-runtime

# containerd: introspect with crictl (CRI-compatible CLI)
sudo crictl pods
sudo crictl ps
sudo crictl images
sudo crictl inspect <container-id>
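The step where imagePullSecrets become registry auth can be sketched as a lookup in a kubernetes.io/dockerconfigjson payload. This is a minimal sketch under stated assumptions: registry_of and credentials_for are hypothetical helper names, and the match is an exact host comparison, whereas the kubelet's real matching also handles path prefixes, wildcards, and Docker Hub's legacy key.

```python
import base64
import json
from typing import Optional, Tuple


def registry_of(image: str) -> str:
    """Return the registry host of an image reference (docker.io if none is given)."""
    first = image.split("/", 1)[0]
    # The first path component is a registry only if it looks like a host name.
    if "/" in image and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def credentials_for(image: str, dockerconfigjson: str) -> Optional[Tuple[str, str]]:
    """Look up (user, password) for the image's registry, or None if absent."""
    auths = json.loads(dockerconfigjson).get("auths", {})
    entry = auths.get(registry_of(image))
    if entry is None or "auth" not in entry:
        return None
    # The "auth" field is base64("user:password").
    user, _, password = base64.b64decode(entry["auth"]).decode().partition(":")
    return user, password
```

When the lookup returns None for a private registry, the runtime pulls anonymously and the Pod typically ends up in ErrImagePull.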

Lifecycle, probes, and resources

After start, the kubelet uses CRI for:

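  • Exec / Attach / PortForward — What kubectl exec, attach, and port-forward use, streamed via the kubelet. kubectl logs takes a different path: the kubelet serves the container log files that the runtime writes on the node.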
  • Container status — Exit codes, reasons, timestamps for Pod status conditions.
  • Resources — CPU/memory limits from the Pod spec are passed into the OCI spec the runtime builds; the kubelet enforces node allocatable separately.
  • Liveness / readiness — The kubelet runs the probes itself: HTTP and TCP probes connect to the Pod IP, and exec probes run a command in the container through CRI. The CSI layer is not involved.

CRI security and policy hooks

Because all container creation funnels through the runtime, clusters hang security policy off that path:

  • Pod Security admission — Restricts capabilities, user ID, volumes at API admission time (before kubelet acts).
  • Seccomp, AppArmor, SELinux — Fields in the Pod spec translated into OCI linux options by the runtime.
  • RuntimeClass — Selects a different CRI handler (e.g. gVisor, Kata) per Pod via spec.runtimeClassName.

Service account tokens and RBAC govern API access; they do not replace runtime isolation. For in-cluster API permissions, see Kubernetes cluster RBAC.

dockershim removal (Kubernetes 1.24+)

Docker Engine never implemented CRI. For years, Kubernetes shipped a dockershim inside the kubelet that translated CRI calls into Docker API calls. That adapter was removed in Kubernetes 1.24 (2022). Supported production paths today are direct CRI endpoints—almost always containerd or CRI-O.

What still uses Docker: developer laptops, many CI pipelines, and image builds (Dockerfile, BuildKit). Those produce OCI images that any CRI runtime pulls the same way. “We build with Docker” and “our nodes run containerd” is normal, not contradictory.

Three layers: orchestrator, CRI runtime, OCI runtime

When someone says “container runtime” in a Kubernetes context, clarify which layer they mean:

  • Kubernetes / kubelet — Decides what should run on the node; never forks processes directly.
  • CRI implementation (containerd, CRI-O) — Pulls images, manages pod sandboxes, translates CRI into OCI specs.
  • OCI runtime (runc, crun, runsc) — Creates namespaces, cgroups, and the container process on Linux.

containerd: daemon, shim, and runc

containerd is the default on EKS, GKE, AKS, kind, and most kubeadm clusters. Internally it is not a single process doing everything:

  • containerd — Long-lived daemon; exposes CRI (when configured), stores image layers and metadata, manages snapshots (often overlayfs).
  • containerd-shim (v2) — Supervising process between containerd and the workload (with the runc shim under Kubernetes, typically one per Pod); keeps stdio and exit status if containerd restarts.
  • runc — Invoked by the shim to execute the OCI bundle; exits after start while the workload keeps running.

That is why ps on a node shows containerd-shim-runc-v2 beside your application. For native introspection that bypasses CRI, operators use ctr -n k8s.io; k8s.io is the containerd namespace in which the CRI plugin keeps Kubernetes-managed containers and images.

CRI-O: Kubernetes-first runtime

CRI-O implements only CRI plus OCI—no Docker-compatible API. It is common on OpenShift and RHEL-family distributions. Choosing CRI-O versus containerd is usually a platform packaging decision; both pass Kubernetes conformance tests for CRI. Differences show up in admin tooling, snapshot drivers, and organizational standards—not in whether Pods schedule.

From Pod spec to Linux process

The API server stores the Pod; the kubelet converts it before each CRI call. Mappings you debug in production:

  • resources.limits / requests → cgroup constraints in the OCI spec.
  • securityContext → UID/GID, capabilities, readOnlyRootFilesystem, seccomp, SELinux/AppArmor profiles.
  • volumeMounts → bind mounts for Secrets, ConfigMaps, emptyDir, hostPath, CSI paths (after CSI publish succeeds).
  • imagePullSecrets → registry credentials passed into PullImage.
  • probes — Run by the kubelet against the Pod IP (HTTP, TCP) or through a CRI exec call; not implemented inside the storage or runtime vendor’s control plane.

CreateContainerConfigError often means kubelet could not build that mapping (missing Secret, bad subPath)—before the app process starts.
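The resources mapping in the list above is mostly arithmetic, sketched here in cgroup v1 terms. The constants are assumed from the commonly documented kubelet conversion (1024 CPU shares per core, a 100 ms CFS period); on cgroup v2 nodes the equivalent values land in cpu.weight and cpu.max. All function names are hypothetical, and only the most common quantity suffixes are handled.

```python
from typing import Dict

CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms
_BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu_millis(cpu: str) -> int:
    """Convert a CPU quantity such as '500m' or '1.5' to millicores."""
    if cpu.endswith("m"):
        return int(cpu[:-1])
    return round(float(cpu) * 1000)


def parse_memory_bytes(mem: str) -> int:
    """Convert a memory quantity such as '256Mi' or '1G' to bytes."""
    # Binary suffixes are checked first so 'Mi' is never mistaken for 'M'.
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if mem.endswith(suffix):
            return int(float(mem[: -len(suffix)]) * factor)
    return int(mem)


def cgroup_settings(requests: Dict[str, str], limits: Dict[str, str]) -> Dict[str, int]:
    """Map Pod resources to cgroup v1 style values (simplified)."""
    settings: Dict[str, int] = {}
    if "cpu" in requests:
        # CPU requests become a relative weight: 1 core = 1024 shares, minimum 2.
        millis = parse_cpu_millis(requests["cpu"])
        settings["cpu_shares"] = max(2, millis * 1024 // 1000)
    if "cpu" in limits:
        # CPU limits become a hard quota per period, with a floor of 1000 us.
        millis = parse_cpu_millis(limits["cpu"])
        settings["cpu_quota_us"] = max(1000, millis * CFS_PERIOD_US // 1000)
    if "memory" in limits:
        # Memory limits pass through as a byte count.
        settings["memory_limit_bytes"] = parse_memory_bytes(limits["memory"])
    return settings
```

For example, a request of 250m CPU with limits of 500m CPU and 256Mi memory works out to 256 shares, a 50000 us quota, and 268435456 bytes.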

Runtime roles (quick reference)

Component Speaks to kubelet via Typical role
containerd CRI gRPC Default cloud/node runtime; embeds image store and shims
CRI-O CRI gRPC Lean K8s-native runtime on OpenShift/RHEL stacks
runc OCI spec (invoked by shim) Creates actual container process
Docker Engine Docker API (not CRI) Build/local dev; not kubelet’s node endpoint post-1.24

Node operations: sockets and privilege

CRI is served over a Unix domain socket (path varies by distro):

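# sudo drops exported variables by default, so pass the endpoint explicitly
# (or set runtime-endpoint in /etc/crictl.yaml). Pick ONE endpoint.
# containerd (common)
ENDPOINT=unix:///run/containerd/containerd.sock
# CRI-O
# ENDPOINT=unix:///var/run/crio/crio.sock

sudo crictl --runtime-endpoint "$ENDPOINT" pods
sudo crictl --runtime-endpoint "$ENDPOINT" ps -a
sudo crictl --runtime-endpoint "$ENDPOINT" logs <container-id>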

Access to the CRI socket is effectively root on the node, so treat node SSH and break-glass access like cluster-admin. Image garbage collection runs on kubelet thresholds and runtime policies; disk pressure often traces to accumulated image layers (visible with crictl images) or unrotated container logs.
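Because the socket path varies by distro, tooling often probes a few candidates before calling crictl. A minimal sketch, assuming only the two paths shown above; detect_cri_endpoint is a hypothetical helper, and the exists parameter is injectable so the logic can be exercised off-node.

```python
import os
from typing import Callable, Iterable, Optional

# Socket paths taken from the examples above; real paths vary by distro.
DEFAULT_CANDIDATES = (
    "/run/containerd/containerd.sock",
    "/var/run/crio/crio.sock",
)


def detect_cri_endpoint(
    candidates: Iterable[str] = DEFAULT_CANDIDATES,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the first existing socket as a unix:// endpoint, or None."""
    for path in candidates:
        if exists(path):
            # An absolute path after unix:// yields the familiar triple slash.
            return f"unix://{path}"
    return None


if __name__ == "__main__":
    print(detect_cri_endpoint() or "no known CRI socket found")
```

Reading the socket file's existence needs no special privilege on most nodes; talking to it does.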

CSI — Container Storage Interface

Role in the architecture

Stateful workloads need disks that survive Pod restarts. Kubernetes models that with PersistentVolumeClaim (PVC) ↔ PersistentVolume (PV) binding and volumeMounts on Pods. Before CSI, in-tree volume plugins lived inside the Kubernetes binary: slow to ship, hard to upgrade, vendor-specific.

CSI moves storage logic out of tree into a driver deployed on the cluster (controller Deployment + node DaemonSet is the usual shape). The driver implements the CSI spec (identity, controller, node gRPC services) for a given array, cloud API, or filesystem.

The three Kubernetes components you will meet

CSI drivers are not one monolith. Kubernetes wraps the vendor plugin with sidecars:

| Component | Runs where | Responsibility |
| --- | --- | --- |
| CSI controller (driver + sidecars) | Control plane or any node (often with leader election) | Create/delete volumes, snapshots, expand capacity; anything that talks to the storage API without being tied to one kubelet. |
| external-provisioner sidecar | With controller | Watches PVCs; calls driver CreateVolume; creates PV objects. |
| external-attacher sidecar | With controller | Watches VolumeAttachment objects; calls ControllerPublishVolume so the disk is visible to a specific node. |
| CSI node plugin | Every worker (DaemonSet) | Stage/publish mount on the node: NodeStageVolume, NodePublishVolume (bind mount into Pod path). |
| kubelet | Every worker | Invokes the node plugin via a Unix socket; does not implement vendor logic itself. |

Volume lifecycle (end to end)

Tracing one dynamic PVC helps more than memorizing RPC names:

  1. Developer creates a PVC referencing a StorageClass (provisioner: ebs.csi.aws.com style).
  2. external-provisioner sees the claim, calls the driver’s CreateVolume, creates a PV, binds PVC↔PV.
  3. Scheduler places a Pod that uses the claim on node N.
  4. attach/detach controller + external-attacher ensure a VolumeAttachment exists; controller publishes the volume to node N (cloud: “attach disk i-…” to instance).
  5. kubelet on N starts mounting: NodeStageVolume (global mount, e.g. block device formatted and mounted under /var/lib/kubelet/...), then NodePublishVolume (bind into the Pod’s volumeMounts[].mountPath).
  6. Only after mounts succeed does the kubelet proceed with CRI sandbox/container creation—hence long ContainerCreating when storage is slow or misconfigured.

Deletion reverses the chain: unmount → unpublish → detach → DeleteVolume if reclaim policy is Delete.
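The setup and teardown chains above map onto CSI RPC names roughly as follows. This is a simplified ordering sketch, not driver code: the function names are hypothetical, attach_required stands in for the CSIDriver object's attachRequired setting, and staging is shown unconditionally even though it is an optional driver capability.

```python
from typing import List


def csi_setup_calls(dynamic: bool = True, attach_required: bool = True) -> List[str]:
    """Return the CSI RPCs, in order, that make a volume usable by a Pod."""
    calls: List[str] = []
    if dynamic:
        calls.append("CreateVolume")  # driven by external-provisioner
    if attach_required:
        calls.append("ControllerPublishVolume")  # driven by external-attacher
    # The kubelet calls the node plugin: global stage first, then per-Pod publish.
    calls += ["NodeStageVolume", "NodePublishVolume"]
    return calls


def csi_teardown_calls(reclaim_policy: str = "Delete", attach_required: bool = True) -> List[str]:
    """Return the CSI RPCs, in order, that release a volume."""
    calls = ["NodeUnpublishVolume", "NodeUnstageVolume"]
    if attach_required:
        calls.append("ControllerUnpublishVolume")
    # Backing storage is deleted only when the reclaim policy asks for it.
    if reclaim_policy == "Delete":
        calls.append("DeleteVolume")
    return calls
```

With reclaim policy Retain, the teardown list stops after ControllerUnpublishVolume, which matches the idea that the data is left for manual recovery.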

StorageClass, reclaim policy, and access modes

  • StorageClass — Parameters (type: gp3, fsType: ext4), provisioner name, binding mode (WaitForFirstConsumer delays provisioning until a Pod is scheduled—useful for topology-aware zones).
  • Reclaim policy — Delete removes backing storage with the PV; Retain leaves data for manual recovery (ops-friendly, cost-aware).
  • accessModes — ReadWriteOnce (read-write on a single node), ReadOnlyMany, ReadWriteMany (requires driver support, often file-based backends such as NFS or EFS). Mismatch between mode and driver capability is a frequent “works in dev, fails in prod” bug.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com   # example — use your driver's name
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  encrypted: "true"
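The access-mode mismatch mentioned in the list above is easy to check before a claim is ever created. A minimal sketch: the driver names and the capability table are hypothetical placeholders, so the real values have to come from your driver's documentation.

```python
from typing import Dict, Iterable, Set

# Hypothetical capability table for illustration only; check your driver's docs.
DRIVER_ACCESS_MODES: Dict[str, Set[str]] = {
    "block-driver.example.com": {"ReadWriteOnce"},
    "file-driver.example.com": {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany"},
}


def unsupported_modes(driver: str, requested: Iterable[str]) -> Set[str]:
    """Return the requested access modes the driver is not known to support."""
    # Unknown drivers are treated as supporting nothing, so they always get flagged.
    supported = DRIVER_ACCESS_MODES.get(driver, set())
    return set(requested) - supported


if __name__ == "__main__":
    print(unsupported_modes("block-driver.example.com", ["ReadWriteMany"]))
```

A non-empty result is a hint that the PVC would bind in one environment and fail in another.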

Snapshots, cloning, and expansion

CSI standardized optional capabilities beyond basic mount:

  • Volume snapshots — VolumeSnapshotClass, VolumeSnapshot, restore into new PVCs (backup/DR, blue-green DB cutover).
  • Volume expansion — Resize PVC if allowVolumeExpansion: true on the StorageClass; driver must implement ControllerExpandVolume / node expand.
  • Topology — Provision in the same AZ as the Pod; critical for zonal block storage.

How CRI and CSI meet on one Pod

When a scheduled Pod lands on a node, the kubelet roughly:

  1. Resolves volumes (CSI mounts, ConfigMaps, Secrets, emptyDir).
  2. Gets the sandbox network namespace set up through CNI (with containerd and CRI-O, the runtime invokes the CNI plugins while creating the sandbox).
  3. Runs CRI: sandbox → init containers → app containers.
  4. Updates Pod status back to the API server.

Storage failures block step 3 even if the image is perfect. Runtime failures block step 3 even if the PVC bound days ago. That ordering is why splitting mental models matters.
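The ordering argument can be expressed as "report the first step that has not succeeded". A minimal sketch, where the step names and the first_blocked_step helper are hypothetical labels for the four steps listed above.

```python
from typing import Dict, Optional

# Order mirrors the four steps listed above.
STARTUP_STEPS = ("volumes", "network", "containers", "status")


def first_blocked_step(ok: Dict[str, bool]) -> Optional[str]:
    """Return the first step that has not succeeded, or None if all passed."""
    for step in STARTUP_STEPS:
        # A step with no recorded result counts as not yet done.
        if not ok.get(step, False):
            return step
    return None


if __name__ == "__main__":
    # A perfect image does not help while the volume is still unmounted.
    print(first_blocked_step({"volumes": False, "network": True, "containers": True}))
```

The point of the model is that a later step being healthy says nothing useful until every earlier step has passed.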

In-tree vs CSI (migration note)

Older clusters used in-tree volume plugins (provisioner names in the kubernetes.io/aws-ebs style). Those plugins are deprecated in favor of CSI drivers maintained by vendors. If you still see in-tree provisioner names on StorageClasses, plan migration: install the CSI driver, create parallel StorageClasses, roll workloads, retire old PVs. Mixed provisioners on one cluster are normal during migration but confusing for developers without documentation.

Debugging playbook

Symptoms → likely layer

| Symptom | Look at |
| --- | --- |
| ImagePullBackOff | CRI / registry auth / image name |
| CrashLoopBackOff (container starts then exits) | App process, probes, CRI logs; usually not CSI |
| ContainerCreating for a long time | CSI mount, CNI, or huge image pull |
| PVC Pending | Provisioner, StorageClass, quotas, CSI controller |
| FailedAttachVolume / FailedMount events | CSI attacher, node plugin, IAM/cloud API |
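The same triage table can live in a runbook script so on-call tooling prints the likely layer next to an event. A minimal sketch: the dictionary is transcribed from the table above, and layers_for is a hypothetical helper name.

```python
from typing import Dict, List

# Symptom -> where to look first, transcribed from the table above.
TRIAGE: Dict[str, List[str]] = {
    "ImagePullBackOff": ["CRI", "registry auth", "image name"],
    "CrashLoopBackOff": ["app process", "probes", "CRI logs"],
    "ContainerCreating": ["CSI mount", "CNI", "image pull"],
    "PVC Pending": ["provisioner", "StorageClass", "quotas", "CSI controller"],
    "FailedAttachVolume": ["CSI attacher", "node plugin", "IAM/cloud API"],
    "FailedMount": ["CSI attacher", "node plugin", "IAM/cloud API"],
}


def layers_for(symptom: str) -> List[str]:
    """Return the layers to inspect for a symptom; default to reading Pod events."""
    return TRIAGE.get(symptom, ["Pod events (kubectl describe pod)"])


if __name__ == "__main__":
    for name in ("ImagePullBackOff", "FailedMount", "Evicted"):
        print(name, "->", ", ".join(layers_for(name)))
```

Unknown symptoms fall back to Pod events, which is the first stop for both CRI and CSI problems anyway.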

Commands worth making a habit

# Pod events — first stop for both CRI and CSI
kubectl describe pod my-app-7d4f9 -n team-a

# PVC / PV chain
kubectl get pvc,pv,storageclass -n team-a
kubectl describe pvc data-my-app-7d4f9 -n team-a

# Volume attachments (CSI attach path)
kubectl get volumeattachment | grep <pv-name>

# CSI driver registration
kubectl get csidrivers
kubectl get csinode

# On the node (SSH or debug daemonset)
sudo crictl pods | grep my-app
sudo journalctl -u kubelet -f
# Driver-specific node pod logs:
kubectl logs -n kube-system -l app=ebs-csi-node --tail=100

For a structured lab workflow on manifests and day-one practices, follow the Kubernetes hands-on series.

Choosing and operating drivers in production

  • Runtime — Prefer containerd or CRI-O directly; avoid depending on Docker Engine as the kubelet endpoint unless you are mid-migration.
  • CSI driver — Install from the vendor or Kubernetes SIG chart; pin versions; read release notes for breaking API changes.
  • IAM and secrets — Cloud CSI controllers need credentials (IRSA on EKS, workload identity on GKE). Missing IAM shows up as attach timeouts, not as API 403 on kubectl.
  • Topology — Use WaitForFirstConsumer for zonal disks; spread Pods and storage across AZs consciously.
  • Backup — Application-consistent backup still needs snapshots + quiescing; CSI snapshots are infrastructure-level.
  • Observability — Alert on PVC Pending age, mount error events, and kubelet PLEG issues (runtime health).

Common pitfalls

  • Calling every node problem “Docker” — The kubelet likely talks to containerd; use crictl, not only docker ps.
  • RWX on a RWO-only driver — Access mode must match driver capabilities.
  • Missing default StorageClass — In a cluster without a default, PVCs that omit storageClassName stay Pending until a default exists or the claim names a class.
  • Zone skew — Pod rescheduled to another AZ while volume stays zonal → attach fails until you fix topology or use regional/shared storage.
  • Forgetting attach limits — Cloud instances cap attached disks; noisy neighbors on a node can exhaust attachments.
  • CSI driver not on new node pool — DaemonSet must run on every worker; taints/tolerations can silently exclude GPU or system pools.
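The last pitfall comes down to taint and toleration matching, which can be checked offline against a node pool's taints. This is a simplified sketch of the Kubernetes matching rules as commonly documented; the helper names are hypothetical, and taints and tolerations are plain dicts with the same keys as their YAML forms.

```python
from typing import Dict, List

Taint = Dict[str, str]
Toleration = Dict[str, str]


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if one toleration matches one taint (simplified rules)."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    # An empty key is only valid with Exists, and then matches every taint key.
    if key == "":
        return operator == "Exists"
    if key != taint.get("key"):
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def blocking_taints(taints: List[Taint], tolerations: List[Toleration]) -> List[Taint]:
    """Return the NoSchedule/NoExecute taints that no toleration covers."""
    return [
        taint
        for taint in taints
        if taint.get("effect") in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, taint) for tol in tolerations)
    ]
```

If blocking_taints returns anything for a pool, the CSI node DaemonSet will not land there, and Pods on that pool will fail to mount CSI volumes.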

Mental model checklist

  1. CRI answers: “Are containers running on this node the way the Pod spec says?”
  2. CSI answers: “Is external storage created, attached to this node, and mounted into the Pod path?”
  3. CNI answers: “Does this Pod have the right network interfaces and routes?”
  4. The API server stores intent; the kubelet orchestrates plugins; controllers and sidecars handle storage control plane work.

Further reading

  • Kubernetes documentation — Container Runtime Interface (CRI)
  • Kubernetes documentation — CSI Volume Cloning, Snapshots, and Expansion
  • CNCF — CSI spec and driver maturity listings
  • containerd and CRI-O project docs — debugging with ctr, crictl, crio status
