Docker and Containerization: The Hidden Side Most Tutorials Skip
You can docker run an image in five minutes. That speed hides a stack of Linux primitives, storage tricks, and security trade-offs. This post is the knowledge base behind the commands: what actually runs, what can go wrong, and how containers connect to the platform work you will do next.
In short
A container is a process (or tree of processes) isolated with namespaces, limited with cgroups, and started from a read-only image made of stacked layers. Docker is a convenient front end—not magic, not a VM, and not a substitute for hardening, observability, or orchestration when you outgrow a laptop.
The vocabulary people mix up
Image is a packaged filesystem plus metadata (default command, env vars, exposed ports). It is built in layers and usually stored in a registry. Container is a running instance of an image: writable layer on top, process(es) inside, network identity, mounts. You can have many containers from one image.
Virtual machine runs a guest operating system on emulated or virtual hardware. A container shares the host kernel and only looks like its own environment. That is why containers start fast and stay smaller, but also why a Linux container on Linux is native, while a “Linux container on Windows” actually runs inside a Linux virtual machine (for example via WSL 2). If namespaces and cgroups are new, start with Linux in depth for the host OS fundamentals.
Docker popularized the developer experience (CLI, Dockerfile, Hub). Under the hood, modern engines use the OCI (Open Container Initiative) image and runtime specs so the same image can run with containerd, CRI-O, or other compliant runtimes—not only the Docker brand.
What happens when you run a container
When you type docker run nginx, more happens than “download and go”:
- The client talks to the container engine: the Docker daemon (dockerd), which today delegates container lifecycle to containerd.
- If the image is missing, layers are pulled from a registry and unpacked using a storage driver.
- The engine asks a low-level runtime, typically runc, to create namespaces, apply cgroup limits, set up the root filesystem, and exec your process.
- Networking hooks attach the container to a bridge, overlay, or host network as configured.
The hidden lesson: your app is just an ordinary process on the host kernel (it is PID 1 only inside its own PID namespace), wearing a costume of isolated mount, process, network, and user IDs. Debug with that mental model: docker exec starts a process in the same namespaces, and host tools can still see cgroups and processes if you know where to look.
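To make "know where to look" a little more concrete: on a Linux host, /proc/<pid>/cgroup lists the cgroups a process belongs to, one entry per line in the form hierarchy-ID:controller-list:path (see cgroups(7)). The sketch below only parses that text. The name parse_proc_cgroup is hypothetical, and real paths will differ by engine and cgroup driver.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the contents of /proc/<pid>/cgroup into structured entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The cgroup path may itself contain ':', so split at most twice.
        hierarchy, controllers, path = line.split(":", 2)
        # On cgroup v2 the controller list is empty ("0::/some/path").
        names = [c for c in controllers.split(",") if c]
        entries.append(CgroupEntry(int(hierarchy), names, path))
    return entries
```

On a Linux host you could feed it open(f"/proc/{pid}/cgroup").read() for the PID of a containerized process and compare the result with that of your own shell.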
Images are layers—and layers have consequences
Each Dockerfile instruction that modifies the filesystem usually creates a new layer. Layers are stored read-only and shared between images (deduplication). When a container runs, Docker adds a thin writable container layer; changes there disappear when the container is removed unless you commit or use volumes.
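The layer behaviour described above can be modelled in a few lines. This is a toy sketch, not how a storage driver is implemented: layers are plain dictionaries, WHITEOUT is a hypothetical marker standing in for the driver's deletion records, and the file names are made up.

```python
from typing import Dict, List, Optional

WHITEOUT = object()  # marker meaning "deleted in this layer"


def read_path(layers: List[Dict[str, object]], path: str) -> Optional[object]:
    """Return the content visible at path, searching from the top layer down."""
    for layer in reversed(layers):
        if path in layer:
            value = layer[path]
            return None if value is WHITEOUT else value
    return None


image_layers = [
    {"/etc/os-release": "base"},    # read-only, from the base image
    {"/app/server": "binary v1"},   # read-only, from a later build step
]

# Each container gets its own empty writable layer on top of the shared image layers.
container_a = image_layers + [{}]
container_b = image_layers + [{}]

container_a[-1]["/app/server"] = "patched"     # copy-on-write: only A's top layer changes
container_a[-1]["/etc/os-release"] = WHITEOUT  # a delete hides the file, it does not erase it
```

Container B still sees the original binary, and the base layer still holds the "deleted" file. Dropping A's top dictionary is the model's equivalent of removing the container.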
Hidden costs:
- Order matters in builds. Put rarely changing steps (base image, OS packages) before frequently changing steps (copy source, npm install) so cache hits save time and bandwidth.
- Deleted files can still weigh down an image. Removing a secret in a later layer does not remove it from earlier layers; anyone with the image can still extract it. Never bake secrets into images; inject at runtime.
- Disk fills up silently. Dangling images, stopped containers, and build cache accumulate. docker system df and prudent pruning are operational hygiene, not optional cleanup.
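Why build order matters can be shown with a small model of the cache rule: once one step's inputs change, that step and every step after it re-run. The sketch below assumes exactly that simplified rule. The step strings are labels rather than real Dockerfile syntax, and install-deps is a placeholder.

```python
from typing import List, Set


def steps_rebuilt(steps: List[str], changed: Set[str]) -> List[str]:
    """Return the steps that must re-run: the first changed step and everything after it."""
    for index, step in enumerate(steps):
        if step in changed:
            return steps[index:]
    return []


# Source copied early: any code edit also invalidates the dependency install.
source_first = ["FROM base", "COPY . .", "RUN install-deps"]

# Dependency manifest copied first: a code edit only re-runs the final copy.
deps_first = ["FROM base", "COPY manifest", "RUN install-deps", "COPY . ."]
```

With a source-only change, steps_rebuilt(source_first, {"COPY . ."}) returns two steps, while steps_rebuilt(deps_first, {"COPY . ."}) returns one.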
Multi-stage builds are the professional pattern: compile in a fat builder stage, copy only artifacts into a minimal runtime stage. You ship less attack surface and smaller transfers.
Isolation: namespaces and cgroups
Namespaces slice what a process can see: its own PID list, network stack, mount tree, hostname, and (with user namespaces) UID mappings. They provide the illusion of a private machine.
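One way to check what two processes actually share: each entry under /proc/<pid>/ns is a symlink whose target looks like pid:[4026531836], and two processes are in the same namespace when the numbers match (see namespaces(7)). The sketch below parses those targets. The function names are hypothetical and the inode numbers in the usage note are invented.

```python
import re
from typing import Dict, Tuple

_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Parse a /proc/<pid>/ns/* symlink target such as 'pid:[4026531836]'."""
    match = _NS_LINK.match(target.strip())
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def shared_namespaces(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, bool]:
    """For each namespace kind present in both mappings, report whether a and b share it."""
    result = {}
    for kind in sorted(set(a) & set(b)):
        result[kind] = parse_ns_link(a[kind])[1] == parse_ns_link(b[kind])[1]
    return result
```

For example, comparing {"pid": "pid:[4026531836]", "net": "net:[4026531840]"} with {"pid": "pid:[4026532201]", "net": "net:[4026531840]"} reports a shared network namespace and separate PID namespaces, which is what a container started with host networking would look like. On a real host the mappings could be built with os.readlink on each entry of /proc/<pid>/ns.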
Cgroups (control groups) cap and account resources: CPU, memory, I/O, PIDs. Without limits, one container can starve the host. Docker exposes them as flags such as --memory and --cpus; Kubernetes exposes them as resource requests and limits.
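To see which limits actually apply, you can read the cgroup files themselves. The sketch below assumes cgroup v2, where memory.max holds a byte count or the word max, and cpu.max holds a quota and a period in microseconds. Cgroup v1 uses different file names, and the function names here are hypothetical.

```python
from typing import Optional


def parse_memory_max(text: str) -> Optional[int]:
    """Parse cgroup v2 memory.max: a byte count, or 'max' for no limit (returns None)."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max ('<quota> <period>') into a CPU count, or None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)
```

A cpu.max of 150000 100000 therefore means one and a half CPUs of runtime per period, and a None result from either function means the container can take whatever the host has.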
Isolation is not absolute. Misconfigured capabilities, privileged mode, host PID or network namespaces, and kernel bugs are how “container escape” enters incident reports. Default Docker containers are still more constrained than bare processes, but --privileged (privileged: true in Compose or a Kubernetes securityContext) is effectively root on the host with extra steps.
Storage: the writable layer, volumes, and bind mounts
- Writable container layer — fine for ephemeral caches; bad for databases you care about.
- Volume — managed by Docker, portable across containers on the same engine, good default for persistent data.
- Bind mount — maps a host path into the container; powerful for development, risky in production if permissions or paths leak host data.
Permissions bite quietly: processes in the image often run as root unless you set USER. Files created on volumes may be owned as root on the host, breaking CI or local editors. Plan UID/GID or fsGroup-style fixes before production.
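The root-owned files follow from how IDs map between container and host. /proc/<pid>/uid_map (and gid_map) lists ranges as three numbers: start inside the namespace, start outside, and length (see user_namespaces(7)). The sketch below translates an ID using that text, assuming the map was read from the host's initial namespace. The function name and the remapped range in the usage note are illustrative.

```python
from typing import Optional


def to_host_id(container_id: int, id_map_text: str) -> Optional[int]:
    """Translate an ID inside a user namespace to the outside ID using uid_map/gid_map text."""
    for line in id_map_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        inside_start, outside_start, length = (int(f) for f in fields)
        if inside_start <= container_id < inside_start + length:
            return outside_start + (container_id - inside_start)
    return None  # the ID has no mapping in this namespace
```

With the identity map "0 0 4294967295", container UID 0 is host UID 0, which is why files written to a volume show up as root-owned. With a remapped range such as "0 100000 65536", the same container root becomes an unprivileged host UID.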
Networking: bridges, ports, and DNS
Default bridge networking gives each container an IP on a virtual bridge; published ports (-p 8080:80) set up NAT from the host. Host network mode drops isolation and uses the host stack directly—fast, dangerous on multi-tenant machines.
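The -p argument packs several fields into one string, which is easy to misread. The sketch below parses the common form [host_ip:][host_port:]container_port[/protocol]. It is deliberately limited: IPv4 addresses only, no port ranges, and parse_publish and PortMapping are hypothetical names, not part of any Docker API.

```python
from typing import NamedTuple, Optional


class PortMapping(NamedTuple):
    host_ip: Optional[str]
    host_port: Optional[int]
    container_port: int
    protocol: str


def parse_publish(spec: str) -> PortMapping:
    """Parse '[host_ip:][host_port:]container_port[/protocol]' (IPv4 only, no port ranges)."""
    address, _, protocol = spec.partition("/")
    parts = address.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    container_port = int(parts[-1])
    # An omitted or empty host port means the engine picks one.
    host_port = int(parts[-2]) if len(parts) >= 2 and parts[-2] else None
    # An omitted host IP means the port is bound on all host interfaces.
    host_ip = parts[0] if len(parts) == 3 else None
    return PortMapping(host_ip, host_port, container_port, protocol or "tcp")
```

Note what the result implies for exposure: 8080:80 has no host IP and so listens on every interface, while 127.0.0.1:8080:80 keeps the published port local to the host.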
On user-defined bridge networks, Docker provides an embedded DNS server so containers resolve each other by name—Compose service names depend on this. The hidden gotcha: containers on the default bridge do not get automatic DNS between containers; you need links (legacy) or a custom network.
In Kubernetes, networking moves to CNI plugins and Services; the mental model (Pod IP, ClusterIP, ingress) builds on the same idea—stable names in front of ephemeral workloads.
Security: what “non-root in Dockerfile” does not fix
- Root inside the container is often still a powerful user on the host if the container is privileged or has dangerous capabilities (CAP_SYS_ADMIN, etc.).
- Image supply chain: trust registries, scan images, pin by digest (nginx@sha256:…) not only by tag (latest moves).
- Read-only root filesystem and dropping capabilities reduce blast radius; seccomp and AppArmor/SELinux profiles add depth where teams invest.
- The daemon is high value. Access to the Docker socket is root-equivalent on the host—never mount it into untrusted containers.
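Digest pinning is easy to check mechanically, for example in a CI lint step. The sketch below only tests whether a reference ends in an @sha256: digest of the expected length. It does not verify that the digest exists or is trusted, and the function names are hypothetical.

```python
import re

_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned_by_digest(image_ref: str) -> bool:
    """Return True if the image reference ends in an @sha256:<64 hex chars> digest."""
    return _DIGEST.search(image_ref) is not None


def unpinned(image_refs):
    """Return the references that rely on a mutable tag (or no tag at all)."""
    return [ref for ref in image_refs if not is_pinned_by_digest(ref)]
```

Running unpinned over the image references collected from your Dockerfiles or manifests gives a list to review. How you collect those references is left out here.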
Security is a stack: minimal base images (distroless, alpine with eyes open), no secrets in layers, regular patches, and runtime policy—not a single checkbox. For how scratch, Alpine, and Ubuntu bases differ and how to assemble images from an empty base, see Alpine, Ubuntu, and scratch images.
PID 1, signals, and graceful shutdown
In a container, your main process is often PID 1. On Linux, PID 1 has special duties: it must reap zombie children, and the kernel does not apply default signal actions to it, so a SIGTERM with no installed handler is simply ignored. If your app is a shell script wrapping Java, signals may never reach Java; Kubernetes or Docker then wait out the stop timeout and send SIGKILL.
Fixes people learn the hard way: use an init wrapper (tini, dumb-init), run the app directly as PID 1, or ensure your wrapper forwards SIGTERM. Set STOPSIGNAL and align stop grace periods with how long shutdown actually takes.
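The "wrapper forwards SIGTERM" option can be sketched as follows. This is a minimal illustration of the idea, not a replacement for tini or dumb-init: it forwards only SIGTERM and SIGINT, supervises a single child, and does not reap unrelated orphaned processes. The name run_and_forward is hypothetical.

```python
import signal
import subprocess
import sys


def run_and_forward(argv):
    """Start argv as a child, relay SIGTERM/SIGINT to it, and return its exit code."""
    child = subprocess.Popen(argv)

    def forward(signum, _frame):
        # Relay the signal only while the child is still running.
        if child.poll() is None:
            child.send_signal(signum)

    # Install the forwarding handlers, remembering what was there before.
    previous = {s: signal.signal(s, forward) for s in (signal.SIGTERM, signal.SIGINT)}
    try:
        # wait() also reaps the child, so it does not linger as a zombie.
        return child.wait()
    finally:
        for s, handler in previous.items():
            if handler is not None:
                signal.signal(s, handler)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Used as an entrypoint: python wrapper.py <command> [args...]
    sys.exit(run_and_forward(sys.argv[1:]))
```

A negative return value means the child was terminated by a signal. A real entrypoint would usually translate that into the conventional 128 + signal exit status.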
Health, logs, and observability
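HEALTHCHECK in a Dockerfile (or orchestrator probes later) tells the platform whether the container is responding, not whether users are happy. Plain Docker only records the health status; restarting or replacing an unhealthy container is left to an orchestrator such as Swarm, or to Kubernetes with its own probes. Logs go to stdout/stderr by design; the engine captures them. Hidden trap: log drivers and rotation. Unbounded json-file logs can fill disks on long-lived VMs.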
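How probe results turn into a health status can be modelled as a small state machine: the container reports starting until a probe passes, and becomes unhealthy after a number of consecutive failures (Docker's default for retries is 3). The sketch below is a simplified model under those assumptions. It ignores the interval, timeout, and start period options, and health_status is a hypothetical name.

```python
from typing import Iterable


def health_status(probe_results: Iterable[bool], retries: int = 3) -> str:
    """Fold probe outcomes (True = passed) into 'starting', 'healthy', or 'unhealthy'."""
    status = "starting"
    consecutive_failures = 0
    for passed in probe_results:
        if passed:
            # Any success resets the failure streak.
            consecutive_failures = 0
            status = "healthy"
        else:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                status = "unhealthy"
    return status
```

The model makes one limitation visible: a service that fails every third request never trips the threshold, so the status says healthy while users see errors.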
Metrics and traces still belong to your application. Containers do not replace APM; they make it easier to ship the same binary everywhere if instrumentation travels with the image.
Compose, Swarm, and Kubernetes—where Docker stops
Docker Compose models multi-container apps on one host (or one Docker context)—excellent for dev and small deployments. Swarm added clustering but lost mindshare to Kubernetes.
Kubernetes does not run “Docker the product” inside the cluster anymore; it schedules Pods via CRI, usually containerd, using the same OCI images you built with Docker or BuildKit. Your Dockerfile skills transfer; the control plane, Service mesh, and GitOps layers are the next chapter—see Kubernetes architecture in simple terms and GitOps principles.
BuildKit and the modern build pipeline
Classic docker build is sequential. BuildKit parallelizes independent stages, improves cache export/import, and supports secrets and SSH mounts during build without leaving credentials in layers. On current Docker Desktop and many CI images, BuildKit is default—worth enabling explicitly in pipelines that still disable it.
Pair BuildKit with a .dockerignore as strict as .gitignore. Sending the entire repo context into the daemon slows every build and can leak local files into layers.
Checklist: hidden-side habits that survive production
- Pin base images by digest in security-sensitive pipelines; rebuild on CVE, not on latest whims.
- Run as non-root; read-only root where possible; no privileged unless documented and time-boxed.
- One main process per container; handle signals and zombies deliberately.
- Persist data on volumes; size and back them up; never rely on the writable layer.
- Set memory and CPU limits deliberately, before the kernel OOM killer or the orchestrator enforces them abruptly in prod.
- Prune and monitor disk; treat the daemon socket like root credentials.
- Scan images in CI; sign and attest when your org requires supply-chain proof.
Further reading
- Docker documentation: build cache, multi-stage builds, networking, and security
- Open Container Initiative: image and runtime specifications
- Linux man pages: namespaces(7), cgroups(7)
- Liz Rice, Container Security (O’Reilly), for depth beyond tutorials
- CNCF / Kubernetes docs: from single host to cluster operations