Kubernetes StatefulSets in Depth: Running PostgreSQL and Redis

Deployments are built for cattle; databases and caches are often pets. A StatefulSet gives each Pod a stable name, predictable DNS, ordered lifecycle, and per-replica disks—exactly what you need before you run PostgreSQL or Redis on Kubernetes yourself. This guide explains how StatefulSets work under the hood, then walks through production-shaped manifests for both data stores.

In short

Pair every StatefulSet with a headless Service (clusterIP: None), use volumeClaimTemplates for durable data, and treat pod-name-0 DNS as the contract. PostgreSQL needs ReadWriteOnce block storage and careful init; Redis needs memory limits, persistence policy, and a clear HA story (single instance vs Sentinel vs Cluster)—managed services often beat DIY for production.

Why StatefulSets exist

A Deployment creates interchangeable Pods behind a single Service. If Pod web-7f8c dies, the replacement is a new identity with no guarantee of hostname or disk attachment order. That is ideal for stateless APIs.

Stateful systems break that model:

  • Identity — Replica 0 is the primary; replica 1 is a follower. They are not interchangeable.
  • Stable network — Peers must reach postgres-0.postgres, not “whatever Pod the Service picked today.”
  • Stable storage — When postgres-0 reschedules, it must reattach its volume, not a random PVC.
  • Ordered operations — Rolling out followers before the primary can corrupt replication bootstrap.

The StatefulSet controller enforces those rules. It is still “just Kubernetes”—not magic HA. You still design replication, backups, and failover; the controller gives you predictable building blocks.

Prerequisites: you should understand PVs, PVCs, and StorageClasses, as well as basic Kubernetes architecture. For Redis semantics beyond Kubernetes wiring, see Redis and redis-cli in depth.

Deployment vs StatefulSet

| Concern | Deployment | StatefulSet |
|---|---|---|
| Pod names | Random suffix (web-7d4f9c) | Stable ordinal (db-0, db-1) |
| DNS | Service load-balances to any ready Pod | Headless Service gives per-Pod A records |
| Storage | Usually shared PVC or none | volumeClaimTemplates → one PVC per Pod |
| Scale up | Parallel, any order | Sequential by default (0, then 1, …) |
| Scale down | Any Pod may terminate | Reverse order (highest index first) |
| Typical workloads | Web APIs, workers | Databases, ZooKeeper, Kafka brokers, Redis with stable peers |

StatefulSet anatomy

Four ideas work together:

  1. serviceName: Must name a headless Service. That Service is what makes the cluster DNS publish a record for each Pod.
  2. Pod identity: Pod name = <statefulset-name>-<ordinal>. Labels come from the Pod template; the controller adds a statefulset.kubernetes.io/pod-name label that carries the ordinal-based name.
  3. volumeClaimTemplates: For each Pod, a PVC named <template-name>-<statefulset-name>-<ordinal>.
  4. Update strategy: RollingUpdate (default) or OnDelete. With OnDelete you delete Pods yourself to trigger recreation, which is common for databases with manual failover steps.
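
The four ideas can be seen together in one place. The skeleton below is a minimal sketch, not a production manifest: the names db and data, the image, and the mount path are all illustrative assumptions.

```yaml
# Skeleton only: names (db, data), image, and mount path are illustrative.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db                      # Pods become db-0, db-1, ...
spec:
  serviceName: db               # 1. must name a headless Service (clusterIP: None)
  replicas: 2
  selector:
    matchLabels:
      app: db
  updateStrategy:
    type: OnDelete              # 4. Pods pick up template changes only after you delete them
  template:
    metadata:
      labels:
        app: db                 # 2. labels come from the template; the name comes from the controller
    spec:
      containers:
        - name: db
          image: registry.example.com/db:1.0   # hypothetical image
          volumeMounts:
            - name: data
              mountPath: /var/lib/db
  volumeClaimTemplates:         # 3. yields PVCs data-db-0 and data-db-1
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

Because no storageClassName is set here, the claims would fall back to the cluster's default StorageClass, if one exists.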

DNS pattern (namespace data, StatefulSet postgres, headless Service postgres):

  • postgres-0.postgres.data.svc.cluster.local — Pod 0 only
  • postgres.data.svc.cluster.local — All ready Pods (A records for each)

Clients that need the primary connect to postgres-0; read pools can target the headless Service or specific ordinals once replication is configured.

Headless Service (required)

A normal ClusterIP Service load-balances to backends. A headless Service returns Pod IPs directly via DNS—no virtual IP in front:

apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: data
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432

Without clusterIP: None, the per-Pod DNS names are not published, so peer discovery for replication and operator-style clustering fails in subtle ways: clients can only resolve the Service virtual IP and land on an arbitrary replica.

Storage contract for databases

PostgreSQL and single-instance Redis on Kubernetes almost always use:

  • accessModes: [ReadWriteOnce] — One node mounts the block volume at a time (typical cloud disk).
  • StorageClass with WaitForFirstConsumer — Provisions the disk in the same zone as the scheduled Pod (see storage guide).
  • reclaimPolicy: Retain in production — Deleting a PVC should not silently wipe prod data; pair with snapshots.

Scale-down does not delete PVCs created from volumeClaimTemplates. That protects data but creates cost leaks if you shrink a StatefulSet and forget orphaned volumes.
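
Recent Kubernetes releases expose this retention behavior as a field on the StatefulSet. The fragment below is a sketch that assumes your cluster version supports persistentVolumeClaimRetentionPolicy; check the API reference for your release before relying on it.

```yaml
# Fragment of a StatefulSet spec; merge into an existing manifest.
spec:
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Retain    # keep PVCs of ordinals removed by a scale-down (the default)
    whenDeleted: Retain   # keep PVCs when the StatefulSet itself is deleted (the default)
```

Switching whenScaled to Delete would remove the claims of scaled-down ordinals automatically. Whether the underlying disk then disappears still depends on the reclaim policy of the bound volume, so for databases the Retain defaults are usually the safer starting point.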

Lab layout: namespace and storage

Apply a namespace and StorageClass appropriate to your cluster (minikube/kind often have standard; EKS might use gp3):

apiVersion: v1
kind: Namespace
metadata:
  name: data
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

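On managed clusters, use the cloud’s CSI StorageClass instead of no-provisioner, which does not provision volumes dynamically and only binds PersistentVolumes you have created yourself. The binding mode and reclaim policy are what matter for StatefulSets.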
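
As one example of a cloud CSI class, the sketch below assumes an EKS cluster with the AWS EBS CSI driver installed. It keeps the name fast-ssd so the later manifests still match; the encrypted parameter is an optional addition, not something the lab requires.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com      # AWS EBS CSI driver; must be installed in the cluster
parameters:
  type: gp3
  encrypted: "true"               # optional: encrypt volumes at rest
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain
```

Other providers follow the same shape with their own provisioner name and parameters.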

PostgreSQL on a StatefulSet (single primary)

Start with one replica (replicas: 1). Multi-node PostgreSQL HA (Patroni, CloudNativePG, Crunchy Operator) adds failover automation—worth it in production, but the StatefulSet mechanics are the same: stable identity + disk per ordinal.

Secrets and configuration

apiVersion: v1
kind: Secret
metadata:
  name: postgres-auth
  namespace: data
type: Opaque
stringData:
  POSTGRES_USER: app
  POSTGRES_PASSWORD: change-me-in-production
  POSTGRES_DB: appdb

Never commit real passwords. In production use External Secrets, Sealed Secrets, or cloud secret managers; rotate credentials and restrict who can read Secrets via RBAC.

ConfigMap for postgresql.conf fragments

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  namespace: data
data:
  POSTGRES_INITDB_ARGS: "--encoding=UTF8 --locale=C"
  custom.conf: |
    max_connections = 100
    shared_buffers = 256MB
    log_statement = 'ddl'

Headless Service + StatefulSet

apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: data
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: data
spec:
  serviceName: postgres
  replicas: 1
  podManagementPolicy: OrderedReady
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      securityContext:
        fsGroup: 999
      terminationGracePeriodSeconds: 60
      containers:
        - name: postgres
          image: postgres:16-bookworm
          ports:
            - containerPort: 5432
              name: postgres
          envFrom:
            - secretRef:
                name: postgres-auth
          env:
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
            - name: config
              mountPath: /etc/postgresql/conf.d
              readOnly: true
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              memory: 2Gi
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "app", "-d", "appdb"]
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "app", "-d", "appdb"]
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: postgres-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 20Gi

PostgreSQL depth notes

  • PGDATA subdirectory: The official image needs an empty data directory at init. Pointing PGDATA at a pgdata subdirectory of the mount keeps a lost+found directory, which some filesystems create at the volume root, from breaking initialization.
  • fsGroup 999: Matches the postgres group in the image so the volume is writable; permission errors on mount are a top failure mode.
  • Probes: pg_isready checks that the server accepts connections; tune delays so slow storage does not cause repeated restarts during cold start.
  • Graceful shutdown: terminationGracePeriodSeconds gives PostgreSQL time to checkpoint before SIGKILL; pair it with a preStop hook if you automate failover.
  • Connections from apps: The in-cluster URL is postgres://app:<password>@postgres-0.postgres.data.svc.cluster.local:5432/appdb. For a single replica, targeting postgres-0 is explicit; a regular ClusterIP Service in front is optional for apps that do not care which ordinal they hit (only valid for one replica).
  • Configuration: As written, the manifests mount postgres-config at /etc/postgresql/conf.d, but nothing tells PostgreSQL to read that directory, and POSTGRES_INITDB_ARGS never reaches the container environment because envFrom references only the Secret. To apply them, add an include_dir line to postgresql.conf (or pass settings with -c flags) and reference the ConfigMap key from env.
  • Scaling replicas > 1: Do not simply set replicas: 3 on the same StatefulSet without replication software; you get three independent databases. Use an operator or Patroni-style HA.
  • Backups: Volume snapshots are crash-consistent at best; use pg_dump, logical replication, or WAL archiving for an RPO/RTO you can defend in an incident review.
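
One way to handle the slow cold start mentioned under Probes is a startupProbe, which holds back the liveness and readiness probes until the server has come up once. The fragment below is a sketch for the postgres container; the thresholds are illustrative assumptions, not tuned values.

```yaml
# Fragment for the postgres container; the thresholds are illustrative.
startupProbe:
  exec:
    command: ["pg_isready", "-U", "app", "-d", "appdb"]
  periodSeconds: 10
  failureThreshold: 30   # up to 30 x 10 s = 300 s to start before the kubelet restarts the container
```

With this in place, the liveness initialDelaySeconds matters less, because liveness checks do not begin until the startup probe has succeeded.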

Verify PostgreSQL

kubectl apply -f namespace-and-sc.yaml -f postgres.yaml
kubectl get pods,pvc -n data -w
kubectl exec -it postgres-0 -n data -- psql -U app -d appdb -c "SELECT version();"
kubectl run psql-client --rm -it --image=postgres:16 --restart=Never -n data -- \
  psql -h postgres-0.postgres -U app -d appdb

Redis on a StatefulSet (standalone with persistence)

Redis is often cache-only (ephemeral), but when you store sessions or use it as a fast primary, persistence and restart safety matter. A one-replica StatefulSet is the teaching baseline; production teams frequently choose managed Redis, Redis Sentinel, or Redis Cluster with Helm/operators instead of hand-rolled YAML.

ConfigMap: redis.conf

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: data
data:
  redis.conf: |
    bind 0.0.0.0
    protected-mode yes
    port 6379
    appendonly yes
    appendfsync everysec
    save 900 1
    save 300 10
    maxmemory 512mb
    maxmemory-policy allkeys-lru
    dir /data

appendonly yes enables AOF; RDB snapshots still run via save lines. Tune maxmemory below your container limit so the kernel OOM killer does not strike first. See Redis persistence and eviction for trade-offs.

Secret for AUTH (Redis 6+ ACL or requirepass)

apiVersion: v1
kind: Secret
metadata:
  name: redis-auth
  namespace: data
type: Opaque
stringData:
  redis-password: change-me-in-production

Headless Service + StatefulSet

apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: data
spec:
  clusterIP: None
  selector:
    app: redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: data
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      securityContext:
        fsGroup: 1000
      containers:
        - name: redis
          image: redis:7.2-bookworm
          command:
            - redis-server
            - /etc/redis/redis.conf
            - --requirepass
            - $(REDIS_PASSWORD)
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-auth
                  key: redis-password
          ports:
            - containerPort: 6379
              name: redis
          volumeMounts:
            - name: data
              mountPath: /data
            - name: config
              mountPath: /etc/redis
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              memory: 768Mi
          livenessProbe:
            tcpSocket:
              port: 6379
            initialDelaySeconds: 15
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - sh
                - -c
                - redis-cli -a "$REDIS_PASSWORD" ping | grep -q PONG
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: redis-config
            items:
              - key: redis.conf
                path: redis.conf
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 8Gi

Redis HA patterns (when one Pod is not enough)

| Pattern | Replicas | Failover | Kubernetes fit |
|---|---|---|---|
| Standalone + PVC | 1 | Manual / restart | Simple StatefulSet (above) |
| Primary + replicas | 1 + N | Replication; manual promotion | StatefulSet ordinals; apps read from replicas knowing lag |
| Redis Sentinel | 3+ Redis + 3 Sentinels | Quorum elects new primary | Multiple StatefulSets or Helm chart; stable DNS still matters |
| Redis Cluster | 6+ (sharded) | Replica promoted to take over the failed primary's hash slots | Complex; use an operator, not “scale replicas” on one StatefulSet |

Do not point two Redis primaries at the same RWO volume. Each ordinal needs its own PVC; replication is a software relationship between distinct data directories.
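
For the primary + replicas pattern, the software relationship can be expressed with Redis's own replicaof and masterauth directives. The fragment below is a sketch of the container command for a hypothetical second StatefulSet (for example redis-replica) whose Pods follow redis-0; it assumes the same redis-auth Secret and redis-config ConfigMap as the standalone manifest.

```yaml
# Fragment: container command for a hypothetical replica StatefulSet.
# Each replica Pod still gets its own PVC from volumeClaimTemplates.
command:
  - redis-server
  - /etc/redis/redis.conf
  - --requirepass
  - $(REDIS_PASSWORD)
  - --masterauth          # password the replica presents to the primary
  - $(REDIS_PASSWORD)
  - --replicaof           # host and port of the primary, via its stable DNS name
  - redis-0.redis
  - "6379"
```

A separate StatefulSet is used because every Pod in one StatefulSet shares the same template: adding replicaof to the original manifest would make redis-0 try to replicate from itself. Promotion on failure remains manual in this sketch.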

Verify Redis

kubectl apply -f redis.yaml
kubectl exec -it redis-0 -n data -- redis-cli -a "$(kubectl get secret redis-auth -n data -o jsonpath='{.data.redis-password}' | base64 -d)" ping
kubectl run redis-client --rm -it --image=redis:7.2 --restart=Never -n data -- \
  redis-cli -h redis-0.redis -a 'change-me-in-production' SET session:demo ok EX 300

Application tier: wiring Postgres and Redis

A stateless Deployment for your API reads connection strings from a ConfigMap or Secret:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: data
data:
  DATABASE_HOST: postgres-0.postgres
  DATABASE_PORT: "5432"
  REDIS_HOST: redis-0.redis
  REDIS_PORT: "6379"
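
A Deployment could consume this ConfigMap roughly as follows. This is a sketch: the Deployment name, image, and the DATABASE_PASSWORD variable name are hypothetical, and it assumes the API runs in the same namespace as app-config and the two Secrets defined earlier.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                    # hypothetical name
  namespace: data              # same namespace as app-config and the Secrets
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0   # hypothetical image
          envFrom:
            - configMapRef:
                name: app-config        # DATABASE_HOST, DATABASE_PORT, REDIS_HOST, REDIS_PORT
          env:
            - name: DATABASE_PASSWORD   # hypothetical variable name read by the app
              valueFrom:
                secretKeyRef:
                  name: postgres-auth
                  key: POSTGRES_PASSWORD
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-auth
                  key: redis-password
```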

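Use separate Secrets for passwords. Enforce least privilege with NetworkPolicies: only the app namespace may reach ports 5432 and 6379 in data. If the API runs outside the data namespace, the short host names above will not resolve; use names that include the namespace, such as postgres-0.postgres.data. Expose neither Service as LoadBalancer unless you have a deliberate external access design.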
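
The NetworkPolicy idea can be sketched as below for PostgreSQL. It assumes the API Pods run in a separate namespace named app (a hypothetical name) and that the cluster's network plugin enforces NetworkPolicy; without such a plugin the object is accepted but has no effect.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-ingress        # hypothetical name
  namespace: data
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: app   # hypothetical client namespace
      ports:
        - protocol: TCP
          port: 5432
```

A matching policy for Redis would select app: redis and port 6379. Once a policy selects the database Pods, clients inside data (such as a throwaway psql Pod) are blocked too; if the API lives in data, a podSelector on its labels would replace the namespaceSelector.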

Lifecycle: ordered scale and updates

Scale up — Pod postgres-1 starts only after postgres-0 is Running and Ready (with default OrderedReady). For databases without auto-clustering, scaling up creates idle nodes you must join to replication manually.

Scale down — Highest index terminates first; PVC remains. To shrink storage costs, delete unused PVCs only after backup and confirmation the ordinal will not return.

Rolling update — Partition field (spec.updateStrategy.rollingUpdate.partition) can pin updates to ordinals ≥ N—useful to update followers before the primary in advanced setups.
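
As a concrete sketch of the partition field, assume a StatefulSet with replicas: 3 where ordinal 0 is the primary:

```yaml
# Fragment of a StatefulSet spec with replicas: 3 (ordinals 0, 1, 2).
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 1   # ordinals 1 and 2 take the new template; ordinal 0 keeps the old one
```

After the followers are verified, lowering partition to 0 lets the rollout reach the primary.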

Pod disruption — Define a PodDisruptionBudget so voluntary evictions (node drain) do not take down your only Postgres Pod without planning:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: data
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: postgres

Troubleshooting StatefulSets, Postgres, and Redis

| Symptom | Likely cause | What to check |
|---|---|---|
| Pod Pending | PVC not bound, zone mismatch, insufficient CPU/mem | kubectl describe pod postgres-0 -n data; get pvc; StorageClass topology |
| Postgres CrashLoop, permission denied | fsGroup / wrong PGDATA, non-empty mount | Logs; exec and ls -la /var/lib/postgresql/data |
| Postgres starts empty after “fix” | New PVC bound: wrong ordinal or deleted claim | get pvc; names must match data-postgres-0 |
| Redis LOADING or OOM | AOF rewrite, memory over limit | INFO memory; raise limit or lower maxmemory |
| App cannot resolve DB host | Missing headless Service or wrong DNS name | nslookup postgres-0.postgres.data.svc.cluster.local from debug Pod |
| Two writers corrupt data | Multiple primaries on shared disk or split brain | One primary per shard; use Sentinel/operator HA |

Start with these commands:

kubectl get statefulset,pods,pvc -n data
kubectl describe statefulset postgres -n data
kubectl logs postgres-0 -n data --previous
kubectl get events -n data --sort-by='.lastTimestamp'

Broader Pod failure modes: Kubernetes troubleshooting playbook.

Production: when to run your own vs buy managed

  • Managed RDS / Aurora / Cloud SQL / Azure Database — Automated backups, patching, failover; you trade control for operational maturity.
  • Managed Redis (ElastiCache, Memorystore, etc.) — Multi-AZ failover without operating Sentinel yourself.
  • Operators on Kubernetes — CloudNativePG, Crunchy Postgres, Redis Operator encode HA in CRDs; still your cluster, but less raw YAML.
  • Self-run StatefulSet — Best for learning, edge constraints, or platform teams with dedicated DBA/SRE capacity and on-call runbooks.

If you do self-host: encrypt PVCs at rest, restrict RBAC on the data namespace, snapshot on schedule, test restore quarterly, and document RTO/RPO in your incident and DR practice.
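
For the snapshot step, clusters that run the CSI snapshot controller and its CRDs can request a snapshot declaratively. The manifest below is a sketch: the snapshot name and the csi-snapclass class name are hypothetical, and your cluster may offer a different VolumeSnapshotClass or none at all.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-0-manual        # hypothetical name
  namespace: data
spec:
  volumeSnapshotClassName: csi-snapclass   # hypothetical; list yours with kubectl get volumesnapshotclass
  source:
    persistentVolumeClaimName: data-postgres-0
```

A snapshot taken this way is crash-consistent only, so it complements rather than replaces pg_dump or WAL archiving.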

GitOps and day-two operations

Store manifests in Git and avoid kubectl edit on StatefulSets in production, in line with GitOps principles. Version image tags, pin major Postgres/Redis versions, and test upgrades on a volume cloned from a snapshot. Record runbooks for failed readiness after a node drain, PVC expansion, password rotation, and break-glass failover.

Hands-on learning path

  1. Deploy headless Service + single-replica Postgres; write a row; delete the Pod; confirm data survives on reschedule.
  2. Inspect PVC name data-postgres-0 and confirm it rebinds to the same PV.
  3. Add Redis with AOF; restart Pod; verify key still exists.
  4. Scale StatefulSet to 2 without replication config—observe why apps must not treat both as primaries.
  5. Simulate PVC Pending with wrong StorageClass and fix from Events.

Further reading

  • Kubernetes documentation — StatefulSets, StatefulSet Basics
  • PostgreSQL documentation — replication, backup, and pg_isready
  • Redis documentation — replication, Sentinel, Cluster, persistence
  • CNCF operators — CloudNativePG, Redis Operator
