Kubernetes StatefulSets in Depth: Running PostgreSQL and Redis
Deployments are built for cattle; databases and caches are often pets. A StatefulSet gives each Pod a stable name, predictable DNS, ordered lifecycle, and per-replica disks—exactly what you need before you run PostgreSQL or Redis on Kubernetes yourself. This guide explains how StatefulSets work under the hood, then walks through production-shaped manifests for both data stores.
In short
Pair every StatefulSet with a headless Service (clusterIP: None), use volumeClaimTemplates for durable data, and treat pod-name-0 DNS as the contract. PostgreSQL needs ReadWriteOnce block storage and careful init; Redis needs memory limits, persistence policy, and a clear HA story (single instance vs Sentinel vs Cluster)—managed services often beat DIY for production.
Why StatefulSets exist
A Deployment creates interchangeable Pods behind a single Service. If Pod web-7f8c dies, the replacement is a new identity with no guarantee of hostname or disk attachment order. That is ideal for stateless APIs.
Stateful systems break that model:
- Identity: Replica 0 is the primary; replica 1 is a follower. They are not interchangeable.
- Stable network: Peers must reach postgres-0.postgres, not “whatever Pod the Service picked today.”
- Stable storage: When postgres-0 reschedules, it must reattach its volume, not a random PVC.
- Ordered operations: Rolling out followers before the primary can corrupt replication bootstrap.
The StatefulSet controller enforces those rules. It is still “just Kubernetes”—not magic HA. You still design replication, backups, and failover; the controller gives you predictable building blocks.
Prerequisites: understand PV, PVC, and StorageClass and Kubernetes architecture. For Redis semantics beyond Kubernetes wiring, see Redis and redis-cli in depth.
Deployment vs StatefulSet
| Concern | Deployment | StatefulSet |
|---|---|---|
| Pod names | Random suffix (web-7d4f9c) | Stable ordinal (db-0, db-1) |
| DNS | Service load-balances to any ready Pod | Headless Service gives per-Pod A records |
| Storage | Usually shared PVC or none | volumeClaimTemplates → one PVC per Pod |
| Scale up | Parallel, any order | Sequential by default (0, then 1, …) |
| Scale down | Any Pod may terminate | Reverse order (highest index first) |
| Typical workloads | Web APIs, workers | Databases, ZooKeeper, Kafka brokers, Redis with stable peers |
StatefulSet anatomy
Four ideas work together:
- serviceName: Must match a headless Service. That Service creates DNS for each Pod.
- Pod identity: Pod name = <statefulset-name>-<ordinal>. Labels come from the Pod template; the controller adds the ordinal.
- volumeClaimTemplates: For each Pod, a PVC named <template-name>-<statefulset-name>-<ordinal>.
- Update strategy: RollingUpdate (default) or OnDelete (you delete Pods to trigger recreation; common for databases with manual failover steps).
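As a small sketch of the update-strategy choice, the fragment below switches a StatefulSet to OnDelete. It is a partial manifest meant to be merged into a complete spec, not applied on its own; the name postgres simply mirrors the examples used elsewhere in this guide.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  # With OnDelete the controller does not restart Pods after a template change.
  # Each Pod picks up the new template only after you delete it yourself.
  updateStrategy:
    type: OnDelete
```

This trades automation for control: you decide when each ordinal restarts, which suits a manual failover runbook.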
DNS pattern (namespace data, StatefulSet postgres, headless Service postgres):
- postgres-0.postgres.data.svc.cluster.local: Pod 0 only
- postgres.data.svc.cluster.local: All ready Pods (A records for each)
Clients that need the primary connect to postgres-0; read pools can target the headless Service or specific ordinals once replication is configured.
Headless Service (required)
A normal ClusterIP Service load-balances to backends. A headless Service returns Pod IPs directly via DNS—no virtual IP in front:
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: data
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
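Without clusterIP: None, the per-Pod DNS names are not published and the Service name resolves to a single virtual IP. Peer discovery for replication and operator-style clustering then fails in subtle ways: clients resolve the Service VIP and land on whichever replica the Service picks.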
Storage contract for databases
PostgreSQL and single-instance Redis on Kubernetes almost always use:
- accessModes: [ReadWriteOnce]. One node mounts the block volume at a time (typical cloud disk).
- StorageClass with WaitForFirstConsumer. Provisions the disk in the same zone as the scheduled Pod (see storage guide).
- reclaimPolicy: Retain in production. Deleting a PVC should not silently wipe prod data; pair with snapshots.
Scale-down does not delete PVCs created from volumeClaimTemplates. That protects data but creates cost leaks if you shrink a StatefulSet and forget orphaned volumes.
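If you would rather have the controller clean up after a scale-down, newer Kubernetes releases expose spec.persistentVolumeClaimRetentionPolicy on the StatefulSet. The fragment below is a sketch, assuming your cluster version supports that field (it started behind a feature gate, so older clusters may reject it). Both settings default to Retain, which is the behavior described above.

```yaml
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # keep PVCs when the StatefulSet object itself is deleted
    whenScaled: Delete    # remove the PVC of an ordinal dropped by a scale-down
```

Deleting the PVC still leaves the PersistentVolume behind when the StorageClass uses reclaimPolicy: Retain, so the underlying disk needs its own cleanup step.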
Lab layout: namespace and storage
Apply a namespace and StorageClass appropriate to your cluster (minikube/kind often have standard; EKS might use gp3):
apiVersion: v1
kind: Namespace
metadata:
  name: data
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain
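With no-provisioner, nothing is provisioned dynamically: each PVC binds only to a PersistentVolume you have created yourself (for example, a local volume). On managed clusters, use the cloud’s CSI StorageClass instead of no-provisioner. The binding mode and reclaim policy are what matter for StatefulSets.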
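For comparison, here is a sketch of the same class on EKS, assuming the AWS EBS CSI driver is installed in the cluster. The class name fast-ssd is kept so the later manifests would work unchanged; the gp3 volume type is one reasonable choice, not a requirement.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com        # AWS EBS CSI driver (assumed to be installed)
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain
```

Other clouds follow the same shape with their own provisioner name and parameters.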
PostgreSQL on a StatefulSet (single primary)
Start with one replica (replicas: 1). Multi-node PostgreSQL HA (Patroni, CloudNativePG, Crunchy Operator) adds failover automation—worth it in production, but the StatefulSet mechanics are the same: stable identity + disk per ordinal.
Secrets and configuration
apiVersion: v1
kind: Secret
metadata:
  name: postgres-auth
  namespace: data
type: Opaque
stringData:
  POSTGRES_USER: app
  POSTGRES_PASSWORD: change-me-in-production
  POSTGRES_DB: appdb
Never commit real passwords. In production use External Secrets, Sealed Secrets, or cloud secret managers; rotate credentials and restrict who can read Secrets via RBAC.
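One way to express the RBAC restriction is a Role scoped to a single named Secret. This is a minimal sketch: the Role name and the db-admin ServiceAccount are hypothetical and stand in for whatever identity legitimately needs to read the credentials.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-postgres-auth
  namespace: data
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["postgres-auth"]   # only this Secret, not every Secret in the namespace
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-postgres-auth
  namespace: data
subjects:
  - kind: ServiceAccount
    name: db-admin                     # hypothetical ServiceAccount
    namespace: data
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: read-postgres-auth
```

Pods that consume the Secret through envFrom do not need this Role; it governs reads through the Kubernetes API by users and service accounts.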
ConfigMap for postgresql.conf fragments
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  namespace: data
data:
  # Read by the image entrypoint only when injected as an environment variable.
  POSTGRES_INITDB_ARGS: "--encoding=UTF8 --locale=C"
  custom.conf: |
    max_connections = 100
    shared_buffers = 256MB
    log_statement = 'ddl'
Headless Service + StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: data
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
      targetPort: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: data
spec:
  serviceName: postgres
  replicas: 1
  podManagementPolicy: OrderedReady
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      securityContext:
        fsGroup: 999
      terminationGracePeriodSeconds: 60
      containers:
        - name: postgres
          image: postgres:16-bookworm
          ports:
            - containerPort: 5432
              name: postgres
          envFrom:
            - secretRef:
                name: postgres-auth
          env:
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
            # The stock postgresql.conf does not include this directory; add an
            # include_dir line or pass settings with -c for custom.conf to take effect.
            - name: config
              mountPath: /etc/postgresql/conf.d
              readOnly: true
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              memory: 2Gi
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "app", "-d", "appdb"]
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "app", "-d", "appdb"]
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: postgres-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 20Gi
PostgreSQL depth notes
- PGDATA subdirectory: The official image expects an empty mount, or a pgdata subdirectory, so that lost+found on some filesystems does not break init.
- fsGroup 999: Matches the postgres image group so the volume is writable; permission errors on mount are a top failure mode.
- Probes: pg_isready checks that the server accepts connections; tune delays so slow storage does not flip-flop restarts during cold start.
- Graceful shutdown: terminationGracePeriodSeconds gives PostgreSQL time to checkpoint before SIGKILL; pair with preStop if you automate failover.
- Connections from apps: In-cluster URL: postgres://app:<password>@postgres-0.postgres.data.svc.cluster.local:5432/appdb. For a single replica, targeting postgres-0 is explicit; a regular ClusterIP Service in front is optional for apps that do not care which ordinal they hit (only valid for one replica).
- Scaling replicas > 1: Do not simply set replicas: 3 on the same StatefulSet without replication software; you get three independent databases. Use an operator or Patroni-style HA.
- Backups: Volume snapshots are crash-consistent at best; use pg_dump, logical replication, or WAL archiving for RPO/RTO you can defend in an incident review.
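The pg_dump option from the backup note could be scheduled as a CronJob. The manifest below is a sketch under several assumptions: the postgres-backups PVC is hypothetical and must already exist, the schedule is arbitrary, and a real setup would also ship dumps off-cluster and prune old files.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-dump
  namespace: data
spec:
  schedule: "0 2 * * *"            # once a day at 02:00
  concurrencyPolicy: Forbid        # never run two dumps at the same time
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16-bookworm
              envFrom:
                - secretRef:
                    name: postgres-auth
              env:
                - name: PGPASSWORD   # read by pg_dump for authentication
                  valueFrom:
                    secretKeyRef:
                      name: postgres-auth
                      key: POSTGRES_PASSWORD
              command:
                - sh
                - -c
                # $$ is the Kubernetes escape for a literal $, so the shell sees $(date +%F).
                - >-
                  pg_dump -h postgres-0.postgres -U "$POSTGRES_USER" -d "$POSTGRES_DB" -Fc
                  -f "/backup/appdb-$$(date +%F).dump"
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: postgres-backups   # hypothetical, pre-created PVC
```

A dump is only useful once it has been restored somewhere, so pair this with a periodic pg_restore test.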
Verify PostgreSQL
kubectl apply -f namespace-and-sc.yaml -f postgres.yaml
kubectl get pods,pvc -n data -w
kubectl exec -it postgres-0 -n data -- psql -U app -d appdb -c "SELECT version();"
kubectl run psql-client --rm -it --image=postgres:16 --restart=Never -n data -- \
psql -h postgres-0.postgres -U app -d appdb
Redis on a StatefulSet (standalone with persistence)
Redis is often cache-only (ephemeral), but when you store sessions or use it as a fast primary, persistence and restart safety matter. A one-replica StatefulSet is the teaching baseline; production teams frequently choose managed Redis, Redis Sentinel, or Redis Cluster with Helm/operators instead of hand-rolled YAML.
ConfigMap: redis.conf
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: data
data:
  redis.conf: |
    bind 0.0.0.0
    protected-mode yes
    port 6379
    appendonly yes
    appendfsync everysec
    save 900 1
    save 300 10
    maxmemory 512mb
    maxmemory-policy allkeys-lru
    dir /data
appendonly yes enables AOF; RDB snapshots still run via save lines. Tune maxmemory below your container limit so the kernel OOM killer does not strike first. See Redis persistence and eviction for trade-offs.
Secret for AUTH (Redis 6+ ACL or requirepass)
apiVersion: v1
kind: Secret
metadata:
  name: redis-auth
  namespace: data
type: Opaque
stringData:
  redis-password: change-me-in-production
Headless Service + StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: data
spec:
  clusterIP: None
  selector:
    app: redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: data
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      securityContext:
        fsGroup: 1000
      containers:
        - name: redis
          image: redis:7.2-bookworm
          command:
            - redis-server
            - /etc/redis/redis.conf
            - --requirepass
            - $(REDIS_PASSWORD)
          env:
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-auth
                  key: redis-password
          ports:
            - containerPort: 6379
              name: redis
          volumeMounts:
            - name: data
              mountPath: /data
            - name: config
              mountPath: /etc/redis
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              memory: 768Mi
          livenessProbe:
            tcpSocket:
              port: 6379
            initialDelaySeconds: 15
            periodSeconds: 10
          readinessProbe:
            exec:
              command:
                - sh
                - -c
                - redis-cli -a "$REDIS_PASSWORD" ping | grep -q PONG
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: redis-config
            items:
              - key: redis.conf
                path: redis.conf
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 8Gi
Redis HA patterns (when one Pod is not enough)
| Pattern | Replicas | Failover | Kubernetes fit |
|---|---|---|---|
| Standalone + PVC | 1 | Manual / restart | Simple StatefulSet (above) |
| Primary + replicas | 1 + N | Replication; manual promotion | StatefulSet ordinals; apps read from replicas knowing lag |
| Redis Sentinel | 3+ Redis + 3 Sentinels | Quorum elects new primary | Multiple StatefulSets or Helm chart; stable DNS still matters |
| Redis Cluster | 6+ (sharded) | A replica is promoted to take over the failed primary's hash slots | Complex; use an operator, not “scale replicas” on one StatefulSet |
Do not point two Redis primaries at the same RWO volume. Each ordinal needs its own PVC; replication is a software relationship between distinct data directories.
Verify Redis
kubectl apply -f redis.yaml
kubectl exec -it redis-0 -n data -- redis-cli -a "$(kubectl get secret redis-auth -n data -o jsonpath='{.data.redis-password}' | base64 -d)" ping
kubectl run redis-client --rm -it --image=redis:7.2 --restart=Never -n data -- \
redis-cli -h redis-0.redis -a 'change-me-in-production' SET session:demo ok EX 300
Application tier: wiring Postgres and Redis
A stateless Deployment for your API reads connection strings from a ConfigMap or Secret:
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: data
data:
  DATABASE_HOST: postgres-0.postgres
  DATABASE_PORT: "5432"
  REDIS_HOST: redis-0.redis
  REDIS_PORT: "6379"
Use separate Secrets for passwords. Enforce least privilege with NetworkPolicies: only the app namespace may reach ports 5432 and 6379 in data. Expose neither Service as LoadBalancer unless you have a deliberate external access design.
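A NetworkPolicy along these lines would express that rule. It is a sketch that assumes the API Pods run in a namespace literally named app (hypothetical) and that the cluster's CNI plugin enforces NetworkPolicy; on a plugin that does not, the object is accepted but has no effect.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-datastores
  namespace: data
spec:
  # Applies to both the Postgres and Redis Pods from the manifests above.
  podSelector:
    matchExpressions:
      - key: app
        operator: In
        values: ["postgres", "redis"]
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: app   # hypothetical application namespace
      ports:
        - protocol: TCP
          port: 5432
        - protocol: TCP
          port: 6379
```

Once a policy selects these Pods, all other ingress to them is denied, including client Pods started inside the data namespace for verification.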
Lifecycle: ordered scale and updates
Scale up — Pod postgres-1 starts only after postgres-0 is Running and Ready (with default OrderedReady). For databases without auto-clustering, scaling up creates idle nodes you must join to replication manually.
Scale down — Highest index terminates first; PVC remains. To shrink storage costs, delete unused PVCs only after backup and confirmation the ordinal will not return.
Rolling update — Partition field (spec.updateStrategy.rollingUpdate.partition) can pin updates to ordinals ≥ N—useful to update followers before the primary in advanced setups.
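As an illustration of the partition field, the fragment below is a sketch that assumes a StatefulSet with at least two replicas where ordinal 0 is the primary. It is meant to be merged into an existing spec.

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      # Only ordinals >= 1 receive the new Pod template; ordinal 0 keeps the old one.
      partition: 1
```

After the followers are healthy on the new template, lowering partition to 0 lets the controller update the primary as well.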
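Pod disruption: Define a PodDisruptionBudget so voluntary evictions (node drain) do not take down your only Postgres Pod without planning. With a single replica, minAvailable: 1 makes kubectl drain wait on that Pod until you fail over, delete the Pod yourself, or relax the budget: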
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: data
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: postgres
Troubleshooting StatefulSets, Postgres, and Redis
| Symptom | Likely cause | What to check |
|---|---|---|
| Pod Pending | PVC not bound, zone mismatch, insufficient CPU/mem | kubectl describe pod postgres-0 -n data; get pvc; StorageClass topology |
| Postgres CrashLoop, permission denied | fsGroup / wrong PGDATA, non-empty mount | Logs; exec and ls -la /var/lib/postgresql/data |
| Postgres starts empty after “fix” | New PVC bound—wrong ordinal or deleted claim | get pvc names must match data-postgres-0 |
| Redis LOADING or OOM | AOF rewrite, memory over limit | INFO memory; raise limit or lower maxmemory |
| App cannot resolve DB host | Missing headless Service or wrong DNS name | nslookup postgres-0.postgres.data.svc.cluster.local from debug Pod |
| Two writers corrupt data | Multiple primaries on shared disk or split brain | One primary per shard; use Sentinel/operator HA |
kubectl get statefulset,pods,pvc -n data
kubectl describe statefulset postgres -n data
kubectl logs postgres-0 -n data --previous
kubectl get events -n data --sort-by='.lastTimestamp'
Broader Pod failure modes: Kubernetes troubleshooting playbook.
Production: when to run your own vs buy managed
- Managed RDS / Aurora / Cloud SQL / Azure Database — Automated backups, patching, failover; you trade control for operational maturity.
- Managed Redis (ElastiCache, Memorystore, etc.) — Multi-AZ failover without operating Sentinel yourself.
- Operators on Kubernetes — CloudNativePG, Crunchy Postgres, Redis Operator encode HA in CRDs; still your cluster, but less raw YAML.
- Self-run StatefulSet — Best for learning, edge constraints, or platform teams with dedicated DBA/SRE capacity and on-call runbooks.
If you do self-host: encrypt PVCs at rest, restrict RBAC on the data namespace, snapshot on schedule, test restore quarterly, and document RTO/RPO in your incident and DR practice.
GitOps and day-two operations
Store manifests in Git; avoid kubectl edit on StatefulSets in production (GitOps principles). Version image tags; pin major Postgres/Redis versions; test upgrades on a cloned snapshot volume. Record runbooks for: failed readiness after node drain, PVC expansion, password rotation, and breaking glass failover.
Hands-on learning path
- Deploy headless Service + single-replica Postgres; write a row; delete the Pod; confirm data survives on reschedule.
- Inspect PVC name data-postgres-0 and confirm it rebinds to the same PV.
- Add Redis with AOF; restart Pod; verify key still exists.
- Scale StatefulSet to 2 without replication config and observe why apps must not treat both as primaries.
- Simulate PVC Pending with wrong StorageClass and fix from Events.
Further reading
- Kubernetes documentation: StatefulSets, StatefulSet Basics
- PostgreSQL documentation: replication, backup, and pg_isready
- Redis documentation: replication, Sentinel, Cluster, persistence
- CNCF operators: CloudNativePG, Redis Operator