KEDA in Depth: Event-Driven Autoscaling for Kubernetes
KEDA (Kubernetes Event-driven Autoscaling) extends the cluster so workloads scale on signals that matter—queue depth, lag, HTTP rate, cloud metrics, cron windows—not only CPU and memory. It is a CNCF graduated project: an operator plus a rich catalog of scalers that feed the same Horizontal Pod Autoscaler machinery Kubernetes already trusts.
In short
Install the KEDA operator → define a ScaledObject (or ScaledJob) pointing at your Deployment and one or more scalers → KEDA computes external metrics and manages an HPA for you → replicas move between minReplicaCount and maxReplicaCount, including scale to zero when idle. Pair with observability, authenticated scalers, and sane cooldowns—CPU-only HPA is not enough for queue- and event-shaped traffic.
Why KEDA exists
The built-in Horizontal Pod Autoscaler (HPA) in Kubernetes is powerful but narrow by default: it scales Deployments, StatefulSets, and some other workload APIs based on resource metrics (CPU, memory) or custom/external metrics you wire yourself. That fits steady HTTP services whose load correlates with CPU, but it breaks down for:
- Message consumers — Lag or queue depth drives capacity, not processor usage while waiting on I/O.
- Batch and ETL — Work arrives in bursts; zero replicas overnight saves real money.
- Integrations — Scale on Azure Service Bus length, AWS SQS depth, GCP Pub/Sub backlog, Kafka consumer lag.
- Scheduled capacity — Warm replicas before the Monday spike without a human editing replica counts.
KEDA is the control-plane glue: it watches external systems (or time), exposes metrics the metrics pipeline understands, and drives HPAs—so application teams declare “scale this Deployment when lag > 500 per consumer” instead of building bespoke autoscaler sidecars.
If you are new to how the scheduler and kubelet fit together, start with Kubernetes architecture in simple terms. For CPU/memory HPA and kubectl top, see Kubernetes metrics-server in depth. For day-one resource requests and limits that HPA still needs, see Kubernetes hands-on: day-one practices.
KEDA vs HPA vs VPA vs cluster autoscalers
Teams confuse these because all involve “scaling.” They operate at different layers:
| Mechanism | What moves | Typical signal | Relationship to KEDA |
|---|---|---|---|
| HPA | Pod replicas of a workload | CPU, memory, custom/external metrics | KEDA creates and owns HPAs for ScaledObjects |
| VPA | CPU/memory requests per Pod | Historical usage | Orthogonal—right-size containers; does not replace event scaling |
| Cluster Autoscaler | Nodes in the node group | Pending Pods, utilization | Downstream—KEDA adds Pods; CA must have capacity (or Karpenter provisions nodes) |
| Karpenter | Nodes (provision/deprovision) | Pod requirements, consolidation | Complements KEDA—fast pod scale + fast node scale |
| KEDA | Pod replicas via HPA + external metrics | 70+ scalers (Kafka, Prometheus, cron, cloud queues, …) | Event- and metric-driven autoscaling layer |
Rule of thumb: KEDA decides how many Pods your service needs; Cluster Autoscaler or Karpenter decides whether the cluster has nodes for those Pods. Both must be healthy for scale-out to succeed—otherwise you see Pending Pods, not higher replica counts.
Architecture: operator, metrics, and the HPA bridge
A standard KEDA install runs in the keda namespace (name may vary) with these core components:
- Operator — Watches ScaledObject, ScaledJob, TriggerAuthentication, and related CRDs; creates/updates HPAs; coordinates scaling logic.
- Metrics adapter — Registers as an external metrics API provider so the kube-controller-manager HPA controller can read KEDA-computed values.
- Admission webhooks — Validate and default KEDA resources on create/update.
End-to-end flow:
- You create a ScaledObject referencing a target (e.g. Deployment/orders-consumer) and one or more triggers (scaler configs).
- KEDA’s operator instantiates or updates an HPA whose name is derived from the ScaledObject (managed lifecycle; do not hand-edit that HPA).
- On each polling interval, KEDA asks each scaler implementation for a metric value (queue length, lag, Prometheus query result, etc.).
- Multiple triggers combine via aggregation (default: take the maximum desired replica count across triggers; tunable).
- The metrics adapter exposes each trigger's value as an external metric; the HPA compares it to the target and patches the target workload’s spec.replicas. The step between zero and one replica is handled by the KEDA operator itself, because an HPA cannot scale to or from zero.
External system          KEDA operator             Kubernetes
(Kafka, SQS, …)                │                        │
      │                        │                        │
      └──► scaler poll ───────►│──► external metric ───►│ HPA controller
                               │                        │
                               │                        └──► Deployment replicas
Core CRDs you actually use
ScaledObject — long-running workloads
The workhorse. Binds to a scale target (apiVersion/kind/name, usually a Deployment) and declares minReplicaCount, maxReplicaCount, an optional idleReplicaCount (the replica count to hold while no trigger is active), polling interval, cooldown period, and triggers. Scale to zero itself comes from minReplicaCount: 0.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-consumer
  pollingInterval: 30      # seconds between metric checks
  cooldownPeriod: 300      # seconds to wait after the last active trigger before scaling to zero
  minReplicaCount: 0       # scale to zero allowed
  maxReplicaCount: 40
  # idleReplicaCount: 0    # only useful when minReplicaCount > 0; must be lower than minReplicaCount
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/orders
        queueLength: "50"  # target messages per replica
        awsRegion: eu-west-1
      authenticationRef:
        name: aws-credentials
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"        # target average CPU utilization (%); a second scale-out signal, not a cap
Important fields teams overlook:
- pollingInterval — Lower = faster reaction, more API calls to external systems (watch rate limits on cloud APIs).
- cooldownPeriod — How long KEDA waits after the last active trigger before scaling to zero. Scale-in between one and N replicas is governed by the HPA, so flapping in that range is tuned with HPA behavior rather than this field.
- advanced.horizontalPodAutoscalerConfig — Pass HPA behavior (scale-up/down stabilization windows) when you need finer control than defaults.
- fallback — Define replica count or failure policy when scalers error (see below).
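As a sketch of the advanced.horizontalPodAutoscalerConfig field, the fragment below slows scale-in while leaving scale-out fast. The window and policy values are illustrative assumptions, not recommendations; the structure follows the standard autoscaling/v2 HPA behavior block.

```yaml
# Fragment of a ScaledObject spec; merge into an existing ScaledObject.
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # assumed value: look back 5 minutes before shrinking
          policies:
            - type: Percent
              value: 50                     # remove at most half of the replicas...
              periodSeconds: 60             # ...per minute
        scaleUp:
          stabilizationWindowSeconds: 0     # react to a growing backlog immediately
```

A slow scale-down with an immediate scale-up is a common starting point for queue consumers, since draining too early costs more than holding a few extra Pods for a few minutes.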
ScaledJob — batch and finite work
For Jobs that should spin up when work exists (SQS messages, RabbitMQ queues, Kafka topics). KEDA creates Job objects up to maxReplicaCount parallel jobs, each running your template. When the queue drains, Jobs complete and nothing stays running—ideal for cost-sensitive batch paths.
Use ScaledObject for always-on consumers; ScaledJob for “process N messages per Job pod” patterns.
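A minimal ScaledJob sketch for the SQS queue used earlier might look like the following. The resource name, image, and limits are hypothetical placeholders; check the required trigger metadata against the scaler reference for your KEDA version.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: orders-batch              # hypothetical name
  namespace: production
spec:
  jobTargetRef:                   # a regular Job spec
    backoffLimit: 2
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: worker
            image: registry.example.com/orders-batch:1.0.0   # placeholder image
  pollingInterval: 30
  maxReplicaCount: 20             # upper bound on parallel Jobs
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/orders
        queueLength: "10"         # assumed target messages per Job
        awsRegion: eu-west-1
      authenticationRef:
        name: aws-credentials
```

Each Job should read its share of messages and exit; a worker that loops forever behaves like a Deployment and belongs under a ScaledObject instead.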
TriggerAuthentication and ClusterTriggerAuthentication
Scalers need credentials: Kafka SASL, AWS IAM, Azure connection strings, Prometheus bearer tokens. Store references in Secrets; point triggers at a TriggerAuthentication (namespace-scoped) or ClusterTriggerAuthentication (shared).
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-credentials
  namespace: production
spec:
  # Option A: static keys read from a Secret
  secretTargetRef:
    - parameter: awsAccessKeyID
      name: keda-aws-secret
      key: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: keda-aws-secret
      key: AWS_SECRET_ACCESS_KEY
  # Option B: pod identity; choose this instead of the keys above where possible
  podIdentity:
    provider: aws-eks   # prefer IRSA / workload identity over long-lived keys
Production habit: prefer cloud pod identity (EKS IRSA, GKE workload identity, Azure Workload ID) over static keys in Secrets. Rotate and scope IAM policies to read-only queue metrics and consume permissions only.
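For a shared, key-free setup, a ClusterTriggerAuthentication that relies only on pod identity could be sketched as below. The resource name is hypothetical, and the provider value is an assumption that depends on your KEDA release (recent 2.x releases document aws, older ones aws-eks), so verify it against the authentication provider docs for your version.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ClusterTriggerAuthentication
metadata:
  name: shared-aws-identity       # hypothetical; cluster-scoped, so no namespace
spec:
  podIdentity:
    provider: aws                 # assumption: check the provider name for your KEDA version
---
# Fragment of a ScaledObject spec: reference the shared object by kind.
triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/orders
      queueLength: "50"
      awsRegion: eu-west-1
    authenticationRef:
      name: shared-aws-identity
      kind: ClusterTriggerAuthentication
```

Without the kind field, KEDA looks for a namespaced TriggerAuthentication with that name.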
CloudEventSource and ClusterCloudEventSource
Advanced integrations that react to cloud event feeds (e.g. storage notifications) to drive scaling or downstream automation—less common than ScaledObject but useful for event-native platforms.
The scaler catalog (and how to think about triggers)
KEDA ships 70+ built-in scalers; each is an adapter from an external metric to “desired replicas.” Categories that appear in almost every platform team’s cluster:
| Category | Examples (type:) | What it measures |
|---|---|---|
| Message queues | aws-sqs-queue, azure-servicebus, gcp-pubsub, rabbitmq, nats-jetstream | Queue depth, unacked messages, subscription backlog |
| Streaming | kafka, pulsar, redis-streams | Consumer lag, partition offset lag |
| Observability | prometheus, datadog, new-relic | Any PromQL or vendor metric you already trust |
| HTTP / mesh | prometheus (ingress RPS, Istio metrics scraped by Prometheus) | Request rate, latency proxies |
| Time | cron | Scale out before business hours; scale in after |
| Kubernetes resources | cpu, memory | Combine event + resource ceilings in one ScaledObject |
| CI / ops | github-runner, gitlab-runner | Runner queue depth for self-hosted agents |
Each scaler documents required metadata keys (threshold per replica, activation threshold, TLS flags, etc.) in the official scaler reference. Misconfigured metadata is the #1 reason “KEDA does nothing.”
Activation vs scaling threshold
Many scalers distinguish:
- Activation — Metric must cross this bar before KEDA scales from idle/min (wakes the workload).
- Scaling threshold — Target per replica used in the replica formula once active.
Example: Kafka lag activation at 10 stops cold-start churn; scaling threshold 100 lag per replica drives steady-state capacity.
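To make the two thresholds concrete, here is the arithmetic for that Kafka example, assuming the standard HPA average-value calculation (total metric divided by the per-replica target, rounded up) and ignoring HPA tolerance and stabilization windows. The lag figures below are made up for illustration.

```latex
\[
\text{desiredReplicas}
  = \min\!\Bigl(\text{maxReplicaCount},\;
    \max\!\Bigl(\text{minReplicaCount},\;
      \Bigl\lceil \frac{\text{totalLag}}{\text{lagThreshold}} \Bigr\rceil
    \Bigr)\Bigr)
\]
```

With activation at 10 and a scaling threshold of 100: a total lag of 8 leaves an idle workload at zero because activation is not crossed; a lag of 12 wakes it and yields ⌈12/100⌉ = 1 replica; a lag of 1,050 yields ⌈1050/100⌉ = 11 replicas, subject to maxReplicaCount.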
Multiple triggers on one ScaledObject
Common pattern: aws-sqs-queue + cpu on the same Deployment. KEDA aggregates desired replicas—default policy takes the maximum so neither queue backlog nor CPU spike is ignored. You can also combine cron (minimum replicas during business hours) with Prometheus RPS.
Scale to zero and cold starts
KEDA’s signature capability is minReplicaCount: 0 with supported scalers: when external metrics say “idle,” replicas drop to zero (or idleReplicaCount). That saves node hours for dev namespaces, sporadic integrators, and weekend-quiet pipelines.
Trade-offs you must accept:
- Cold start latency — First message after idle pays image pull + JVM warmup + scaler poll interval.
- Activation delay — Up to one polling interval passes before KEDA sees a non-zero metric and wakes the workload.
- Downstream timeouts — Producers must retry or buffer if consumers are asleep.
Mitigations: keep minReplicaCount: 1 in production for latency-sensitive paths; use cron trigger to pre-warm; tune pollingInterval lower for critical queues; ensure Cluster Autoscaler/Karpenter can add nodes quickly when scale-from-zero creates Pending Pods.
Installation and day-one verification
The official path is Helm from the KEDA charts repo; pin a chart version that supports your Kubernetes minor version (check release notes for deprecated API removals).
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda --create-namespace \
  --version 2.14.0   # example; use the current stable release for your cluster
kubectl get pods -n keda
kubectl get crd | grep keda
Verify metrics API registration:
kubectl get apiservice v1beta1.external.metrics.k8s.io -o yaml
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | head
On managed clusters (EKS, GKE, AKS), KEDA is often installed by platform teams alongside GitOps (Argo CD Application or Flux HelmRelease). Application teams only commit ScaledObjects in their namespaces.
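For the GitOps route, an Argo CD Application wrapping the same Helm chart could be sketched as follows. The argocd namespace, the default project, and the sync options are assumptions about a typical Argo CD setup rather than anything KEDA requires.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: keda
  namespace: argocd               # assumption: Argo CD runs in the argocd namespace
spec:
  project: default
  source:
    repoURL: https://kedacore.github.io/charts
    chart: keda
    targetRevision: 2.14.0        # example; pin the chart version you have tested
  destination:
    server: https://kubernetes.default.svc
    namespace: keda
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true      # large CRDs can exceed the client-side apply annotation limit
```

Keeping the operator in a platform-owned Application and ScaledObjects in tenant repositories mirrors the split of responsibilities described above.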
Worked examples
Kafka consumer lag
triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.prod.svc:9092
      consumerGroup: orders-processor
      topic: orders
      lagThreshold: "200"
      activationLagThreshold: "20"
    authenticationRef:
      name: kafka-sasl
Ensure your consumer Deployment actually uses the same consumerGroup and topic set—KEDA reads broker metadata; the app must participate in that group.
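The kafka-sasl reference in that trigger needs a matching TriggerAuthentication. A minimal sketch follows; the Secret name and keys are hypothetical, and the SASL mechanism stored in the Secret must match what your brokers accept.

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-sasl
  namespace: production
spec:
  secretTargetRef:
    - parameter: sasl             # e.g. plaintext, scram_sha256, scram_sha512
      name: kafka-credentials     # hypothetical Secret in the same namespace
      key: sasl
    - parameter: username
      name: kafka-credentials
      key: username
    - parameter: password
      name: kafka-credentials
      key: password
    - parameter: tls              # "enable" or "disable"
      name: kafka-credentials
      key: tls
```

Give the KEDA credentials read-only access to consumer group offsets and topic metadata; the scaler does not need to consume messages.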
Prometheus (custom SLO or RPS)
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      metricName: http_requests_per_second   # deprecated in newer KEDA releases; check your version before relying on it
      query: sum(rate(http_requests_total{service="checkout"}[2m]))
      threshold: "100"
    authenticationRef:
      name: prom-basic-auth
Powerful and dangerous: a bad PromQL query or cardinality explosion hurts Prometheus and scaling. Use recording rules and tested queries; cap evaluation frequency via pollingInterval.
Cron + business hours
triggers:
  - type: cron
    metadata:
      timezone: Europe/Oslo
      start: 0 8 * * 1-5
      end: 0 18 * * 1-5
      desiredReplicas: "5"
Keeps five replicas during weekday working hours regardless of queue depth—combine with a queue scaler via max aggregation for “never below 5 by day, burst above 5 when lag spikes.”
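Put together, the "never below 5 by day, burst on lag" pattern might look like this. It reuses the SQS trigger from the earlier ScaledObject example; the replica bounds are illustrative.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer
  namespace: production
spec:
  scaleTargetRef:
    name: orders-consumer         # kind defaults to Deployment
  minReplicaCount: 0              # nights and weekends may still drop to zero
  maxReplicaCount: 40
  triggers:
    - type: cron                  # daytime floor of 5 replicas
      metadata:
        timezone: Europe/Oslo
        start: 0 8 * * 1-5
        end: 0 18 * * 1-5
        desiredReplicas: "5"
    - type: aws-sqs-queue         # bursts above the floor when the queue grows
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/orders
        queueLength: "50"
        awsRegion: eu-west-1
      authenticationRef:
        name: aws-credentials
```

Outside the cron window only the queue trigger is active, so an empty queue can still take the Deployment to zero.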
Fallback, health, and failure modes
When a scaler cannot reach AWS, Kafka, or Prometheus, KEDA can enter a fallback state—if configured, maintain a safe replica count instead of scaling to zero or leaving stale HPA targets.
spec:
  fallback:
    failureThreshold: 3
    replicas: 3
  triggers:
    - type: aws-sqs-queue
      # ...
Operational signals to watch:
- ScaledObject status conditions — Ready, Active, Fallback, Paused
- HPA events — kubectl describe hpa keda-hpa-...
- Operator logs — authentication failures, rate limits, invalid metadata
Pause scaling during migrations with the annotation autoscaling.keda.sh/paused-replicas (hold a fixed replica count) or autoscaling.keda.sh/paused (freeze at the current count) on the ScaledObject. Document runbooks so on-call does not delete the ScaledObject “to stop flapping.”
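Pausing is expressed as annotations on the ScaledObject. The fragment below is a sketch showing only the metadata; the replica count is an arbitrary example, and the second annotation is only available in newer KEDA releases.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer
  namespace: production
  annotations:
    autoscaling.keda.sh/paused-replicas: "5"   # hold exactly 5 replicas until the annotation is removed
    # autoscaling.keda.sh/paused: "true"       # alternative: freeze at the current replica count
# spec: unchanged
```

Removing the annotation resumes normal scaling, which makes it a safer runbook step than deleting and recreating the ScaledObject.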
Security, RBAC, and multi-tenancy
KEDA’s operator holds cluster-wide permissions to manage HPAs and read external metrics configuration. Platform concerns:
- Namespace isolation — Grant application teams permission to create ScaledObject only in their namespaces; restrict ClusterTriggerAuthentication to platform admins.
- Secrets hygiene — Prefer workload identity; never commit cloud keys beside ScaledObjects in Git. Use External Secrets Operator or sealed secrets (see GitOps principles).
- Network policy — Operator must reach Prometheus, Kafka bootstrap, cloud APIs—egress allowlists should include KEDA’s namespace.
- Admission policy — OPA Gatekeeper/Kyverno policies validating max replica caps, forbidden scaler types in prod, or required labels for cost allocation.
For who may create ScaledObjects in a cluster, align with Kubernetes cluster RBAC patterns—separate platform Role from tenant Role.
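A tenant-side Role along these lines keeps KEDA resources namespace-scoped. The Role name and the group name are hypothetical; ClusterTriggerAuthentication is deliberately absent because it is cluster-scoped and stays with platform admins.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: keda-tenant               # hypothetical name
  namespace: production
rules:
  - apiGroups: ["keda.sh"]
    resources: ["scaledobjects", "scaledjobs", "triggerauthentications"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: keda-tenant
  namespace: production
subjects:
  - kind: Group
    name: orders-team             # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: keda-tenant
  apiGroup: rbac.authorization.k8s.io
```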
Observability and FinOps
KEDA exposes Prometheus metrics from the operator (scaler errors, metric values, scaled object counts). Dashboard ideas:
- Current vs desired replicas per ScaledObject
- Scaler poll failures and fallback activations
- Correlation: queue lag ↓ while replica count ↑ (proves scaling helps)
- Node count and pending Pods after scale-from-zero events
FinOps angle: scale-to-zero and right-sized max replicas save more than chasing CPU-based HPA alone—especially for non-production namespaces and async workers. Pair with chargeback labels (app.kubernetes.io/part-of, environment) so finance sees which ScaledObjects drive node growth. For broader cost culture, see the FinOps/GreenOps series.
Production checklist
| Check | Why it matters |
|---|---|
| Resource requests set on target Pods | Utilization-based cpu/memory triggers are computed against requests; the scheduler needs requests for node placement |
| maxReplicaCount bounded | Prevents runaway scaling if a metric is misconfigured |
| Activation threshold tuned | Stops zero↔one flapping on noise |
| Cooldown and HPA behavior set | Smooth scale-in; avoid draining all consumers mid-batch |
| Cluster has headroom (CA/Karpenter) | KEDA scales Pods; something must place them |
| Graceful termination + preStop | Scale-in must not kill in-flight messages |
| Idempotent consumers | Scale-in and duplicates happen—design for at-least-once |
| Fallback configured for prod scalers | Cloud API outage should not zero critical workers |
| GitOps owns ScaledObject YAML | Drift between queue config and scaler metadata is a common outage |
Troubleshooting playbook
- ScaledObject not Ready — kubectl describe scaledobject <name>; fix trigger metadata, auth Secret, or network path to broker.
- Replicas stuck at min — Metric below activation threshold; wrong consumer group; empty queue; Prometheus query returns zero.
- Replicas stuck at max — Lag never drains (slow consumers); threshold too low; for Kafka, check whether the partition count limits how many consumers can do useful work, and revisit the per-replica threshold.
- HPA exists but Unknown metrics — Metrics adapter not registered; APIService degraded; version skew between KEDA and cluster.
- Flapping — Increase cooldownPeriod; widen activation gap; add HPA stabilization; relax an overly aggressive scale-down policy.
- Pending Pods after scale-out — Node capacity, quotas, or affinity rather than KEDA; debug with Kubernetes troubleshooting playbook.
kubectl get scaledobject -A
kubectl describe scaledobject orders-consumer -n production
kubectl get hpa -n production
kubectl describe hpa keda-hpa-orders-consumer -n production
kubectl logs -n keda deploy/keda-operator --tail=100
Common pitfalls
- Hand-editing the HPA KEDA owns — Changes get reconciled away; edit the ScaledObject instead.
- Scale to zero without retry-aware producers — Timeouts and DLQ floods follow.
- One global polling interval for all scalers — Aggressive polls against paid cloud APIs or fragile brokers.
- CPU-only mental model — Queue workers sit idle at low CPU while lag explodes.
- No max cap — Typo in queueLength or PromQL can request hundreds of replicas.
- Ignoring graceful shutdown — Kubernetes sends SIGTERM; consumers must commit offsets and finish in-flight work within terminationGracePeriodSeconds.
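On the workload side, graceful shutdown and resource requests live in the Deployment that the ScaledObject targets. The following is a sketch with assumed values: the image, labels, grace period, and sleep duration are placeholders to adapt to your slowest message.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-consumer
  namespace: production
spec:
  # No replicas field: the KEDA-managed HPA owns the replica count.
  selector:
    matchLabels:
      app: orders-consumer
  template:
    metadata:
      labels:
        app: orders-consumer
    spec:
      terminationGracePeriodSeconds: 120     # assumed: longer than the slowest in-flight message
      containers:
        - name: consumer
          image: registry.example.com/orders-consumer:1.0.0   # placeholder image
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 5"]   # short pause before SIGTERM; assumes sh exists in the image
```

The preStop hook only buys time; the consumer still has to handle SIGTERM by stopping intake, finishing current work, and committing offsets.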
How KEDA fits your platform stack
In mature platform engineering setups, KEDA sits beside—not instead of—other ecosystem pieces you may already run:
- GitOps (Argo CD / Flux) — ScaledObjects versioned like Deployments; promotion across envs with Kustomize overlays for min/max and scaler endpoints.
- Service mesh (Istio/Linkerd) — Traffic metrics still often come from Prometheus; mesh does not replace queue-based scalers.
- Karpenter — Rapid node provisioning when KEDA adds Pods faster than Cluster Autoscaler reacts.
- CI/CD — Deploy new consumer version; KEDA continues to reference same Deployment name. Ensure rollouts respect min available if HPA scales during RollingUpdate.
KEDA is graduated in the CNCF—it is a safe bet for vendor-neutral, event-driven autoscaling on Kubernetes. The implementation detail that sticks: you declare intent in ScaledObjects; KEDA owns the HPAs and external metrics wiring.
Further reading
- KEDA documentation — concepts, deploy guides, operations
- Scaler catalog — metadata reference per integration
- Kubernetes HPA — underlying autoscaler behavior and v2 metrics
- KEDA security — RBAC and threat model
- CNCF KEDA project page — maturity and ecosystem