KEDA in Depth: Event-Driven Autoscaling for Kubernetes

KEDA (Kubernetes Event-driven Autoscaling) extends the cluster so workloads scale on signals that matter—queue depth, lag, HTTP rate, cloud metrics, cron windows—not only CPU and memory. It is a CNCF graduated project: an operator plus a rich catalog of scalers that feed the same Horizontal Pod Autoscaler machinery Kubernetes already trusts.

In short

Install the KEDA operator → define a ScaledObject (or ScaledJob) pointing at your Deployment and one or more scalers → KEDA computes external metrics and manages an HPA for you → replicas move between minReplicaCount and maxReplicaCount, including scale to zero when idle. Pair with observability, authenticated scalers, and sane cooldowns—CPU-only HPA is not enough for queue- and event-shaped traffic.

Why KEDA exists

The built-in Horizontal Pod Autoscaler (HPA) is powerful but narrow by default: it scales Deployments, StatefulSets, and other workloads that expose the scale subresource, based on resource metrics (CPU, memory) or custom/external metrics you wire up yourself. That fits steady HTTP services whose load correlates with CPU, but it breaks down for:

  • Message consumers — Lag or queue depth drives capacity, not processor usage while waiting on I/O.
  • Batch and ETL — Work arrives in bursts; zero replicas overnight saves real money.
  • Integrations — Scale on Azure Service Bus length, AWS SQS depth, GCP Pub/Sub backlog, Kafka consumer lag.
  • Scheduled capacity — Warm replicas before the Monday spike without a human editing replica counts.

KEDA is the control-plane glue: it watches external systems (or time), exposes metrics the metrics pipeline understands, and drives HPAs—so application teams declare “scale this Deployment when lag > 500 per consumer” instead of building bespoke autoscaler sidecars.

If you are new to how the scheduler and kubelet fit together, start with Kubernetes architecture in simple terms. For CPU/memory HPA and kubectl top, see Kubernetes metrics-server in depth. For day-one resource requests and limits that HPA still needs, see Kubernetes hands-on: day-one practices.

KEDA vs HPA vs VPA vs cluster autoscalers

Teams confuse these because all involve “scaling.” They operate at different layers:

| Mechanism | What moves | Typical signal | Relationship to KEDA |
| --- | --- | --- | --- |
| HPA | Pod replicas of a workload | CPU, memory, custom/external metrics | KEDA creates and owns HPAs for ScaledObjects |
| VPA | CPU/memory requests per Pod | Historical usage | Orthogonal: right-sizes containers; does not replace event scaling |
| Cluster Autoscaler | Nodes in the node group | Pending Pods, utilization | Downstream: KEDA adds Pods; CA must have capacity (or Karpenter provisions nodes) |
| Karpenter | Nodes (provision/deprovision) | Pod requirements, consolidation | Complements KEDA: fast pod scale plus fast node scale |
| KEDA | Pod replicas via HPA + external metrics | 70+ scalers (Kafka, Prometheus, cron, cloud queues, …) | Event- and metric-driven autoscaling layer |

Rule of thumb: KEDA decides how many Pods your service needs; Cluster Autoscaler or Karpenter decides whether the cluster has nodes for those Pods. Both must be healthy for scale-out to succeed—otherwise you see Pending Pods, not higher replica counts.

Architecture: operator, metrics, and the HPA bridge

A standard KEDA install runs in the keda namespace (the name may vary) with these core components:

  • Operator — Watches ScaledObject, ScaledJob, TriggerAuthentication, and related CRDs; creates/updates HPAs; coordinates scaling logic.
  • Metrics adapter — Registers as an external metrics API provider so the kube-controller-manager HPA controller can read KEDA-computed values.
  • Admission webhooks — Validate and default KEDA resources on create/update.

End-to-end flow:

  1. You create a ScaledObject referencing a target (e.g. Deployment/orders-consumer) and one or more triggers (scaler configs).
  2. KEDA’s operator instantiates or updates an HPA whose name is derived from the ScaledObject (managed lifecycle—do not hand-edit that HPA).
  3. On each polling interval, KEDA asks each scaler implementation for a metric value (queue length, lag, Prometheus query result, etc.).
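  4. Each trigger becomes its own metric on the HPA. The HPA computes a desired replica count per metric and takes the maximum; advanced.scalingModifiers can replace that default with a custom formula.
  5. The metrics adapter serves those values through the external metrics API; the HPA controller compares them to the targets and updates the target workload’s spec.replicas. Scaling between zero and one replica is handled by the KEDA operator itself, because an HPA cannot scale to zero.

External system          KEDA operator           Kubernetes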
 (Kafka, SQS, …)              │                      │
       │                      │                      │
       └──► scaler poll ─────►│──► external metric ─►│ HPA controller
                              │                      │      │
                              │                      └──► Deployment replicas

Core CRDs you actually use

ScaledObject — long-running workloads

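The workhorse. Binds to a scale target (apiVersion/kind/name, usually a Deployment) and declares minReplicaCount, maxReplicaCount, an optional idleReplicaCount (the replica count to hold while every trigger is inactive; it must be lower than minReplicaCount, and 0 is the only value KEDA currently supports), the polling interval, the cooldown period, and triggers.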

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-consumer
  pollingInterval: 30          # seconds between scaler checks
  cooldownPeriod: 300          # seconds after the last active trigger before scaling to zero
  minReplicaCount: 0           # scale to zero allowed
  maxReplicaCount: 40
  # idleReplicaCount is omitted: it must be lower than minReplicaCount, so it cannot be combined with minReplicaCount: 0
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/orders
        queueLength: "50"      # target messages per replica
        awsRegion: eu-west-1
      authenticationRef:
        name: aws-credentials
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"            # also scale out when average CPU utilization exceeds 70% (not a cap)

Important fields teams overlook:

  • pollingInterval: Lower values react faster but make more calls to external systems (watch rate limits on cloud APIs).
  • cooldownPeriod: Applies only to the final step down to zero replicas; scale-in between maxReplicaCount and minReplicaCount follows the HPA’s own behavior settings.
  • advanced.horizontalPodAutoscalerConfig: Passes HPA behavior (scale-up/down stabilization windows and policies) through when you need finer control than the defaults.
  • fallback: Defines the replica count to hold when a scaler keeps erroring (see below).
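
As a hedged illustration of the advanced.horizontalPodAutoscalerConfig field, the fragment below passes standard HPA v2 behavior settings through a ScaledObject. The window lengths and the 50% policy are assumptions picked for a queue consumer, not recommendations; tune them against your own drain times.

```yaml
# Fragment of a ScaledObject spec; scaleTargetRef and triggers are omitted for brevity.
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 0      # react to backlog immediately
        scaleDown:
          stabilizationWindowSeconds: 300    # use the highest recommendation of the last 5 minutes
          policies:
            - type: Percent
              value: 50                      # remove at most half of the replicas...
              periodSeconds: 60              # ...per minute
```

These settings shape scale-in between maxReplicaCount and one replica; the last step to zero is still governed by cooldownPeriod.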

ScaledJob — batch and finite work

For Jobs that should spin up when work exists (SQS messages, RabbitMQ queues, Kafka topics). KEDA creates Job objects up to maxReplicaCount parallel jobs, each running your template. When the queue drains, Jobs complete and nothing stays running—ideal for cost-sensitive batch paths.

Use ScaledObject for always-on consumers; ScaledJob for “process N messages per Job pod” patterns.
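
A minimal ScaledJob sketch for the batch pattern described above, assuming the same SQS queue and aws-credentials TriggerAuthentication used elsewhere in this article. The resource name, container image, and history limits are hypothetical placeholders.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: orders-batch                 # hypothetical name
  namespace: production
spec:
  jobTargetRef:                      # a regular Job spec that KEDA stamps out per unit of work
    backoffLimit: 2
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: worker
            image: registry.example.com/orders-batch:1.0.0   # placeholder image
  pollingInterval: 30                # seconds between queue checks
  maxReplicaCount: 20                # upper bound on Jobs running in parallel
  successfulJobsHistoryLimit: 5      # finished Jobs to keep for inspection
  failedJobsHistoryLimit: 5
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/orders
        queueLength: "10"            # target messages per Job
        awsRegion: eu-west-1
      authenticationRef:
        name: aws-credentials
```

Each Job is expected to exit once it has processed its share of messages; a worker that loops forever would behave like a Deployment and belongs under a ScaledObject instead.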

TriggerAuthentication and ClusterTriggerAuthentication

Scalers need credentials: Kafka SASL, AWS IAM, Azure connection strings, Prometheus bearer tokens. Store references in Secrets; point triggers at a TriggerAuthentication (namespace-scoped) or ClusterTriggerAuthentication (shared).

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-credentials
  namespace: production
spec:
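  # Static keys read from a Secret, shown for completeness.
  secretTargetRef:
    - parameter: awsAccessKeyID
      name: keda-aws-secret
      key: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: keda-aws-secret
      key: AWS_SECRET_ACCESS_KEY
  # Preferred: drop secretTargetRef and use podIdentity (IRSA / workload identity) instead,
  # for example provider: aws (older KEDA releases use aws-eks). Pick one mechanism, not both.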

Production habit: prefer cloud pod identity (EKS IRSA, GKE Workload Identity, Azure Workload Identity) over static keys in Secrets. If you must use keys, rotate them, and scope the IAM policy KEDA uses to reading queue attributes; only the consumer itself needs permission to receive and delete messages.
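
A pod-identity variant might look like the sketch below. It assumes a KEDA release that accepts the aws provider (earlier releases used aws-eks) and an IAM role already bound to the KEDA operator’s service account through IRSA; the resource name is hypothetical.

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-pod-identity             # hypothetical name
  namespace: production
spec:
  podIdentity:
    provider: aws                    # assumption: depends on your KEDA version
```

A trigger would then reference it with authenticationRef.name: aws-pod-identity, and no access keys need to live in the cluster.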

CloudEventSource and ClusterCloudEventSource

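These resources work in the opposite direction from scalers: they publish KEDA’s own events (for example, a ScaledObject becoming ready or failing) as CloudEvents to an external sink such as an HTTP endpoint, so other automation can react. Less common than ScaledObject, but useful on event-native platforms.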

The scaler catalog (and how to think about triggers)

KEDA ships 70+ built-in scalers; each is an adapter from an external metric to “desired replicas.” Categories that appear in almost every platform team’s cluster:

| Category | Examples (type:) | What it measures |
| --- | --- | --- |
| Message queues | aws-sqs-queue, azure-servicebus, gcp-pubsub, rabbitmq, nats-jetstream | Queue depth, unacked messages, subscription backlog |
| Streaming | kafka, pulsar, redis-streams | Consumer lag, partition offset lag |
| Observability | prometheus, datadog, new-relic | Any PromQL or vendor metric you already trust |
| HTTP / mesh | prometheus (querying ingress or Istio request metrics) | Request rate, latency proxies |
| Time | cron | Scale out before business hours; scale in after |
| Kubernetes resources | cpu, memory | Resource utilization as an extra trigger next to event scalers |
| CI / ops | github-runner, azure-pipelines | Pending job queue for self-hosted runners and agents |

Each scaler documents required metadata keys (threshold per replica, activation threshold, TLS flags, etc.) in the official scaler reference. Misconfigured metadata is the #1 reason “KEDA does nothing.”

Activation vs scaling threshold

Many scalers distinguish:

  • Activation — Metric must cross this bar before KEDA scales from idle/min (wakes the workload).
  • Scaling threshold — Target per replica used in the replica formula once active.

Example: Kafka lag activation at 10 stops cold-start churn; scaling threshold 100 lag per replica drives steady-state capacity.
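
The two thresholds can be written as one rough formula. This is a sketch for the minReplicaCount: 0 case that assumes an average-value metric and ignores HPA tolerance, stabilization windows, and scaler-specific caps such as Kafka’s partition count. Here m is the measured lag, a the activation threshold, and t the per-replica scaling threshold.

```latex
\[
\text{replicas}(m) =
\begin{cases}
0 & \text{if } m \le a \\[4pt]
\min\!\bigl(\text{maxReplicaCount},\ \max\bigl(1,\ \lceil m / t \rceil\bigr)\bigr) & \text{if } m > a
\end{cases}
\]
```

With a = 10, t = 100, and maxReplicaCount = 40: a lag of 8 keeps the workload at zero, a lag of 50 wakes a single replica, and a lag of 950 asks for 10 replicas. When minReplicaCount is 1 or higher the workload never goes idle, so the activation threshold has no effect.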

Multiple triggers on one ScaledObject

Common pattern: aws-sqs-queue + cpu on the same Deployment. KEDA aggregates desired replicas—default policy takes the maximum so neither queue backlog nor CPU spike is ignored. You can also combine cron (minimum replicas during business hours) with Prometheus RPS.

Scale to zero and cold starts

KEDA’s signature capability is minReplicaCount: 0 with supported scalers: when every trigger reports “idle,” replicas drop to zero. That saves node hours for dev namespaces, sporadic integrators, and weekend-quiet pipelines.

Trade-offs you must accept:

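  • Cold start latency: The first message after idle pays for the scaler poll interval, Pod scheduling, image pull, and application warmup (JVM startup, for example).
  • Activation delay: Up to one polling interval passes before the KEDA operator sees the trigger become active and scales from zero to one; the HPA only takes over from one replica upward.
  • Downstream timeouts: Producers must retry or buffer while consumers are asleep.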

Mitigations: keep minReplicaCount: 1 in production for latency-sensitive paths; use a cron trigger to pre-warm; lower pollingInterval for critical queues; ensure Cluster Autoscaler or Karpenter can add nodes quickly when scale-from-zero creates Pending Pods.

Installation and day-one verification

The official path is Helm from the KEDA charts repo; pin a chart version whose KEDA release supports your Kubernetes minor version (check the release notes for deprecated API removals).

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda --create-namespace \
  --version 2.14.0   # example — use current stable for your cluster

kubectl get pods -n keda
kubectl get crd | grep keda

Verify metrics API registration:

kubectl get apiservice v1beta1.external.metrics.k8s.io -o yaml
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | head

On managed clusters (EKS, GKE, AKS), KEDA is often installed by platform teams alongside GitOps (Argo CD Application or Flux HelmRelease). Application teams only commit ScaledObjects in their namespaces.

Worked examples

Kafka consumer lag

triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.prod.svc:9092
      consumerGroup: orders-processor
      topic: orders
      lagThreshold: "200"
      activationLagThreshold: "20"
    authenticationRef:
      name: kafka-sasl

Ensure your consumer Deployment actually uses the same consumerGroup and topic set—KEDA reads broker metadata; the app must participate in that group.
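
The kafka-sasl reference above needs a matching TriggerAuthentication. A hedged sketch follows, assuming SCRAM credentials stored in a Secret named kafka-credentials; the Secret name and its keys are hypothetical.

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-sasl
  namespace: production
spec:
  secretTargetRef:
    - parameter: sasl                # SASL mechanism, e.g. the Secret holds "scram_sha512"
      name: kafka-credentials        # hypothetical Secret
      key: sasl
    - parameter: username
      name: kafka-credentials
      key: username
    - parameter: password
      name: kafka-credentials
      key: password
    - parameter: tls                 # the Secret holds "enable" or "disable"
      name: kafka-credentials
      key: tls
```

Keeping the mechanism and TLS flag in the same Secret as the credentials means one rotation or broker migration touches a single object.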

Prometheus (custom SLO or RPS)

triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      query: sum(rate(http_requests_total{service="checkout"}[2m]))
      threshold: "100"         # target requests per second per replica
      authModes: "basic"       # required when authenticationRef supplies basic-auth credentials
    authenticationRef:
      name: prom-basic-auth

Powerful and dangerous: a bad PromQL query or cardinality explosion hurts Prometheus and scaling. Use recording rules and tested queries; cap evaluation frequency via pollingInterval.
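
One way to follow the recording-rule advice is to precompute the rate in Prometheus and let KEDA read the cheap, already-aggregated series. The sketch below is a standard Prometheus rule file; the group and series names are hypothetical.

```yaml
# Prometheus rule file (loaded through rule_files or a PrometheusRule resource).
groups:
  - name: checkout-scaling           # hypothetical group name
    interval: 30s                    # evaluate at roughly the KEDA polling cadence
    rules:
      - record: service:http_requests:rate2m
        expr: sum by (service) (rate(http_requests_total[2m]))
```

The trigger’s query would then shrink to service:http_requests:rate2m{service="checkout"}, so every KEDA poll reads one precomputed sample instead of re-aggregating raw series.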

Cron + business hours

triggers:
  - type: cron
    metadata:
      timezone: Europe/Oslo
      start: 0 8 * * 1-5
      end: 0 18 * * 1-5
      desiredReplicas: "5"

Keeps five replicas during weekday working hours regardless of queue depth—combine with a queue scaler via max aggregation for “never below 5 by day, burst above 5 when lag spikes.”
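
Put together, the “never below 5 by day, burst on lag” pattern could look like the sketch below. It reuses the SQS queue and aws-credentials from the earlier example; a workload can be targeted by only one ScaledObject, so this would replace the earlier orders-consumer ScaledObject rather than sit next to it.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer
  namespace: production
spec:
  scaleTargetRef:
    name: orders-consumer            # kind defaults to Deployment
  minReplicaCount: 1                 # assumption: keep one warm replica outside business hours
  maxReplicaCount: 40
  triggers:
    - type: cron                     # floor of 5 replicas on weekdays, 08:00 to 18:00
      metadata:
        timezone: Europe/Oslo
        start: 0 8 * * 1-5
        end: 0 18 * * 1-5
        desiredReplicas: "5"
    - type: aws-sqs-queue            # bursts above the floor when backlog grows
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/orders
        queueLength: "50"
        awsRegion: eu-west-1
      authenticationRef:
        name: aws-credentials
```

Because the highest recommendation wins, 600 queued messages at midday would ask for 12 replicas, while an empty queue at midday still holds 5.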

Fallback, health, and failure modes

When a scaler cannot reach AWS, Kafka, or Prometheus, KEDA can enter a fallback state—if configured, maintain a safe replica count instead of scaling to zero or leaving stale HPA targets.

spec:
  fallback:
    failureThreshold: 3
    replicas: 3
  triggers:
    - type: aws-sqs-queue
      # ...

Operational signals to watch:

  • ScaledObject status conditions — Ready, Active, Fallback, Paused
  • HPA events — kubectl describe hpa keda-hpa-...
  • Operator logs — authentication failures, rate limits, invalid metadata

Pause scaling during migrations with the annotation autoscaling.keda.sh/paused-replicas (pin to a fixed count) or autoscaling.keda.sh/paused: "true" (freeze at the current count) on the ScaledObject. Document runbooks so on-call does not delete the ScaledObject “to stop flapping.”
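
A pause can live in Git like any other change. The fragment below is a sketch assuming a KEDA release that supports both pause annotations; the replica count of 3 is an arbitrary example.

```yaml
# Fragment: only metadata changes; spec stays exactly as it was.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer
  namespace: production
  annotations:
    autoscaling.keda.sh/paused-replicas: "3"   # hold the workload at exactly 3 replicas
    # autoscaling.keda.sh/paused: "true"       # alternative: freeze at the current count
```

Removing the annotation resumes normal scaling, which makes the pause easy to review and revert in a pull request.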

Security, RBAC, and multi-tenancy

KEDA’s operator holds cluster-wide permissions to manage HPAs and read external metrics configuration. Platform concerns:

  • Namespace isolation — Grant application teams permission to create ScaledObject only in their namespaces; restrict ClusterTriggerAuthentication to platform admins.
  • Secrets hygiene — Prefer workload identity; never commit cloud keys beside ScaledObjects in Git—use External Secrets Operator or sealed secrets (see GitOps principles).
  • Network policy — Operator must reach Prometheus, Kafka bootstrap, cloud APIs—egress allowlists should include KEDA’s namespace.
  • Admission policy — OPA Gatekeeper/Kyverno policies validating max replica caps, forbidden scaler types in prod, or required labels for cost allocation.
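
As one hedged example of the admission-policy idea, a Kyverno ClusterPolicy could reject ScaledObjects without a bounded maxReplicaCount. The policy name and the cap of 100 are assumptions, and newer Kyverno releases may prefer a different place for the failure action setting.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: cap-keda-max-replicas        # hypothetical policy name
spec:
  validationFailureAction: Enforce   # reject violating resources instead of only auditing
  rules:
    - name: max-replica-cap
      match:
        any:
          - resources:
              kinds:
                - ScaledObject
      validate:
        message: "maxReplicaCount must be set and must not exceed 100."
        pattern:
          spec:
            maxReplicaCount: "<=100"
```

Run it in Audit mode first to see which existing ScaledObjects would fail before switching to Enforce.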

For who may create ScaledObjects in a cluster, align with Kubernetes cluster RBAC patterns—separate platform Role from tenant Role.
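
A tenant-side Role might look like the sketch below; the Role name is hypothetical. Because ClusterTriggerAuthentication is cluster-scoped, a namespaced Role like this cannot grant access to it, which keeps shared credentials with platform admins.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: keda-tenant-editor           # hypothetical name
  namespace: production
rules:
  - apiGroups: ["keda.sh"]
    resources: ["scaledobjects", "scaledjobs", "triggerauthentications"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```

Bind it to the team’s group or CI service account with a RoleBinding in the same namespace.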

Observability and FinOps

KEDA exposes Prometheus metrics from the operator (scaler errors, metric values, scaled object counts). Dashboard ideas:

  • Current vs desired replicas per ScaledObject
  • Scaler poll failures and fallback activations
  • Correlation: queue lag ↓ while replica count ↑ (proves scaling helps)
  • Node count and pending Pods after scale-from-zero events

FinOps angle: scale-to-zero and right-sized max replicas save more than chasing CPU-based HPA alone—especially for non-production namespaces and async workers. Pair with chargeback labels (app.kubernetes.io/part-of, environment) so finance sees which ScaledObjects drive node growth. For broader cost culture, see the FinOps/GreenOps series.

Production checklist

| Check | Why it matters |
| --- | --- |
| Resource requests set on target Pods | CPU/memory Utilization triggers are computed against requests; the scheduler needs requests for node placement |
| maxReplicaCount bounded | Prevents runaway scaling if a metric is misconfigured |
| Activation threshold tuned | Stops zero↔one flapping on noise |
| Cooldown and HPA behavior set | Smooth scale-in; avoid draining all consumers mid-batch |
| Cluster has headroom (CA/Karpenter) | KEDA scales Pods; something must place them |
| Graceful termination + preStop | Scale-in must not kill in-flight messages |
| Idempotent consumers | Scale-in and duplicates happen; design for at-least-once delivery |
| Fallback configured for prod scalers | Cloud API outage should not zero critical workers |
| GitOps owns ScaledObject YAML | Drift between queue config and scaler metadata is a common outage |

Troubleshooting playbook

  1. ScaledObject not Ready: Run kubectl describe scaledobject <name>; fix trigger metadata, the auth Secret, or the network path to the broker.
  2. Replicas stuck at min: Metric below activation threshold; wrong consumer group; empty queue; Prometheus query returns zero.
  3. Replicas stuck at max: Lag never drains (slow consumers) or the threshold is too low; revisit the per-replica target, and for Kafka check whether the partition count limits how many consumers can do useful work.
  4. HPA exists but shows unknown metrics: Metrics adapter not registered; APIService degraded; version skew between KEDA and the cluster.
  5. Flapping: Widen the gap between activation and scaling thresholds; add HPA scale-down stabilization to tame aggressive scale-in; increase cooldownPeriod if the flapping is between zero and one replica.
  6. Pending Pods after scale-out: Node capacity, quotas, or affinity rather than KEDA; debug with the Kubernetes troubleshooting playbook.

kubectl get scaledobject -A
kubectl describe scaledobject orders-consumer -n production
kubectl get hpa -n production
kubectl describe hpa keda-hpa-orders-consumer -n production
kubectl logs -n keda deploy/keda-operator --tail=100

Common pitfalls

  • Hand-editing the HPA KEDA owns — Changes get reconciled away; edit the ScaledObject instead.
  • Scale to zero without retry-aware producers — Timeouts and DLQ floods follow.
  • One global polling interval for all scalers — Aggressive polls against paid cloud APIs or fragile brokers.
  • CPU-only mental model — Queue workers sit idle at low CPU while lag explodes.
  • No max cap — Typo in queueLength or PromQL can request hundreds of replicas.
  • Ignoring graceful shutdown — Kubernetes sends SIGTERM; consumers must commit offsets and finish in-flight work within terminationGracePeriodSeconds.
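
For the graceful-shutdown pitfall, the Deployment side could be sketched as below. The image, labels, grace period, and preStop command are placeholders; the essential part is that the consumer handles SIGTERM and finishes within the grace period.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-consumer
  namespace: production
spec:
  # replicas is intentionally omitted: KEDA and its HPA own the replica count.
  selector:
    matchLabels:
      app: orders-consumer
  template:
    metadata:
      labels:
        app: orders-consumer
    spec:
      terminationGracePeriodSeconds: 120   # assumption: longer than the slowest message
      containers:
        - name: consumer
          image: registry.example.com/orders-consumer:1.0.0   # placeholder image
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]   # placeholder; SIGTERM follows once this returns
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
```

The preStop time counts against terminationGracePeriodSeconds, so budget both together.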

How KEDA fits your platform stack

In mature platform engineering setups, KEDA sits beside—not instead of—other ecosystem pieces you may already run:

  • GitOps (Argo CD / Flux) — ScaledObjects versioned like Deployments; promotion across envs with Kustomize overlays for min/max and scaler endpoints.
  • Service mesh (Istio/Linkerd) — Traffic metrics still often come from Prometheus; mesh does not replace queue-based scalers.
  • Karpenter — Rapid node provisioning when KEDA adds Pods faster than Cluster Autoscaler reacts.
  • CI/CD — Deploy new consumer version; KEDA continues to reference same Deployment name—ensure rollouts respect min available if HPA scales during RollingUpdate.

KEDA is graduated in the CNCF—it is a safe bet for vendor-neutral, event-driven autoscaling on Kubernetes. The implementation detail that sticks: you declare intent in ScaledObjects; KEDA owns the HPAs and external metrics wiring.

Further reading

Blog index · Kubernetes architecture · CRI and CSI · Cluster RBAC · Troubleshooting playbook · GitOps principles
