
Scaling Policies and Observability: Lag Metrics, Autoscaling, and SLOs

Effective autoscaling for consumer groups requires lag-based metrics, not just CPU or memory. A consumer at 20 percent CPU may be fully utilized if it is processing only 2 of 10 assigned partitions due to hot-partition skew. The leading indicators are per-partition lag (the latest produced offset, also called the log-end offset, minus the consumer's committed offset) and lag growth rate (change in lag per unit time). Alert when any single partition's lag exceeds a threshold (for example, 10,000 records or 5 minutes of data) or when the lag growth rate stays positive for more than 2 consecutive measurement intervals.

Compute required consumers from input rate and per-consumer throughput. If total input is 100,000 records per second and steady-state per-consumer throughput is 2,000 records per second, you need ceil(100,000 / 2,000) = 50 consumers. For backlog clearance, calculate catch-up capacity: with 10 million records of backlog and a Service Level Objective (SLO) of clearing it within 1 hour, you need ceil(10,000,000 / (3,600 × extra throughput per consumer)) additional consumers. With 1,000 extra records per second per consumer, that is ceil(10,000,000 / 3,600,000) = 3 extra consumers. Remember the hard cap: scaling beyond the partition count leaves consumers idle.

Monitor per-partition metrics, not just group aggregates. Track lag, lag growth rate, consumer throughput (records per second and bytes per second), fetch latency, processing latency, and rebalance frequency and duration. Correlate these with the latencies of downstream dependencies (database write time, HTTP API call duration) that throttle consumers. For example, if database write latency spikes from 5 milliseconds to 50 milliseconds, consumer throughput drops proportionally and lag accumulates even though consumer code and infrastructure are healthy.

Set SLOs on time to zero lag relative to retention and business requirements.
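The sizing arithmetic above can be sketched as a pair of small helpers (a minimal illustration; the function names and example numbers are assumptions, not any specific library's API):

```python
import math

def required_consumers(input_rate: float, per_consumer_rate: float,
                       partition_count: int) -> int:
    """Consumers needed at steady state, capped by the partition count
    (consumers beyond that cap would sit idle)."""
    needed = math.ceil(input_rate / per_consumer_rate)
    return min(needed, partition_count)

def catchup_consumers(backlog_records: int, clear_within_seconds: int,
                      extra_rate_per_consumer: float) -> int:
    """Additional consumers needed to clear a backlog within the SLO window."""
    return math.ceil(backlog_records /
                     (clear_within_seconds * extra_rate_per_consumer))

# Steady state: 100,000 rec/s input, 2,000 rec/s per consumer, 100 partitions
print(required_consumers(100_000, 2_000, 100))    # 50
# Backlog: 10M records, 1 hour SLO, 1,000 extra rec/s per consumer
print(catchup_consumers(10_000_000, 3_600, 1_000))  # 3
```

Note that `required_consumers` applies the hard cap directly: asking for more throughput than the partitions can deliver returns the partition count, signaling that the topic itself needs repartitioning.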
If retention is 7 days and business requires data freshness within 1 hour, your SLO might be "P99 lag less than 10 minutes" or "time to zero lag after backlog less than 30 minutes". Alert on violations and trigger automated scaling or throttling. Use these metrics to drive capacity planning: if you consistently run at 70 percent of partition capacity, you have 30 percent headroom for spikes; if you run at 95 percent, you risk violating SLOs on any transient slowdown.
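The alerting rule described above (absolute per-partition lag threshold, or lag growth sustained for more than 2 consecutive measurement intervals) might look like this sketch; the thresholds and function name are assumptions for illustration:

```python
def should_alert(lag_samples: list[int],
                 lag_threshold: int = 10_000,
                 consecutive_growth: int = 2) -> bool:
    """Decide whether a single partition's lag history warrants an alert.

    lag_samples: per-partition lag readings, oldest first, one per interval.
    Fires if the latest lag exceeds lag_threshold, or if lag has grown for
    more than `consecutive_growth` consecutive intervals.
    """
    if lag_samples and lag_samples[-1] > lag_threshold:
        return True
    growth_streak = 0
    for prev, curr in zip(lag_samples, lag_samples[1:]):
        growth_streak = growth_streak + 1 if curr > prev else 0
        if growth_streak > consecutive_growth:
            return True
    return False

print(should_alert([100, 200, 300, 400]))  # True: 3 consecutive growth intervals
print(should_alert([100, 200, 300]))       # False: only 2 growth intervals
print(should_alert([20_000]))              # True: absolute threshold exceeded
```

In practice the same check would run per partition against metrics scraped from the consumer group, with the outcome feeding both alerting and the autoscaler.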
💡 Key Takeaways
Lag is the leading metric for autoscaling: per-partition lag (log-end offset minus committed offset) and lag growth rate (change in lag per measurement interval) drive scaling decisions better than CPU or memory.
Required consumers equals ceil(total input rate / per-consumer throughput): 100,000 records per second of input with 2,000 records per second per consumer requires 50 consumers, capped by partition count.
Backlog clearance requires catch-up capacity: a 10 million record backlog with a 1 hour SLO and 1,000 extra records per second per consumer needs ceil(10,000,000 / 3,600,000) = 3 additional consumers.
Monitor per-partition metrics, not just aggregates: a group averaging 450 lag across 100 partitions may hide 3 hot partitions at 15,000 lag each behind 97 partitions at near zero lag.
Correlate consumer lag with downstream dependency latencies: if database write latency increases from 5 milliseconds to 50 milliseconds, throughput of a consumer that writes synchronously can drop by roughly 10 times even with healthy infrastructure, causing lag accumulation.
Set SLOs on time to zero lag relative to retention: if retention is 7 days and business requires 1 hour freshness, target P99 lag less than 10 minutes and time to zero lag after backlog less than 30 minutes.
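Time to zero lag follows directly from the net drain rate: the backlog divided by aggregate consume rate minus input rate. A hedged sketch with assumed example numbers:

```python
def time_to_zero_lag(backlog: float, consume_rate: float,
                     input_rate: float) -> float:
    """Seconds until the backlog drains, given aggregate consume and input
    rates in records per second. Only converges when consumption outpaces
    input; otherwise lag grows without bound."""
    net = consume_rate - input_rate
    if net <= 0:
        return float("inf")
    return backlog / net

# 10M record backlog, consumers drain 103,000 rec/s while 100,000 rec/s
# keep arriving: net 3,000 rec/s, so roughly 3,333 s (~56 min),
# within a 1 hour time-to-zero-lag SLO.
print(time_to_zero_lag(10_000_000, 103_000, 100_000))
```

This is the quantity to compare against the SLO window when deciding how many catch-up consumers to add.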
📌 Examples
A consumer group with 100 partitions runs 70 consumers at 70 percent per consumer utilization, providing 30 percent headroom for traffic spikes; scaling to 95 consumers reduces headroom to 5 percent and risks SLA violations on transient slowdowns.
A monitoring system detects lag growth rate positive for 3 consecutive 60 second intervals on 5 partitions; it triggers autoscaling to add 5 consumers and alerts on potential hot partition skew for investigation.
A payments pipeline has 7 day retention and 1 hour freshness SLO; it sets alerts for per partition lag greater than 10 minutes or time to zero lag greater than 30 minutes, and triggers autoscaling when either threshold is crossed.
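The headroom check from the first example can be expressed as a one-line calculation (a sketch; the 70 and 95 consumer counts come from the example above):

```python
def partition_headroom(consumers: int, partitions: int) -> float:
    """Fraction of partition capacity still available for scale-out.
    Consumers beyond the partition count sit idle, so partitions are the
    hard ceiling on useful scale-out."""
    return 1.0 - consumers / partitions

# 70 consumers on 100 partitions leaves 30 percent headroom for spikes;
# 95 consumers leaves only 5 percent, risking SLO violations on any
# transient slowdown.
print(partition_headroom(70, 100), partition_headroom(95, 100))
```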