
Capacity Planning and Load Imbalance: The Operational Cost of Stickiness

Breaking Uniform Distribution

Sticky sessions fundamentally break the uniform load distribution assumption that underlies traditional capacity planning. In a stateless system, adding N instances increases capacity by exactly N times the per-instance throughput. Traffic immediately spreads evenly. With sticky sessions, this assumption fails because existing sessions remain bound to their original servers. New instances receive only new sessions, creating a capacity ramp-up period equal to the session TTL.

The Scale-Out Math

If your affinity TTL is 20 minutes and you scale out at time zero, new instances will be underutilized for approximately 20 minutes while old instances remain at peak load. If sessions arrive uniformly, new instances might handle only 15-30% of their capacity in the first 10 minutes. Effective capacity during scale-out is only 60-80% of theoretical for the first TTL duration. Scaling from 3 to 6 instances does not double capacity immediately; you might only reach 200,000 RPS instead of 300,000 RPS for 15-20 minutes.
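The scale-out math above can be sketched as a small model. This is a simplifying assumption, not a measured result: it treats old instances as fully usable and assumes new instances ramp linearly from zero to full capacity over one TTL as old sessions expire and new sessions arrive.

```python
# Hypothetical sketch: effective cluster capacity during scale-out with sticky
# sessions, assuming old sessions expire linearly over one affinity TTL.

def effective_capacity(old_instances, new_instances, per_instance_rps,
                       ttl_minutes, minutes_elapsed):
    """Estimate usable RPS at a point in time during scale-out.

    Old instances are assumed fully usable; new instances receive only new
    sessions, so they ramp linearly from 0 to full capacity over one TTL.
    """
    ramp = min(minutes_elapsed / ttl_minutes, 1.0)  # fraction of TTL elapsed
    usable_instances = old_instances + new_instances * ramp
    return usable_instances * per_instance_rps

# Scaling 3 -> 6 instances rated at 50,000 RPS each, with a 20-minute TTL:
theoretical = 6 * 50_000                              # 300,000 RPS
mid_ramp = effective_capacity(3, 3, 50_000, 20, 10)   # 225,000 RPS at t=10 min
```

Under this linear model the cluster sits at 75% of theoretical capacity halfway through the TTL; a real ramp is often worse early on, which is where the 60-80% figure comes from.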

Power User Hotspots

Beyond scale-out delays, traffic is inherently non-uniform. Some users make 100 requests per session, others make 2. Long-running sessions accumulate on the servers that happened to receive them. At 50,000 RPS with a median session length of 15 minutes, a three-instance cluster handling uniform traffic would see roughly 16,667 RPS per instance. With sticky sessions and realistic traffic patterns, the busiest instance often handles 25,000 RPS while the least busy handles 10,000 RPS: a max-to-min spread of 2.5x, or a max-to-mean imbalance ratio of 1.5x.
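The arithmetic behind those ratios is worth being able to reproduce quickly. A minimal check, using the assumed per-instance split from the example above:

```python
# The skewed 3-instance split from the example (values assumed, sum = 50,000 RPS).
per_instance_rps = [25_000, 15_000, 10_000]

mean_rps = sum(per_instance_rps) / len(per_instance_rps)      # ~16,667 RPS
max_to_mean = max(per_instance_rps) / mean_rps                # 1.5x
max_to_min = max(per_instance_rps) / min(per_instance_rps)    # 2.5x
```

Note the two ratios answer different questions: max-to-mean tells you how hot the worst instance runs relative to your capacity plan, while max-to-min describes the overall spread.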

Capacity Planning Heuristics

Provision 20-30% extra headroom beyond your stateless capacity model to absorb skew and failover events. If your target median CPU utilization is 50% for stateless workloads, plan for 40% with sticky sessions to leave room for hotspots to spike to 70-80% without triggering alerts or degradation. This extra headroom is the hidden cost of sticky sessions that capacity planners often miss.
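The headroom heuristic translates directly into instance counts. A sketch, with an assumed per-instance rating of 30,000 RPS at full CPU:

```python
import math

def instances_needed(total_rps, per_instance_rps, target_utilization):
    """Instances required to serve total_rps at a given target median CPU."""
    return math.ceil(total_rps / (per_instance_rps * target_utilization))

# 50,000 RPS against instances rated at 30,000 RPS at 100% CPU (assumed):
stateless = instances_needed(50_000, 30_000, 0.50)  # 50% target -> 4 instances
sticky    = instances_needed(50_000, 30_000, 0.40)  # 40% target -> 5 instances
```

Dropping the utilization target from 50% to 40% costs one extra instance here, a 25% overprovision, which is the concrete form the 20-30% headroom figure takes.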

Key Metrics to Monitor

Monitor the imbalance ratio: max(metric) / mean(metric) for CPU, RPS, and memory across instances. Sustained ratios above 1.5-2.0 indicate you need better session distribution, shorter TTLs, or more aggressive load shedding on hot instances. Scale-in is equally complex: draining requires 1-2x the session TTL to avoid dropping active sessions, delaying capacity reclamation by 20-60 minutes.
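A monitoring check along these lines is straightforward to sketch. This assumes per-instance metric values are already fetched from your metrics backend as plain lists; the function names and sample values are illustrative:

```python
def imbalance_ratio(values):
    """max/mean across instances, the imbalance metric described above."""
    return max(values) / (sum(values) / len(values))

def check_imbalance(metrics_by_name, threshold=1.5):
    """Return names of metrics whose max/mean ratio is at or above threshold."""
    return [name for name, values in metrics_by_name.items()
            if imbalance_ratio(values) >= threshold]

# Assumed snapshot for a 3-instance cluster:
hot_metrics = check_imbalance({
    "cpu":    [0.80, 0.40, 0.36],          # CPU fractions; ratio ~1.54
    "rps":    [26_000, 14_000, 10_000],    # ratio ~1.56
    "memory": [0.55, 0.50, 0.52],          # ratio ~1.05, balanced
})
```

In practice you would alert only on a *sustained* breach (e.g. the ratio above threshold for 10+ minutes), since short-lived skew is normal with sticky sessions.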

Key Trade-off: Sticky sessions deliver only 60-80% of theoretical capacity during scale-out for the first TTL duration (10-30 minutes). Plan for 20-30% extra headroom and target 40% median CPU instead of 50% to absorb inevitable hotspots.
💡 Key Takeaways
Scale-out delivers only 60-80% of theoretical capacity for first TTL duration; new instances receive only new sessions
Imbalance ratios of 1.5-2.5x are common; busiest instance may handle 2.5x the RPS of least busy due to power users
Plan 20-30% extra headroom; target 40% median CPU instead of 50% to absorb hotspots spiking to 70-80%
Monitor max/mean ratio for CPU and RPS; sustained > 1.5-2.0 indicates need for shorter TTLs or better distribution
📌 Interview Tips
1. Walk through the scale-out math: a 20-minute TTL means new instances run at only 15-30% of capacity for the first 10 minutes
2. Calculate the imbalance: 50K RPS across 3 instances should be 16.7K each; with sticky sessions it ranges from 10K to 25K (a 2.5x spread)
3. Explain the scale-in delay: draining requires 1-2x the session TTL to avoid dropping active sessions