
Production Failure Modes and Mitigation Strategies

Hotspot Concentration Through Hashing

Load balancing algorithms fail in subtle ways under real-world conditions. Hotspot concentration through hashing is a common trap. When thousands of users behind carrier-grade NAT (CGNAT) share a handful of public IPs, IP-based consistent hashing overloads a few backends. A mobile carrier with 100,000 users might present only 2-5 public IP addresses, causing those backends to receive 20,000-50,000 RPS while others sit idle. Mitigation: use L7 cookie-based affinity, or incorporate a user ID into the hash key; either spreads load far better than IP hashing.
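A minimal simulation makes the skew concrete. The backend and IP names below are hypothetical, and plain hash-mod placement stands in for consistent hashing (the skew comes from the tiny key space, not the ring structure):

```python
import hashlib
from collections import Counter

BACKENDS = [f"backend-{i}" for i in range(10)]

def pick_backend(key: str) -> str:
    # Hash-mod placement; consistent hashing shows the same skew when
    # the key space collapses to a handful of values.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return BACKENDS[h % len(BACKENDS)]

shared_ips = ["203.0.113.1", "203.0.113.2", "203.0.113.3"]  # CGNAT pool
users = [f"user-{n}" for n in range(100_000)]

by_ip, by_user = Counter(), Counter()
for n, user in enumerate(users):
    ip = shared_ips[n % len(shared_ips)]  # 100K users funnel through 3 IPs
    by_ip[pick_backend(ip)] += 1          # key = source IP
    by_user[pick_backend(user)] += 1      # key = user ID

print(len(by_ip), "backends receive traffic with IP hashing")      # at most 3
print(len(by_user), "backends receive traffic with user-ID hashing")
```

With IP hashing, at most three of the ten backends ever see traffic; hashing on the user ID touches every backend with near-uniform load.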

Metric Staleness and Oscillation

Metric staleness causes oscillation with dynamic algorithms. If load metrics propagate every 5-10 seconds and routing decisions use stale data, backends flip between appearing idle and overloaded. Multiple proxies see the same "idle" backend simultaneously and flood it. Then they see it overloaded and abandon it, creating cycles. The backend oscillates between 0% and 200% capacity. Fix: power-of-two-choices with local per-proxy state (each proxy tracks its own in-flight counts), combined with EMA smoothing (exponential moving average with a 2-5 second half-life), dampens the oscillation.
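The fix can be sketched as a small per-proxy balancer. This is an illustrative sketch, not a production implementation; the class and parameter names are made up here:

```python
import random

class P2CBalancer:
    """Power-of-two-choices using purely local state.

    Each proxy instance tracks its own in-flight counts and an EMA of
    observed latency, so stale cluster-wide metrics never enter the
    routing decision.
    """

    def __init__(self, backends, ema_alpha=0.3):
        self.in_flight = {b: 0 for b in backends}
        self.latency_ema = {b: 0.0 for b in backends}
        self.alpha = ema_alpha  # higher alpha = shorter effective half-life

    def choose(self) -> str:
        # Sample two backends at random, route to the better-looking one.
        a, b = random.sample(list(self.in_flight), 2)
        def score(x):
            return (self.in_flight[x], self.latency_ema[x])
        pick = a if score(a) <= score(b) else b
        self.in_flight[pick] += 1
        return pick

    def complete(self, backend: str, latency_ms: float) -> None:
        # Called when the request finishes; fold latency into the EMA.
        self.in_flight[backend] -= 1
        ema = self.latency_ema[backend]
        self.latency_ema[backend] = self.alpha * latency_ms + (1 - self.alpha) * ema
```

Because each proxy only ever sends to the less loaded of two random candidates, no backend can be flooded by every proxy at once, and the EMA keeps a single slow response from swinging the score.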

Retry Storm Amplification

Retry storms amplify under sticky algorithms. When a backend becomes slow (garbage collection pause, disk stall), clients retry. With consistent hashing, retries hit the same struggling backend, making the problem worse. The backend queue grows, causing more timeouts, triggering more retries in a positive feedback loop. Production mitigation requires: retry budgets limiting cluster-wide retries to 1.5-2x base traffic, jittered exponential backoff randomizing retry timing to avoid thundering herd, and retry diversification where retries deliberately choose different backends than the original attempt.

HTTP/2 Connection Counting Problem

HTTP/2 connection counting breaks least connections at Layer 4. A single HTTP/2 connection can multiplex 100 concurrent requests. An L4 load balancer counting TCP connections sees one connection and considers that backend lightly loaded, while the backend actually handles 100 requests. This causes severe imbalance: one backend with 10 HTTP/2 connections handles 1,000 requests while another with 10 short-lived HTTP/1.1 connections handles 10 requests, yet both appear equally loaded. Fix: add L7 awareness to count concurrent streams, or cap max streams per connection at 10-20, which forces clients to open multiple connections that L4 counting can actually see.
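The 100x blind spot is easy to demonstrate with a toy counter. The class and backend names below are illustrative only; the point is the two views over the same state:

```python
class StreamAwareCounter:
    """Contrast L4 connection counting with L7 stream counting."""

    def __init__(self):
        # backend -> list of per-connection concurrent stream counts
        self.connections = {}

    def open_connection(self, backend: str, streams: int = 1) -> None:
        self.connections.setdefault(backend, []).append(streams)

    def l4_load(self, backend: str) -> int:
        # What an L4 balancer sees: TCP connections only.
        return len(self.connections.get(backend, []))

    def l7_load(self, backend: str) -> int:
        # What an L7 balancer sees: concurrent streams (requests).
        return sum(self.connections.get(backend, []))

counter = StreamAwareCounter()
for _ in range(10):
    counter.open_connection("backend-a", streams=100)  # HTTP/2, multiplexed
    counter.open_connection("backend-b", streams=1)    # HTTP/1.1

print(counter.l4_load("backend-a"), counter.l4_load("backend-b"))  # 10 10
print(counter.l7_load("backend-a"), counter.l7_load("backend-b"))  # 1000 10
```

At L4 both backends report identical load (10 connections each); at L7 backend-a is carrying 100x the concurrent requests of backend-b.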

Long-Lived Connection Imbalance

WebSocket or gRPC connections lasting hours pin to backends via flow hash. Scaling adds backends but existing connections do not rebalance. If you have 100 long-lived connections distributed across 10 backends (10 each) and scale to 20 backends, the original 10 still have 10 connections each while the new 10 have zero until organic connection churn redistributes traffic, which may take hours. Mitigation: connection draining windows of 60-300 seconds on scale events, or client-side periodic reconnect every 30-60 minutes.
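The scale-out scenario above, and the effect of periodic reconnect, can be simulated in a few lines (hypothetical backend names, numbers taken from the text):

```python
import random

backends = [f"b{i}" for i in range(10)]
conns = {b: 10 for b in backends}  # 100 long-lived connections, 10 per backend

# Scale out to 20 backends: flow-hash pinning means nothing moves.
backends += [f"b{i}" for i in range(10, 20)]
for b in backends[10:]:
    conns[b] = 0

idle_new = sum(1 for b in backends[10:] if conns[b] == 0)
print(idle_new, "new backends carry zero connections after scale-out")  # 10

# Client-side periodic reconnect (every 30-60 min in practice): each
# connection eventually re-picks a backend, spreading load over all 20.
rebalanced = {b: 0 for b in backends}
for _ in range(sum(conns.values())):
    rebalanced[random.choice(backends)] += 1
print(sum(rebalanced.values()), "connections now spread across 20 backends")
```

Before any churn, all 10 new backends sit at zero; only once clients reconnect (whether organically or on a forced schedule) does the connection count redistribute.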

Key Insight: Load balancing algorithms fail under conditions that simple models miss: CGNAT concentration, metric staleness oscillation, retry amplification, HTTP/2 stream counting, and long-lived connection skew. Each requires specific mitigation strategies, not just algorithm tuning.
💡 Key Takeaways
CGNAT hotspot: 100K mobile users collapse to 2-5 IPs, overloading 2-5 backends with 20-50K RPS each while others sit idle
Metric staleness oscillation: 5-10s propagation delay causes proxies to flood then abandon backends; fix with local state and EMA smoothing
Retry amplification: sticky hashing sends retries to same struggling backend; require retry budgets (1.5-2x base), jitter, and diversification
HTTP/2 counting: 10 connections with 100 streams each = 1000 requests; L4 sees 10 connections (light load); L7 sees 1000 streams
📌 Interview Tips
1. Walk through CGNAT hotspot: mobile carrier, 100K users, 2 public IPs, each IP hashes to one server, 50K RPS per server
2. Explain retry storm: slow backend causes timeout, retry hits same backend (sticky), queue grows, more timeouts, positive feedback loop
3. Describe HTTP/2 imbalance: backend A has 10 HTTP/2 connections (1000 requests), backend B has 10 HTTP/1.1 connections (10 requests), both appear equal at L4