Power of Two Choices: Least Requests Algorithm
The Core Problem
The power of two choices algorithm solves a fundamental problem: how do you approximate global least-load decisions with only local state and constant-time overhead? A naive approach would query all backends for their current load, but with 1,000 backends this adds latency and creates a coordination bottleneck. The insight is that comparing just two random choices provides nearly optimal distribution with zero coordination.
The Algorithm
For each incoming request: randomly sample two backend servers, check their in-flight request counts, and route to whichever has fewer active requests. Ties break by configured weight. The load balancer maintains per-backend counters that increment on dispatch and decrement on completion. This requires no centralized coordination or distributed state propagation. Each load balancer instance operates independently using only its local view of in-flight requests.
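The bookkeeping described above can be sketched in a few lines of Python. This is an illustrative sketch, not any particular proxy's implementation; the class name and weight-based tie-breaking scheme are assumptions:

```python
import random
from collections import defaultdict

class P2CBalancer:
    """Power-of-two-choices over purely local in-flight counters (sketch)."""

    def __init__(self, backends, weights=None):
        self.backends = list(backends)
        # Hypothetical configured weights used only to break ties.
        self.weights = weights or {b: 1 for b in self.backends}
        self.in_flight = defaultdict(int)  # local view; no coordination

    def pick(self):
        # Sample two distinct backends and compare in-flight counts.
        a, b = random.sample(self.backends, 2)
        if self.in_flight[a] != self.in_flight[b]:
            chosen = a if self.in_flight[a] < self.in_flight[b] else b
        else:
            # Tie: prefer the backend with the higher configured weight.
            chosen = a if self.weights[a] >= self.weights[b] else b
        self.in_flight[chosen] += 1  # increment on dispatch
        return chosen

    def complete(self, backend):
        self.in_flight[backend] -= 1  # decrement on completion
```

A caller would invoke `pick()` before dispatching a request and `complete(backend)` when the response arrives; correctness of the counters depends only on pairing those two calls.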
The Mathematics
The mathematics is remarkable. In the classic balls-into-bins problem, throwing n balls uniformly at random into n bins yields a maximum load of roughly log(n) / log(log(n)) with high probability. Adding just one extra random choice (pick two bins, choose the less loaded) drops the maximum load to roughly log(log(n)). For 1,000 servers the absolute numbers are modest — in simulation, the busiest bin typically holds around 6-8 balls with one choice versus about 4 with two — but the important property is the asymptotic gap: the single-choice maximum keeps growing with n while the two-choice maximum stays essentially flat. In production with bursty traffic and heavy-tailed service times, avoiding those overloaded outlier backends translates directly to dramatically lower p99 latency.
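The balls-into-bins gap is easy to check empirically. A small simulation, with the bin counts and seed chosen arbitrarily for illustration:

```python
import random

def max_load(n_balls, n_bins, choices, rng):
    """Throw n_balls into n_bins; each ball samples `choices` bins
    and lands in the least-loaded one. Returns the maximum bin load."""
    bins = [0] * n_bins
    for _ in range(n_balls):
        if choices > 1:
            candidates = rng.sample(range(n_bins), choices)
        else:
            candidates = [rng.randrange(n_bins)]
        best = min(candidates, key=lambda i: bins[i])
        bins[best] += 1
    return max(bins)

rng = random.Random(0)
one_choice = max_load(1000, 1000, 1, rng)   # typically around 6-8
two_choice = max_load(1000, 1000, 2, rng)   # typically around 3-4
```

Running this repeatedly shows the two-choice maximum clustering tightly near log(log(n)) while the single-choice maximum is both higher and more variable.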
Scaling Properties
Aggregate routing capacity scales linearly with load balancer count, since each proxy makes independent local decisions with no coordination overhead. Adding more load balancers does not increase contention or require state synchronization. You can tune the choice count (typically 2 or 3) and incorporate weights for heterogeneous capacity. Choosing 3 instead of 2 provides only marginal improvement while increasing overhead; 2 choices capture most of the benefit.
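Both tuning knobs generalize naturally. A sketch of d-choice selection, plus one common approach to heterogeneous capacity (comparing in-flight counts normalized by weight rather than raw counts — an assumption here, not the only scheme in use):

```python
import random

def pick_least_of_d(backends, in_flight, d=2, rng=random):
    """Generalized d-choice selection; d=2 captures most of the benefit."""
    candidates = rng.sample(backends, min(d, len(backends)))
    return min(candidates, key=lambda b: in_flight[b])

def pick_weighted(backends, in_flight, weight, d=2, rng=random):
    """For heterogeneous capacity, compare load per unit of capacity
    so a backend with weight 2 is treated as half as loaded."""
    candidates = rng.sample(backends, min(d, len(backends)))
    return min(candidates, key=lambda b: in_flight[b] / weight[b])
```

With `in_flight = {'a': 10, 'b': 10}` and `weight = {'a': 2, 'b': 1}`, the weighted variant routes to `'a'` (effective load 5 versus 10).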
Failure Mode: Metric Staleness
The failure mode to watch is metric staleness with very short requests. If request completion signals lag by 10-20ms while requests complete in under 5ms, multiple concurrent routing decisions see the same stale counters and pile onto the same backend. The mitigation is to keep the window between completion and counter decrement as short as possible. Modern proxies achieve this through efficient event-driven architectures that update counters synchronously in the dispatch thread, keeping staleness under 1ms.
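The pile-on effect is visible in a toy simulation. Assume a burst of 100 routing decisions over five backends where one backend initially looks least loaded; the backend counts, burst size, and seed are invented for illustration:

```python
import random

def route_batch(n_requests, counts, rng, fresh):
    """Route a burst of concurrent requests with two-choice selection.
    fresh=True updates counters synchronously at dispatch;
    fresh=False means every decision sees the same stale snapshot."""
    picks = [0] * len(counts)
    live = list(counts)
    for _ in range(n_requests):
        a, b = rng.sample(range(len(counts)), 2)
        view = live if fresh else counts  # stale mode never sees updates
        chosen = a if view[a] <= view[b] else b
        picks[chosen] += 1
        if fresh:
            live[chosen] += 1  # synchronous counter update
    return picks

counts = [5, 5, 5, 1, 5]  # backend 3 looks least loaded in the snapshot
stale = route_batch(100, counts, random.Random(7), fresh=False)
fresh = route_batch(100, counts, random.Random(7), fresh=True)
```

With stale counters, every sampled pair that contains backend 3 routes to it, so it absorbs roughly 40% of the burst; with synchronous updates the picks spread out almost evenly.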