Power of Two Choices: Least Requests Algorithm
The power of two choices algorithm elegantly solves a fundamental problem: how do you approximate global least-load decisions with only local state and constant-time overhead? The algorithm is deceptively simple: for each incoming request, randomly sample two backend servers, compare their in-flight request counts, and route to whichever has fewer active requests, breaking ties by configured weight. Each backend gets a local counter that increments on dispatch and decrements on completion, requiring no centralized coordination or distributed state propagation.
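The routing loop described above can be sketched in a few lines. This is an illustrative toy, not Envoy's implementation; the class and backend names are invented for the example.

```python
import random

class P2CBalancer:
    """Minimal sketch of power-of-two-choices routing with local counters.
    Backend names and the weight scheme are assumptions for illustration."""

    def __init__(self, backends, weights=None):
        self.backends = list(backends)
        self.weights = weights or {b: 1 for b in self.backends}
        # Local per-backend in-flight counters: no distributed state needed.
        self.in_flight = {b: 0 for b in self.backends}

    def pick(self):
        # Sample two distinct backends uniformly at random.
        a, b = random.sample(self.backends, 2)
        # Route to the one with fewer in-flight requests;
        # break ties by configured weight (higher weight wins).
        if self.in_flight[a] < self.in_flight[b]:
            chosen = a
        elif self.in_flight[b] < self.in_flight[a]:
            chosen = b
        else:
            chosen = a if self.weights[a] >= self.weights[b] else b
        self.in_flight[chosen] += 1  # increment on dispatch
        return chosen

    def complete(self, backend):
        self.in_flight[backend] -= 1  # decrement on completion
```

A caller would pair each `pick()` with a `complete()` when the response finishes, keeping the counters an accurate view of in-flight work.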
The mathematics behind this approach are remarkable. In the classic balls-into-bins problem, throwing n balls uniformly at random into n bins yields a maximum load of roughly log n / log log n with high probability. Adding just one extra random choice (pick two bins, place the ball in the less loaded one) drops the maximum load to roughly log log n. For 1,000 servers, this transforms worst-case load from around 145 requests to about 3, a roughly 50x improvement. In production systems with bursty traffic and heavy-tailed service times, this translates directly to dramatically lower p99 latency.
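The balls-into-bins effect is easy to verify empirically. The simulation below (a sketch with an arbitrary seed, not a rigorous benchmark) throws 1,000 balls into 1,000 bins with one choice versus two choices and reports the maximum bin load:

```python
import random

def max_load(n, choices, rng):
    """Throw n balls into n bins; each ball samples `choices` bins
    uniformly at random and lands in the least-loaded one.
    Returns the maximum bin load."""
    bins = [0] * n
    for _ in range(n):
        candidates = rng.sample(range(n), choices)
        target = min(candidates, key=lambda i: bins[i])
        bins[target] += 1
    return max(bins)

rng = random.Random(0)
one = max_load(1000, 1, rng)  # uniform random: max load ~ log n / log log n
two = max_load(1000, 2, rng)  # power of two choices: max load ~ log log n
print(one, two)
```

Typical runs show the one-choice maximum in the mid single digits while the two-choice maximum sits near 3, matching the log log n prediction.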
Envoy proxy, used widely in service meshes at companies like Microsoft and Google, implements this as the "least_request" load balancer. Configuration allows tuning the choice count (typically 2 or 3) and incorporating weights for heterogeneous capacity. The approach scales linearly with load balancer count, since each proxy makes independent local decisions with no coordination overhead.
The failure mode to watch for is metric staleness with very short requests. If request-completion signals lag by even 10 to 20 milliseconds while requests complete in under 5 milliseconds, multiple concurrent routing decisions see the same stale counters and pile onto the same backend. The mitigation is a tight completion-signaling loop, which modern proxies achieve through efficient event-driven architectures that update counters synchronously in the dispatch thread.
💡 Key Takeaways
• Algorithm: sample 2 random backends and route to the one with fewer in-flight requests. Maintain local counters incremented on dispatch and decremented on completion, requiring zero distributed state
• Mathematical foundation: power of two choices reduces maximum load from log n / log log n to log log n. For 1,000 servers this means worst case drops from approximately 145 to 3 requests
• Envoy proxy implements this as "least_request" mode with configurable choice count (2 or 3) and optional weights, used in production service meshes at Google and Microsoft
• Scales linearly with load balancer count, since each proxy makes independent decisions with constant-time overhead per request
• Failure mode: metric staleness matters for very short requests (under 5 milliseconds). If counter updates lag by 10 to 20 milliseconds, concurrent decisions see stale data and pile onto the same backend
📌 Examples
Envoy configuration: cluster with least_request load balancer, choice_count of 2, active_request_bias of 1.0, and optional slow_start_window of 60 seconds for new instances
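A cluster configured this way might look roughly as follows. This is a sketch based on Envoy's documented `LeastRequestLbConfig` fields; the cluster name, endpoint address, and port are placeholders.

```yaml
clusters:
- name: backend_pool                  # placeholder cluster name
  connect_timeout: 1s
  type: STRICT_DNS
  lb_policy: LEAST_REQUEST
  least_request_lb_config:
    choice_count: 2                   # backends sampled per request
    active_request_bias:
      default_value: 1.0              # how strongly active requests discount weight
    slow_start_config:
      slow_start_window: 60s          # ramp traffic to new endpoints over 60s
  load_assignment:
    cluster_name: backend_pool
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: backend.internal, port_value: 8080 }
```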
Kubernetes service mesh: 1,000 backend pods handling 50,000 RPS. With random routing, unlucky pods hit 200+ concurrent requests. Power of two choices keeps max at 60 to 80 concurrent, reducing p99 latency from 800ms to 150ms
Microsoft production data: Switching from round robin to power of two choices in their service fabric reduced tail latency by 45% for heterogeneous workloads with coefficient of variation above 2.0