
Static vs Dynamic Load Balancing Algorithms

Definition
Load balancing algorithms determine how incoming requests are distributed across backend servers. They fall into two fundamental categories: static algorithms that make decisions without live feedback, and dynamic algorithms that react to current server state using real-time metrics.

Static Algorithms

Static algorithms like round robin, weighted round robin, and hashing make decisions without live feedback from backend servers. Round robin cycles through servers sequentially: request 1 goes to server A, request 2 to server B, request 3 to server C, then back to A. Hashing maps requests to servers based on a hash of the request (IP address, URL, or session ID), ensuring the same input always routes to the same server. Weighted variants distribute proportionally to assigned capacities. These algorithms require minimal coordination and add negligible overhead (< 1ms), making them ideal when servers are homogeneous and request processing times are uniform.
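The three static strategies above can be sketched in a few lines. This is a minimal illustration, not a production balancer; the server names and weights are made up for the example.

```python
import hashlib
from itertools import cycle

# Hypothetical backend names for illustration.
servers = ["A", "B", "C"]

# Round robin: cycle through servers sequentially (A, B, C, A, ...).
_rotation = cycle(servers)
def round_robin() -> str:
    return next(_rotation)

# Hashing: the same key (client IP, URL, session ID) always maps
# to the same server, giving stickiness with no shared state.
def hash_pick(key: str) -> str:
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

# Weighted round robin: repeat each server in the rotation in
# proportion to its assumed relative capacity.
weights = {"A": 3, "B": 2, "C": 1}
weighted_rotation = [s for s, w in weights.items() for _ in range(w)]
```

Note that none of these functions consult the backends at all, which is exactly why they add negligible overhead and exactly why they cannot react when a server degrades.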

Dynamic Algorithms

Dynamic algorithms like least connections, least requests, and least response time react to current server state. They track queue lengths, in-flight requests, or recent latency to avoid sending traffic to overloaded servers. The trade-off is complexity: you need accurate, timely metrics and more control plane communication. Dynamic algorithms shine with variable workloads and heterogeneous fleets where some servers are slower due to different CPU generations, cold caches, or noisy neighbors in shared infrastructure.
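A least-requests balancer can be sketched by tracking in-flight counts per server; the dispatch/complete bookkeeping below stands in for whatever real request lifecycle hooks a proxy would use.

```python
from collections import defaultdict

servers = ["A", "B", "C"]          # hypothetical backends
in_flight = defaultdict(int)       # live count of outstanding requests

def least_requests() -> str:
    # Pick the server with the fewest in-flight requests right now.
    return min(servers, key=lambda s: in_flight[s])

def dispatch(server: str) -> None:
    # Called when a request is routed to a server.
    in_flight[server] += 1

def complete(server: str) -> None:
    # Called when the server finishes (or times out) a request.
    in_flight[server] -= 1
```

The hard part in practice is not this selection logic but keeping `in_flight` accurate: stale or missing completion signals make the balancer pile requests onto a server it wrongly believes is idle.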

Impact on Tail Latency

The choice dramatically affects tail latency under real conditions. Consider a 10-instance cluster handling 15,000 RPS where nine instances support 2,000 RPS but one degraded instance handles only 1,000 RPS. Round robin sends 1,500 RPS to each server. The degraded instance's queue grows at 500 RPS, quickly causing timeouts and pushing p99 latency beyond 10 seconds, even though total cluster capacity (19,000 RPS) exceeds demand. Least requests would detect the growing queue and shift load away, preventing the cascade.
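The arithmetic behind the degraded-server scenario can be worked out directly. The numbers are the ones from the example above; the 20-second horizon is an illustrative assumption.

```python
# Round robin splits load evenly, ignoring the degraded server's capacity.
total_rps = 15_000
instances = 10
per_instance_rps = total_rps / instances        # 1,500 RPS to every server

degraded_capacity = 1_000                       # degraded server drains at 1,000 RPS
queue_growth = per_instance_rps - degraded_capacity  # backlog grows 500 requests/s

def backlog_after(seconds: float) -> float:
    # Unserved requests accumulated on the degraded instance.
    return queue_growth * seconds

# After an assumed 20 seconds, 10,000 requests are queued; a new arrival
# waits behind them at the 1,000 RPS drain rate, i.e. ~10 seconds.
wait_seconds = backlog_after(20) / degraded_capacity
```

Since roughly 1 in 10 requests lands on the degraded instance, this queueing delay surfaces directly in the cluster's p99, even while average utilization looks healthy.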

Choosing the Right Algorithm

Round robin: Use when servers are identical and requests have uniform processing time. Simple, predictable, zero coordination overhead.
Weighted round robin: Use when servers have known, stable capacity differences (e.g., different CPU counts).
Least connections/requests: Use when request processing times vary significantly or servers have different capacities that change over time. Reduces tail latency by 40-60% in heterogeneous fleets.
Hashing: Use when you need cache locality or session stickiness, accepting hotspot risk.
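This selection guidance condenses into a rough decision heuristic. The boolean inputs and the mapping are a simplified sketch of the criteria above, not an exhaustive decision procedure.

```python
def choose_algorithm(homogeneous: bool,
                     stable_weights: bool,
                     needs_affinity: bool) -> str:
    # Affinity needs (cache locality, sessions) dominate: only hashing
    # guarantees the same key routes to the same server.
    if needs_affinity:
        return "hashing"
    # Identical servers + uniform work: static round robin is enough.
    if homogeneous:
        return "round_robin"
    # Known, stable capacity differences: static weights suffice.
    if stable_weights:
        return "weighted_round_robin"
    # Variable or shifting load/capacity: fall back to dynamic balancing.
    return "least_requests"
```

In real deployments these conditions drift over time (fleets grow heterogeneous, workloads change), so many proxies default to a dynamic algorithm and reserve static ones for cases where simplicity clearly wins.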

Key Trade-off: Static algorithms have zero coordination overhead but cannot adapt to runtime conditions. Dynamic algorithms reduce tail latency by 40-60% in heterogeneous fleets but require fresh metrics and add complexity. The right choice depends on workload uniformity and fleet heterogeneity.
💡 Key Takeaways
- Static algorithms (round robin, hashing) use zero live feedback with < 1ms overhead; ideal for homogeneous servers with uniform request times
- Dynamic algorithms (least connections, least requests) track server state; reduce tail latency 40-60% in heterogeneous fleets but need fresh metrics
- Failure example: 10 servers at 15K RPS with one degraded server; round robin sends 1,500 RPS to a 1,000 RPS capacity server, the queue grows, and p99 exceeds 10s
- Decision: round robin for identical servers, weighted for known stable differences, least requests for variable workloads
📌 Interview Tips
1. Walk through the degraded server example: 9 servers at 2,000 RPS, 1 at 1,000 RPS; round robin sends 1,500 to each, so the degraded queue grows at 500 RPS
2. Explain when to choose each algorithm: homogeneous fleet (round robin), known capacity differences (weighted), variable processing (least requests)
3. Mention that dynamic algorithms reduce p99 latency by 40-60% in heterogeneous environments