Static vs Dynamic Load Balancing Algorithms
Static Algorithms
Static algorithms like round robin, weighted round robin, and hashing make decisions without live feedback from backend servers. Round robin cycles through servers sequentially: request 1 goes to server A, request 2 to server B, request 3 to server C, then back to A. Hashing maps requests to servers based on a hash of the request (IP address, URL, or session ID), ensuring the same input always routes to the same server. Weighted variants distribute requests in proportion to each server's assigned capacity. These algorithms require minimal coordination and add negligible overhead (< 1ms), making them ideal when servers are homogeneous and request processing times are uniform.
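Both static strategies can be sketched in a few lines. This is an illustrative sketch, not a production implementation; the server names are hypothetical, and the modulo-hash scheme shown here remaps most keys if the server list changes (consistent hashing avoids that, at more complexity).

```python
import hashlib
from itertools import cycle

servers = ["server-a", "server-b", "server-c"]  # hypothetical backends

# Round robin: cycle through the servers sequentially.
rr = cycle(servers)
def round_robin() -> str:
    return next(rr)

# Hashing: the same key (client IP, URL, or session ID) always maps
# to the same server, as long as the server list is unchanged.
def hash_route(key: str) -> str:
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

print([round_robin() for _ in range(4)])  # cycles A, B, C, then back to A
print(hash_route("10.0.0.7") == hash_route("10.0.0.7"))  # stable mapping
```

Neither function consults backend state, which is exactly what makes them cheap and what makes them blind to a degraded server.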
Dynamic Algorithms
Dynamic algorithms like least connections, least requests, and least response time react to current server state. They track queue lengths, in-flight requests, or recent latency to avoid sending traffic to overloaded servers. The trade-off is complexity: you need accurate, timely metrics and more control plane communication. Dynamic algorithms shine with variable workloads and heterogeneous fleets where some servers are slower due to different CPU generations, cold caches, or noisy neighbors in shared infrastructure.
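A minimal sketch of least requests, assuming the load balancer itself tracks in-flight counts (server names and the `dispatch`/`complete` hooks are illustrative):

```python
# Least requests: route to the server with the fewest in-flight requests.
in_flight = {"server-a": 0, "server-b": 0, "server-c": 0}

def pick_least_requests() -> str:
    return min(in_flight, key=in_flight.get)

def dispatch(server: str) -> None:
    in_flight[server] += 1  # increment when a request is sent

def complete(server: str) -> None:
    in_flight[server] -= 1  # decrement when the response arrives

dispatch("server-a")
dispatch("server-a")
print(pick_least_requests())  # server-b: fewest in-flight requests
```

Because the counter only changes when a response comes back, a slow server accumulates in-flight requests and naturally stops being selected; no explicit health metric is needed for this variant.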
Impact on Tail Latency
The choice dramatically affects tail latency under real conditions. Consider a 10-instance cluster handling 15,000 RPS where nine instances support 2,000 RPS but one degraded instance only handles 1,000 RPS. Round robin sends 1,500 RPS to each server. The degraded instance's queue grows at 500 requests per second, quickly causing timeouts and pushing p99 latency beyond 10 seconds, even though total cluster capacity (19,000 RPS) exceeds demand. Least requests would detect the growing queue and shift load away, preventing the cascade.
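The arithmetic behind the scenario, worked through directly:

```python
# Numbers from the scenario above: 10 instances, 15,000 RPS total,
# round robin gives each instance an equal share.
total_rps = 15_000
instances = 10
per_instance = total_rps / instances               # 1,500 RPS per instance
degraded_capacity = 1_000                          # the slow instance's limit
backlog_growth = per_instance - degraded_capacity  # 500 requests/sec backlog

# After just 10 seconds the degraded instance is 5,000 requests behind,
# so requests queued behind that backlog time out regardless of cluster
# headroom elsewhere.
print(backlog_growth * 10)  # 5000.0
```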
Choosing the Right Algorithm
Round robin: Use when servers are identical and requests have uniform processing time. Simple, predictable, zero coordination overhead.
Weighted round robin: Use when servers have known, stable capacity differences (e.g., different CPU counts).
Least connections/requests: Use when request processing times vary significantly or servers have different capacities that change over time. Can reduce tail latency by 40-60% in heterogeneous fleets.
Hashing: Use when you need cache locality or session stickiness, accepting hotspot risk.
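Weighted round robin is the one entry above whose mechanics are least obvious, so here is one simple (illustrative) scheme: expand the rotation so each server appears once per unit of weight. Server names and weights are hypothetical, and real balancers typically use a smoother interleaving so a heavy server's requests are not sent back to back.

```python
from itertools import cycle

# A server with weight 2 receives twice as many requests as one with weight 1.
weights = {"big-server": 2, "small-server": 1}
rotation = cycle([s for s, w in weights.items() for _ in range(w)])

picks = [next(rotation) for _ in range(6)]
print(picks.count("big-server"), picks.count("small-server"))  # 4 2
```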