Consistent Hashing for Cache Locality and Stickiness
Consistent hashing solves a critical problem in distributed caching and stateful routing: how do you map keys to servers such that adding or removing a server remaps only a minimal fraction of keys? Naive modulo hashing (hash(key) mod N) remaps almost all keys when N changes, causing massive cache misses and state migration. Consistent hashing with a hash ring remaps only roughly 1/N of keys when you add one server to N existing servers.
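A quick way to see the modulo problem is to count how many keys change buckets when the server count goes from 10 to 11. This is a minimal sketch; the key format and counts are illustrative:

```python
# Sketch: measure how many keys move when a server is added under naive
# modulo hashing. Key format and server counts are illustrative.
import hashlib

def bucket(key: str, n_servers: int) -> int:
    # Stable hash (Python's built-in hash() is salted per process).
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_servers

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if bucket(k, 10) != bucket(k, 11))
print(f"{moved / len(keys):.1%} of keys remapped")  # ~90% move going from 10 -> 11 servers
```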
The algorithm maps both servers and keys onto a circular hash space (typically 0 to 2^32). Each server hashes to a position on the ring. To route a key, hash it to a position and walk clockwise until you hit the first server. When you add a new server, only the keys that fall between it and the previous server clockwise get remapped. When you remove a server, its keys move to the next server clockwise. Virtual nodes improve the distribution: each physical server maps to 100 to 200 positions on the ring by hashing concatenations like "server1_0", "server1_1" through "server1_199". This smooths load across servers and bounds the imbalance to roughly ±5-10%.
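The following sketch implements the ring just described, with virtual nodes and a clockwise lookup. Class and server names and the vnode count (150) are illustrative, not from a particular library:

```python
# Minimal consistent-hash ring with virtual nodes (a sketch of the technique
# described above; not a production implementation).
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers=(), vnodes: int = 150):
        self._vnodes = vnodes
        self._positions = []   # sorted virtual-node positions on the ring
        self._owners = {}      # position -> physical server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self._vnodes):
            pos = _hash(f"{server}_{i}")
            bisect.insort(self._positions, pos)
            self._owners[pos] = server

    def remove(self, server: str) -> None:
        for pos in [p for p, s in self._owners.items() if s == server]:
            self._positions.remove(pos)
            del self._owners[pos]

    def get(self, key: str) -> str:
        if not self._positions:
            raise LookupError("empty ring")
        pos = _hash(key)
        # First virtual node clockwise from the key, wrapping past the end.
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owners[self._positions[idx]]

# Adding a fourth server remaps roughly 1/4 of keys, matching the 1/N claim.
ring = HashRing(["cache-1", "cache-2", "cache-3"])
before = {key: ring.get(key) for key in (f"obj:{i}" for i in range(10_000))}
ring.add("cache-4")
moved = sum(1 for key, server in before.items() if ring.get(key) != server)
print(f"{moved / len(before):.1%} of keys moved")
```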
Google Maglev uses consistent hashing to distribute flows across backend servers, enabling any Maglev load balancer instance to make the same routing decision for a given flow without coordination. AWS Elastic Load Balancing (ELB) uses flow hashing at Layer 4 to maintain connection stickiness. The key production tradeoff is hotspot vulnerability: if one key (say, a celebrity profile or viral video) dominates traffic, it pins to a single backend regardless of load. This contrasts with dynamic algorithms that would detect the overload and shed traffic.
Failure modes include NAT and mobile network concentration. IP-based consistent hashing breaks when thousands of users behind carrier-grade NAT (CGNAT) collapse to 2 to 5 public IP addresses. Example: 100,000 RPS from a mobile carrier maps to 2 IPs, overloading 2 backends with 50,000 RPS each while others idle. Mitigations include Layer 7 (L7) cookie-based stickiness, incorporating the user ID into the hash key, or replication where each key maps to the top k servers on the ring and the least-loaded of them is chosen, as sketched below.
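One way to implement the replication mitigation is to walk clockwise from the key's position, collect the first k distinct physical servers, and route to whichever currently reports the lowest load. A sketch under the assumption that the ring is a sorted list of (position, server) virtual nodes and that a per-server load metric is available; the server names and load figures are hypothetical:

```python
# Sketch: route a key to the least-loaded of its top-k servers on the ring.
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def top_k_servers(ring, key, k):
    """First k distinct physical servers clockwise from the key's position."""
    positions = [pos for pos, _ in ring]          # ring is sorted by position
    idx = bisect.bisect_right(positions, _hash(key))
    servers, seen = [], set()
    for step in range(len(ring)):
        _, server = ring[(idx + step) % len(ring)]
        if server not in seen:
            seen.add(server)
            servers.append(server)
            if len(servers) == k:
                break
    return servers

def route(ring, key, loads, k=2):
    candidates = top_k_servers(ring, key, k)
    return min(candidates, key=lambda s: loads.get(s, 0))

# Hypothetical usage: a hot key can spill onto its second replica.
ring = sorted((_hash(f"{s}_{i}"), s) for s in ("be-1", "be-2", "be-3") for i in range(150))
loads = {"be-1": 48_000, "be-2": 3_000, "be-3": 2_500}   # requests/sec, illustrative
print(route(ring, "celebrity:42", loads, k=2))
```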
💡 Key Takeaways
•Naive modulo hashing remaps nearly 100% of keys when the server count changes. Consistent hashing remaps only roughly 1/N of keys when adding one server to N servers, minimizing cache misses
•Virtual nodes (100 to 200 per physical server) smooth the distribution and bound imbalance to roughly ±5-10%, preventing hotspots from uneven hash distribution
•Google Maglev uses consistent hashing over flows to enable any load balancer instance to route the same flow identically without centralized state, achieving sub-second failover
•Hotspot vulnerability: popular keys (celebrity profiles, viral videos) pin to a single backend regardless of load. Example: 100,000 RPS from a mobile carrier's CGNAT collapses to 2 public IPs, overloading 2 backends with 50,000 RPS each
•Mitigation strategies: Layer 7 cookie stickiness instead of IP hashing, a replication factor k where each key maps to the top k servers on the ring and the least-loaded is chosen, or rendezvous hashing with bounded load (sketched after this list)
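For the rendezvous-hashing variant, each (key, server) pair gets a score and the key goes to the highest-scoring server that still has capacity, so hot keys spill over instead of piling up. A minimal sketch; the capacity model, server names, and load numbers are illustrative:

```python
# Sketch: rendezvous (highest-random-weight) hashing with a bounded-load check.
import hashlib

def _score(key: str, server: str) -> int:
    return int(hashlib.md5(f"{key}|{server}".encode()).hexdigest(), 16)

def pick(key, servers, loads, capacity):
    # Rank servers by per-key hash score; take the best one below its bound.
    ranked = sorted(servers, key=lambda s: _score(key, s), reverse=True)
    for server in ranked:
        if loads.get(server, 0) < capacity:
            return server
    return ranked[0]   # all at capacity: fall back to the natural owner

servers = ["be-1", "be-2", "be-3"]
loads = {"be-1": 990, "be-2": 400, "be-3": 380}   # in-flight requests, illustrative
print(pick("viral-video-7", servers, loads, capacity=1000))
```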
📌 Examples
Google Maglev: Each Maglev instance maintains an identical consistent-hash mapping over the backends behind each VIP. A flow arrives via anycast at any Maglev, which applies a consistent hash over the 5-tuple and forwards to a backend, enabling stateless failover
AWS Network Load Balancer: Uses a flow hash (source IP, source port, destination IP, destination port, protocol) to maintain stickiness for TCP/UDP flows, critical for stateful protocols like QUIC and for gaming traffic; see the flow-hash sketch below
CDN cache routing: A consistent hash over the URL maps each object to a cache server. Adding 1 cache to a 100-node tier remaps only about 1% of objects, avoiding a thundering herd to the origin on topology changes
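To make the flow-stickiness examples concrete, here is a minimal sketch of deterministically picking a backend from a connection 5-tuple, so any load-balancer instance makes the same choice without shared state. The field names and backend list are hypothetical, and highest-random-weight hashing stands in for the Maglev-style lookup tables production load balancers actually use:

```python
# Sketch: deterministic backend choice from a connection 5-tuple.
import hashlib
from typing import NamedTuple

class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str

def pick_backend(flow: Flow, backends: list[str]) -> str:
    flow_key = "|".join(map(str, flow))
    # Highest-random-weight over the flow key: stable across LB instances,
    # and adding/removing a backend only moves that backend's flows.
    def score(backend: str) -> int:
        return int(hashlib.md5(f"{flow_key}|{backend}".encode()).hexdigest(), 16)
    return max(backends, key=score)

flow = Flow("203.0.113.7", 53122, "198.51.100.10", 443, "tcp")
print(pick_backend(flow, ["backend-a", "backend-b", "backend-c"]))
```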