
Consistent Hashing for Cache Locality and Stickiness

The Problem Consistent Hashing Solves

Consistent hashing solves a critical problem in distributed caching and stateful routing: how do you map keys to servers such that adding or removing a server remaps only a minimal fraction of keys? With naive modulo hashing (hash(key) mod N), changing N from 10 to 11 servers remaps roughly 90% of keys, causing massive cache misses and state migration. Consistent hashing remaps only ~1/N of keys when adding one server to N existing servers.
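The remapping cost of modulo hashing is easy to measure directly. A minimal sketch (the key names and server counts are illustrative):

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """Naive modulo hashing: map a key to one of n servers."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % n

keys = [f"key-{i}" for i in range(100_000)]
moved = sum(bucket(k, 10) != bucket(k, 11) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 90% move
```

A key keeps its server only when `hash % 10 == hash % 11`, which for a uniform hash happens about 1 time in 11, so roughly 10/11 of keys move.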

How the Hash Ring Works

The algorithm maps both servers and keys onto a circular hash space (typically 0 to 2^32). Each server hashes to a position on the ring. To route a key, hash it to a position and walk clockwise until you hit the first server. When you add a new server, only keys that fall between it and the previous server clockwise get remapped. When you remove a server, its keys move to the next server clockwise. This locality property is what makes consistent hashing valuable: changes affect only neighboring keys, not the entire keyspace.
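The clockwise walk above can be sketched with a sorted list and binary search; this is a bare ring with one position per server (names are illustrative, virtual nodes come next):

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    # 32-bit position on the circular hash space described above
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    """Bare consistent-hash ring: one position per server, no virtual nodes."""

    def __init__(self, servers):
        self.ring = sorted((ring_hash(s), s) for s in servers)
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key: str) -> str:
        # Walk clockwise: first server at or after the key's position,
        # wrapping back to the start of the ring past the top.
        i = bisect.bisect_right(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["server1", "server2", "server3"])
print(ring.lookup("user:42"))  # same key always routes to the same server
```

Sorting the positions once makes each lookup O(log N) via binary search, which is the complexity figure quoted later for ring-based lookups.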

Virtual Nodes for Even Distribution

With only physical server positions, load distribution can be uneven. If servers hash to positions that are not uniformly spaced, some servers own larger arc segments and receive disproportionate traffic. Virtual nodes fix this: each physical server maps to 100-200 positions on the ring by hashing concatenations like "server1_0", "server1_1" through "server1_199". This smooths load across servers and bounds imbalance to roughly ±5-10%. More virtual nodes improve distribution but increase memory for the ring lookup table.
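Virtual nodes are simply extra ring positions per server. A sketch extending the ring idea, using the 200-position naming scheme from the paragraph above (counts and names are illustrative):

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

def build_ring(servers, vnodes=200):
    # server1 contributes positions for "server1_0" .. "server1_199", etc.
    points = sorted((ring_hash(f"{s}_{i}"), s) for s in servers for i in range(vnodes))
    positions = [pos for pos, _ in points]
    owners = [owner for _, owner in points]
    return positions, owners

def lookup(positions, owners, key):
    i = bisect.bisect_right(positions, ring_hash(key)) % len(positions)
    return owners[i]

positions, owners = build_ring(["server1", "server2", "server3"])
counts = Counter(lookup(positions, owners, f"key-{i}") for i in range(30_000))
print(counts)  # each server should own close to 10,000 keys
```

With 200 virtual nodes per server, each physical server's share of the keyspace stays within a few percent of its fair share; with one position per server the split is often badly skewed.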

Hotspot Vulnerability

The key production trade-off is hotspot vulnerability. If one key (a celebrity profile, viral video, or popular API endpoint) dominates traffic, it pins to a single backend regardless of load. Unlike dynamic algorithms that detect overload and shed traffic, consistent hashing deterministically routes that key to the same server. Additionally, IP-based consistent hashing breaks when thousands of users behind carrier-grade NAT (CGNAT) collapse to 2-5 public IP addresses. Example: 100,000 RPS from a mobile carrier maps to 2 IPs, overloading 2 backends with 50,000 RPS each while others idle.
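The CGNAT failure mode follows directly from determinism: hashing by source IP can never spread traffic across more backends than there are distinct IPs. A small sketch (the IP addresses and backend names are made up for illustration):

```python
import hashlib

servers = [f"server{i}" for i in range(10)]

def route_by_ip(ip: str) -> str:
    # Deterministic IP hash: the same source IP always lands on the same backend
    return servers[int(hashlib.md5(ip.encode()).hexdigest(), 16) % len(servers)]

cgnat_ips = ["203.0.113.7", "203.0.113.8"]  # hypothetical carrier egress IPs
print({ip: route_by_ip(ip) for ip in cgnat_ips})
# All traffic from the carrier lands on at most 2 of the 10 backends,
# no matter how many users sit behind those two IPs.
```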

Mitigation Strategies

Bounded load hashing: each key maps to the top k servers on the ring; route to the least loaded among them. This preserves locality while avoiding hotspots.
Layer 7 cookie affinity: instead of hashing the IP, hash a user ID or session cookie, distributing CGNAT users across backends.
Rendezvous hashing: a ring-free alternative; compute a hash for each (key, server) pair and pick the server with the highest hash. Simpler, with no virtual nodes needed, but O(N) per lookup versus O(log N) for a ring with binary search.
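Of these, rendezvous hashing is the simplest to sketch. A minimal version with illustrative server names, including the same minimal-remapping property as the ring:

```python
import hashlib

def rendezvous(key: str, servers) -> str:
    """Score every (key, server) pair; the highest hash wins. O(N) per lookup."""
    return max(servers, key=lambda s: hashlib.md5(f"{key}:{s}".encode()).hexdigest())

servers = ["server1", "server2", "server3", "server4"]
print(rendezvous("user:42", servers))

# Removing a server remaps only the keys that server was winning (~1/N):
survivors = [s for s in servers if s != "server2"]
moved = sum(rendezvous(f"key-{i}", servers) != rendezvous(f"key-{i}", survivors)
            for i in range(10_000))
print(f"{moved / 10_000:.0%} of keys moved")  # roughly 25% with N=4
```

Comparing fixed-length hex digests lexicographically is equivalent to comparing the hash integers, so the `max` picks the true highest-scoring server.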

Key Trade-off: Consistent hashing provides cache locality and minimal key remapping (1/N on server change) but deterministically routes hot keys to single backends. Use bounded load hashing or replication factor k to combine locality benefits with load awareness.
💡 Key Takeaways
Modulo hashing remaps ~90% of keys when N changes; consistent hashing remaps only 1/N keys, minimizing cache misses
Hash ring: servers and keys map to circular space; keys route to next clockwise server; changes affect only neighboring keys
Virtual nodes (100-200 per server) smooth distribution and bound imbalance to ±5-10%; more nodes improve distribution but increase memory
Hotspot vulnerability: popular keys pin to single backend; CGNAT collapses 100K RPS from mobile carrier to 2 IPs, overloading 2 servers
📌 Interview Tips
1. Explain the math: modulo hash with N=10 to N=11 remaps ~90% of keys; consistent hashing remaps only ~10%
2. Describe virtual nodes: server1 maps to server1_0 through server1_199, creating 200 ring positions for even distribution
3. Walk through the CGNAT hotspot: 100K mobile users behind 2 public IPs, each IP hashes to one server, 50K RPS per server while others idle