
Consistent Hashing, Replication, and Rebalancing in Distributed Caches

Distributed caches partition keys across multiple nodes to scale memory and throughput horizontally. Consistent hashing is the standard technique: each cache node is assigned multiple positions (virtual nodes) on a logical hash ring, and each key is hashed to a position on the ring and assigned to the first node found clockwise. Using virtual nodes (typically 100 to 500 per physical node) evens out the distribution and minimizes data movement when nodes join or leave. When a node is added, it takes over roughly 1/N of the keys from existing nodes, where N is the total node count; when a node fails, its keys are redistributed to the next nodes on the ring. This limits rebalancing impact compared to naive modulo sharding, where adding one node to a 10-node cluster would remap roughly 90 percent of keys, causing massive cache misses and backend load spikes.

Replication within the cache provides availability and read fan-out. Each partition has a primary node that handles writes and one or two replica nodes that receive updates asynchronously. Reads can be served from any replica, allowing clients to prefer same-availability-zone replicas to minimize latency and cross-zone network costs. Netflix EVCache replicates data across three availability zones, so a zone failure loses only one replica and reads automatically shift to the remaining two zones. Write replication is typically asynchronous to keep write latency low (sub-millisecond), accepting brief replication lag of milliseconds. For critical data requiring stronger durability, some systems replicate synchronously to at least one replica before acknowledging writes, trading slightly higher write latency (low single-digit milliseconds) for reduced data-loss risk.

Rebalancing when nodes join or leave is a major operational challenge. Adding a new node causes widespread cache misses as keys rehash to the new node, spiking backend load until the cache warms up. Mitigations include gradual traffic shifting, where the new node is marked as not ready and a background process pre-fills it with data from existing nodes before it starts serving traffic; sticky client maps that delay hash-ring membership updates for seconds to minutes, allowing time for warm-up; and using many virtual nodes (300+ per physical node) to spread the rebalancing load across all nodes rather than concentrating it on ring neighbors. Meta's mcrouter and Netflix's EVCache both employ these techniques to add cache capacity without causing user-facing latency spikes. At scale, a poorly managed rebalance that suddenly shifts 10 percent of keys to cold nodes can drop the hit rate by 10 points and increase database QPS by 100,000 or more, so automated pre-warming and gradual rollout are essential production practices.
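To make the ring mechanics concrete, here is a minimal Python sketch of a consistent hash ring with virtual nodes; the `ConsistentHashRing` class, the node names, and the use of MD5 are illustrative assumptions, not any particular cache's implementation. The final loop measures how many keys remap when an eleventh node joins a ten-node ring.

```python
import hashlib
from bisect import bisect

def _hash(value: str) -> int:
    # Any uniformly distributed hash works; MD5 is used here only because it
    # is in the standard library and stable across processes.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, vnodes_per_node: int = 200):
        self.vnodes_per_node = vnodes_per_node
        self.positions = []   # sorted virtual-node positions on the ring
        self.owners = {}      # ring position -> physical node name

    def add_node(self, node: str) -> None:
        # Place each physical node at many ring positions (virtual nodes).
        for i in range(self.vnodes_per_node):
            pos = _hash(f"{node}#vnode{i}")
            self.owners[pos] = node
            self.positions.append(pos)
        self.positions.sort()

    def lookup(self, key: str) -> str:
        # A key belongs to the first virtual node found clockwise from its hash.
        idx = bisect(self.positions, _hash(key)) % len(self.positions)
        return self.owners[self.positions[idx]]

ring = ConsistentHashRing(vnodes_per_node=200)
for n in range(10):
    ring.add_node(f"cache-{n}")

keys = [f"user:{i}" for i in range(100_000)]
before = {k: ring.lookup(k) for k in keys}

ring.add_node("cache-10")                         # scale out by one node
moved = sum(1 for k in keys if ring.lookup(k) != before[k])
print(f"keys remapped: {moved / len(keys):.1%}")  # roughly 1/11, about 9%
```

Running the same experiment with naive modulo sharding (node = hash(key) % 11 instead of % 10) remaps about 90 percent of the keys, which is exactly the miss storm the ring avoids.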
💡 Key Takeaways
Consistent hashing with virtual nodes (100 to 500 per physical node) minimizes data movement on rebalancing. Adding one node to a 10-node cluster moves only about 10 percent of keys instead of roughly 90 percent with naive modulo sharding, limiting cache-miss storms.
Replication provides availability and read scalability. Each partition has one primary and one to two replicas with asynchronous replication lag of milliseconds. Reads can fan out to any replica, enabling same-availability-zone preference to keep latency under 1 ms and avoid cross-zone transfer costs.
Netflix EVCache replicates across three availability zones, so zone failures lose only one replica and traffic automatically shifts to the remaining zones. Clients prefer same-zone reads to maintain sub-millisecond p50 and 1 to 2 ms p99 latency.
Rebalancing storms occur when adding or removing nodes causes widespread cache misses and backend load spikes. A sudden shift of 10 percent of keys can drop the hit rate by 10 points and increase database QPS by 100,000 in large systems.
Warm-up strategies mitigate rebalancing impact: mark new nodes as not ready, pre-fill them with data from existing nodes using background transfers, then gradually shift traffic over minutes. Sticky client maps delay hash-ring updates to allow warm-up time (see the sketch after this list).
Using 300+ virtual nodes per physical node spreads rebalancing load across all existing nodes instead of concentrating it on ring neighbors, smoothing the backend query spike and reducing per-node impact during scaling operations.
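A warm-up flow like the one described above might be sketched as follows. The `fetch` and `put` helpers, the `hot_keys` sample, and the delay value are assumptions for illustration (real systems such as mcrouter or EVCache have their own warming tooling); the ring objects reuse the `ConsistentHashRing` sketch shown earlier.

```python
import time

def warm_up_and_admit(new_node, current_ring, next_ring, hot_keys,
                      fetch, put, sticky_delay_s=300):
    """Hypothetical warm-up flow before a new cache node takes traffic.

    current_ring: the ring clients are using today (new_node absent).
    next_ring:    the ring with new_node added, not yet published to clients.
    hot_keys:     a sample of frequently read keys (e.g. from access logs).
    fetch/put:    assumed helpers; fetch(node, key) reads from an existing
                  node or the origin, put(node, key, value) fills new_node.
    """
    # 1. Pre-fill: copy every hot key that new_node will own after the flip,
    #    while the current owners are still warm and serving traffic.
    for key in hot_keys:
        if next_ring.lookup(key) == new_node:
            value = fetch(current_ring.lookup(key), key)
            if value is not None:
                put(new_node, key, value)

    # 2. Sticky membership: hold the old client map for a while so the
    #    transfer (and any in-flight writes) settle before clients move.
    time.sleep(sticky_delay_s)

    # 3. Publish the new ring; misses on new_node should now be rare.
    return next_ring
```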
📌 Examples
Meta's mcrouter uses consistent hashing with virtual nodes for memcache. When adding 10 new cache servers to a 100-server pool, each new server takes approximately 1 percent of the keys from existing servers, spread evenly via virtual nodes. Background warming pre-fills keys before the new servers enter rotation, preventing a roughly 10-point drop in hit rate.
Netflix EVCache three-zone replication: a cache cluster in us-east-1 has nodes in zones 1a, 1b, and 1c. Each key is replicated to one node in each zone. Clients in zone 1a prefer reading from their local replica (under 1 ms latency) but can fail over to the 1b or 1c replicas (2 to 3 ms cross-zone latency) during node failures.
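A zone-aware read path like the one in this example could look like the sketch below. The replica map, node names, and the `cache_get` helper are hypothetical; EVCache's actual client API differs.

```python
# Hypothetical replica map: one replica of the key's partition in each zone.
REPLICAS_BY_ZONE = {
    "us-east-1a": "evcache-1a-node-7",
    "us-east-1b": "evcache-1b-node-3",
    "us-east-1c": "evcache-1c-node-9",
}

def zone_aware_get(key, client_zone, cache_get):
    """Prefer the same-zone replica, then fail over to the other zones.

    cache_get(node, key) is an assumed helper that returns None on a miss
    or when the node is unreachable.
    """
    local = REPLICAS_BY_ZONE.get(client_zone)
    remote = [n for z, n in REPLICAS_BY_ZONE.items() if z != client_zone]
    for node in ([local] if local else []) + remote:
        value = cache_get(node, key)     # <1 ms in-zone, a few ms cross-zone
        if value is not None:
            return value
    return None   # missed in every zone: caller falls back to the origin store
```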
A distributed cache using 200 virtual nodes per physical server has 10 servers (2,000 total positions on the ring). Adding one server creates 200 new positions, each taking approximately 0.05 percent of the keyspace from a nearby node, about 9 percent in total. Over 10 minutes, background warming fetches these keys from the origin before the new server starts serving production traffic.
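The fractions in this example can be checked with a few lines of arithmetic (illustrative numbers only):

```python
servers, vnodes = 10, 200
positions_after = (servers + 1) * vnodes        # 2,200 ring positions after the join

new_server_share = vnodes / positions_after     # ~0.091  -> about 9% of all keys
per_position_share = 1 / positions_after        # ~0.00045 -> about 0.05% each

print(f"new server owns ~{new_server_share:.1%} of keys")        # 9.1%
print(f"each new virtual node takes ~{per_position_share:.2%}")  # 0.05%
```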