
Consistent Hashing and Virtual Nodes for Stable Partition Mapping

The naive approach of hash(key) mod N creates a catastrophic problem: when you add or remove a node (changing N), nearly all keys remap to different partitions, triggering massive data movement and cache invalidation. Consistent hashing solves this by placing nodes on a conceptual hash ring (typically a 64-bit or 128-bit space) and assigning each key to the first node encountered clockwise from hash(key). When a node joins or leaves, only the keys between that node and its predecessor move, roughly 1/N of the keyspace. This minimizes disruption during topology changes, which is critical for systems that must maintain availability during rolling upgrades or autoscaling events.

Virtual nodes enhance consistent hashing by giving each physical node many positions on the ring rather than just one. The Amazon Dynamo paper describes physical nodes owning many virtual tokens to smooth distribution and allow capacity-weighted assignment: if node A has twice the capacity of node B, you assign node A twice as many virtual nodes on the ring. This solves two problems. First, with only a few physical nodes, consistent hashing can produce uneven load because node positions are effectively random; virtual nodes average out this variance statistically. Second, when a node fails, its load redistributes across many other nodes instead of overloading a single successor. Production systems typically use 100 to 1,000 virtual nodes per physical node to bring the coefficient of variation of load below 5 percent.

Alternative stable hashing schemes include rendezvous hashing (highest random weight) and jump consistent hashing. Rendezvous hashing computes a score for every candidate node using hash(key, node_id) and picks the highest-scoring node, giving O(nodes) lookup but perfect stability and no ring metadata. Jump consistent hashing computes the bucket directly from the key in expected O(log N) time with minimal key movement and no stored state, but it requires sequential node numbering and cannot easily handle heterogeneous capacities. Google Maglev uses a variant of consistent hashing optimized for load balancers, generating a fixed-size lookup table (65,537 entries by default) that each load balancer instance can compute independently without coordination.

Choose consistent hashing with virtual nodes for general purpose distributed storage, rendezvous hashing for client side routing with small node counts, and jump hashing for homogeneous clusters that prioritize lookup speed and zero per-node metadata.
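To make the mechanics concrete, here is a minimal Go sketch of a ring with virtual nodes; the Ring type, its Add and Lookup methods, and the choice of SHA-1 as the ring hash are illustrative assumptions, not any particular system's implementation.

  // Minimal consistent-hash ring with virtual nodes (illustrative sketch).
  package ring

  import (
      "crypto/sha1"
      "encoding/binary"
      "fmt"
      "sort"
  )

  // hash64 maps a string to a point on a 64-bit ring.
  func hash64(s string) uint64 {
      sum := sha1.Sum([]byte(s))
      return binary.BigEndian.Uint64(sum[:8])
  }

  // Ring stores the sorted virtual-node tokens and which physical node owns each.
  type Ring struct {
      tokens []uint64          // sorted positions on the ring
      owner  map[uint64]string // token -> physical node name
  }

  // Add places vnodes virtual nodes for one physical node on the ring;
  // a node with twice the capacity gets twice as many virtual nodes.
  func (r *Ring) Add(node string, vnodes int) {
      if r.owner == nil {
          r.owner = make(map[uint64]string)
      }
      for i := 0; i < vnodes; i++ {
          t := hash64(fmt.Sprintf("%s#%d", node, i))
          r.owner[t] = node
          r.tokens = append(r.tokens, t)
      }
      sort.Slice(r.tokens, func(a, b int) bool { return r.tokens[a] < r.tokens[b] })
  }

  // Lookup returns the owner of the first token clockwise from hash(key).
  func (r *Ring) Lookup(key string) string {
      if len(r.tokens) == 0 {
          return ""
      }
      h := hash64(key)
      i := sort.Search(len(r.tokens), func(i int) bool { return r.tokens[i] >= h })
      if i == len(r.tokens) { // wrapped past the largest token
          i = 0
      }
      return r.owner[r.tokens[i]]
  }

Removing a node deletes only its tokens; affected keys fall through to the next token clockwise, which is why only about 1/N of the keyspace moves on a topology change.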
💡 Key Takeaways
Minimal key movement on topology change: When adding or removing one node from an N node cluster, only approximately 1/N of keys remap. Amazon Dynamo uses this property to rebalance shopping cart data with minimal disruption during node failures.
Virtual nodes smooth distribution and enable capacity weighting: Using 100 to 1,000 virtual nodes per physical node reduces load variance to under 5 percent coefficient of variation and allows proportional assignment based on node capacity.
Replication placement must avoid correlated failures: In ring based hashing, replicas can land on adjacent nodes in the same rack or availability zone. Production systems enforce placement constraints to spread replicas across failure domains.
Rendezvous hashing provides O(nodes) lookup with perfect stability: Compute hash(key, node_id) for every node and pick the maximum score. No ring metadata needed, ideal for client side routing with 10 to 100 nodes (see the sketch after this list).
Jump consistent hashing maps keys with no stored state and minimal movement: A short arithmetic loop computes the partition directly from the key in expected O(log N) time, but it requires sequential node numbering and struggles with heterogeneous capacity or arbitrary removal.
Google Maglev uses lookup table approach: Generates 65,537 entry table per load balancer using weighted permutation, providing consistent hashing benefits with simple array lookup and independent computation per instance.
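As a companion to the rendezvous hashing takeaway above, a short Go sketch of highest-random-weight selection; the Pick function and the SHA-1 scoring are illustrative assumptions rather than any library's API.

  // Rendezvous (highest random weight) hashing: score every node for the key
  // and pick the maximum. No ring metadata; removing a node moves only its keys.
  package hrw

  import (
      "crypto/sha1"
      "encoding/binary"
  )

  // score derives a pseudo-random weight from the (key, node) pair.
  func score(key, node string) uint64 {
      sum := sha1.Sum([]byte(node + "/" + key))
      return binary.BigEndian.Uint64(sum[:8])
  }

  // Pick returns the node with the highest hash(key, node) score.
  // O(len(nodes)) per lookup, which is fine for tens of nodes.
  func Pick(key string, nodes []string) string {
      best, bestScore := "", uint64(0)
      for _, n := range nodes {
          if s := score(key, n); best == "" || s > bestScore {
              best, bestScore = n, s
          }
      }
      return best
  }

Capacity-weighted variants transform each score by the node's weight before comparing, which is how heterogeneous capacities are handled in this scheme.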
📌 Examples
The Amazon Dynamo paper describes consistent hashing with virtual nodes for shopping cart storage. Each physical node owns many tokens on the ring. When a node fails, its keys redistribute across many surviving nodes instead of overloading one successor, maintaining availability.
Apache Cassandra uses the Murmur3 partitioner with virtual nodes (num_tokens, 256 per node by default in older releases; Cassandra 4.0 lowered the default to 16). Adding a new node to a 10 node cluster moves roughly 1/11 of the data, with tokens spread across all existing nodes, allowing gradual rebalancing without hotspots.
Facebook memcache clusters use consistent hashing to route cache keys. When a cache server fails, only keys on that server miss; other cached data remains valid. Virtual nodes ensure even load distribution across heterogeneous hardware.
Jump consistent hashing conceptually re-seats each key with probability 1/(j+1) as bucket j is added, so growing from N to N+1 buckets moves only about 1/(N+1) of keys. The published algorithm computes the final bucket directly by jumping between re-seat points rather than looping over every bucket (see the Go sketch below). It achieves minimal key movement but only works with sequential bucket addition and removal at the end.
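For reference, a Go transcription of the published jump consistent hash from the Lamping and Veach paper (the function name JumpHash is ours); it stores no state and runs in expected O(log numBuckets) time:

  // JumpHash maps a 64-bit key hash to a bucket in [0, numBuckets).
  func JumpHash(key uint64, numBuckets int32) int32 {
      var b, j int64 = -1, 0
      for j < int64(numBuckets) {
          b = j
          key = key*2862933555777941757 + 1 // 64-bit LCG step as the pseudo-random source
          j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
      }
      return int32(b)
  }

Call it with any well-mixed 64-bit hash of the key; growing numBuckets from N to N+1 moves roughly 1/(N+1) of keys, but buckets can only be added or removed at the high end of the numbering.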