
Hash-Based Assignment Strategies: Consistent and Rendezvous Hashing

Hash-based assignment strategies map keys or partitions to nodes mathematically, enabling predictable rebalancing without centralized coordination. Consistent hashing with virtual nodes is the most widely adopted approach: each physical node is assigned K virtual tokens (typically 64 to 256) positioned on a hash ring. A key hashes to a ring position and is assigned to the first virtual node encountered clockwise. When you add one node to an N-node cluster, roughly 1/N of keys remap, because the new node claims portions of the ring from existing nodes. For a 100-node ring storing 1 PB total (10 TB per node), adding 10 nodes moves approximately 100 TB in aggregate rather than reshuffling everything.

Virtual nodes solve the load-distribution problem that naive consistent hashing suffers from. With only one token per physical node, ring segments vary wildly in size, and removing a node dumps its entire segment onto a single neighbor. With 128 virtual nodes per physical node, load variance drops dramatically: the standard deviation of load typically stays within 5% to 10% of the mean. Weighted virtual nodes extend this to heterogeneous clusters: a node with double the CPU and memory gets double the virtual tokens, so it naturally receives twice the traffic. Amazon Dynamo pioneered this approach, and it remains the foundation of DynamoDB's partition assignment today.

Rendezvous hashing (also called highest-random-weight hashing) offers an alternative that needs no ring structure. For each key, compute a hash of (key, node) for every node, scale each score by the node's capacity weight, and assign the key to the node with the highest result. This makes capacity adjustments trivial: update a node's weight and keys rebalance proportionally on their own. LinkedIn and other systems use rendezvous hashing when dynamic capacity changes are frequent, as it avoids maintaining ring metadata.

The tradeoff is metadata overhead versus per-lookup cost. Consistent hashing requires maintaining and distributing the ring topology to all nodes; with 1,000 nodes and 128 virtual tokens each, that is 128,000 ring positions to track. Rendezvous hashing needs only node addresses and weights, but requires O(N) hash computations per lookup, where N is the node count. For small clusters (under 100 nodes), rendezvous is simpler. For large clusters (1,000+ nodes) where lookups happen millions of times per second, the precomputed ring structure of consistent hashing wins despite its metadata overhead.
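To make the ring mechanics concrete, here is a minimal Python sketch of a consistent hash ring with weighted virtual nodes, assuming SHA-1 for ring positions. The names (`ConsistentHashRing`, `vnodes_per_node`, and so on) are illustrative, not any particular library's API.

```python
import bisect
import hashlib


def ring_hash(s: str) -> int:
    """Map a string to a position on the SHA-1 ring (0 to 2^160 - 1)."""
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big")


class ConsistentHashRing:
    def __init__(self, vnodes_per_node: int = 128):
        self.vnodes = vnodes_per_node
        self._positions = []  # sorted vnode positions, for bisect
        self._owners = []     # _owners[i] = node owning _positions[i]

    def add_node(self, node: str, weight: float = 1.0) -> None:
        # Weighted virtual nodes: a node with weight 2.0 gets twice the
        # tokens, so it owns roughly twice the keyspace.
        for i in range(int(self.vnodes * weight)):
            pos = ring_hash(f"{node}#vnode{i}")
            idx = bisect.bisect(self._positions, pos)
            self._positions.insert(idx, pos)
            self._owners.insert(idx, node)

    def remove_node(self, node: str) -> None:
        kept = [(p, n) for p, n in zip(self._positions, self._owners) if n != node]
        self._positions = [p for p, _ in kept]
        self._owners = [n for _, n in kept]

    def lookup(self, key: str) -> str:
        # First virtual node clockwise from the key; the modulo wraps
        # past the end of the ring back to index 0.
        idx = bisect.bisect(self._positions, ring_hash(key)) % len(self._owners)
        return self._owners[idx]
```

Adding an eleventh node to a ten-node ring should remap roughly 1/11 (about 9%) of keys, which you can verify directly:

```python
ring = ConsistentHashRing()
for n in range(10):
    ring.add_node(f"node{n}")

before = {f"key{i}": ring.lookup(f"key{i}") for i in range(10_000)}
ring.add_node("node10")
moved = sum(1 for k, owner in before.items() if ring.lookup(k) != owner)
print(f"{moved / len(before):.1%} of keys remapped")  # ~9%
```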
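Rendezvous hashing fits in a few lines under the same caveats. The sketch below uses the standard weighted-HRW score, -w / ln(h) with h drawn uniformly from (0, 1), which gives each node a share of keys proportional to its weight; the node names and weights are made up.

```python
import hashlib
import math


def unit_hash(key: str, node: str) -> float:
    """Hash (key, node) to a float in the open interval (0, 1)."""
    digest = hashlib.sha1(f"{key}|{node}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 1)


def rendezvous_lookup(key: str, weights: dict) -> str:
    # Score every node and take the maximum: O(N) per lookup, but no
    # ring metadata to distribute. Changing a weight shifts only the
    # proportional share of keys.
    return max(
        weights,
        key=lambda node: -weights[node] / math.log(unit_hash(key, node)),
    )


nodes = {"node-a": 2.0, "node-b": 1.0, "node-c": 1.0}  # node-a has 2x capacity
print(rendezvous_lookup("user:42", nodes))  # node-a wins ~50% of keys
```

Note how a capacity change is just a dictionary update; there is no ring to rebuild, at the cost of hashing against every node on each lookup.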
💡 Key Takeaways
Consistent hashing with virtual nodes limits rebalancing to approximately 1/N of data when adding one node to an N-node cluster, compared to full reshuffling with simple modulo hashing
Virtual nodes dramatically reduce load imbalance: 128 tokens per physical node keeps load variance within 5% to 10% of the mean, versus 30% or more with a single token per node
Weighted virtual nodes enable capacity-aware placement by assigning proportionally more tokens to larger nodes: a node with 2x capacity receives 2x the virtual tokens and naturally handles 2x the traffic
Rendezvous hashing simplifies capacity changes by avoiding ring maintenance but requires O(N) hash computations per lookup, making it better for small clusters under 100 nodes
Metadata overhead matters at scale: 1,000 nodes with 128 virtual tokens each means tracking 128,000 ring positions that must be distributed and synchronized across the cluster
Amazon Dynamo and DynamoDB use consistent hashing with virtual nodes as the foundation for partition assignment, balancing metadata complexity against predictable, minimal data movement
📌 Examples
Amazon DynamoDB ring with 100 nodes storing 1 PB (10 TB per node): adding 10 new nodes with equal capacity moves approximately 100 TB total (10 TB per new node) rather than reorganizing the entire petabyte
Cluster using 128 virtual tokens per node experiences a node failure: only 1/100th of the data in a 100-node cluster needs redistribution, and at 100 TB total that is roughly 10 GB added per remaining node, minimizing blast radius
LinkedIn system with 50 heterogeneous nodes uses weighted rendezvous hashing: 32-core nodes get weight 2.0, 16-core nodes get weight 1.0, automatically doubling traffic to larger instances without manual partition management