
When Not to Use Consistent Hashing: Range Queries and Small N

Consistent hashing is optimized for point lookups (get key X) and uniform key distributions. If your workload requires range queries (get all keys from A to Z) or prefix scans (all keys starting with user123), range partitioning is superior: it keeps lexicographically adjacent keys on the same node, enabling efficient scans. Google Bigtable and Apache HBase use range partitioning with split and merge operations; scanning a million consecutive keys hits one or a few nodes instead of scattering across the entire cluster.

For very small node counts (5 or fewer backends), the complexity of consistent hashing often outweighs its benefits. A simple static table, or even modulo hashing with a manual remap during membership changes, can be simpler and faster. Google Maglev uses precomputed lookup tables rather than runtime consistent hashing precisely because lookups at packet rates demand O(1) work with minimal instructions. If your cluster size is stable for weeks at a time and changes are rare, coordinated manual resharding may be more predictable than automatic consistent hashing.

If your keys are not uniformly distributed or have known hotspots, deliberate manual assignment or application-level sharding can outperform automatic hashing. For example, sharding users by geographic region (US East users to cluster A, Europe users to cluster B) leverages data locality and regulatory boundaries that no hash function respects. Hybrid approaches also exist: use range partitioning within a region and consistent hashing across regions.

Finally, if node membership churns extremely rapidly (multiple nodes joining and leaving per second), consistent hashing still helps, but coordination overhead dominates. In such cases, consider a single routing tier with strongly consistent membership state (like a load balancer with health checks) rather than distributed client-side hashing. The routing tier can use consistent hashing internally but shields clients from churn.
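To make the scan-locality difference concrete, here is a minimal Python sketch. The split points, node names, and the plain hash-mod used as a stand-in for a full consistent-hash ring are illustrative assumptions, not any particular system's layout:

```python
import bisect
import hashlib

# Hypothetical range-partition layout: sorted split points divide the key
# space into contiguous ranges, one per node (Bigtable tablets and HBase
# regions work this way, though their boundaries are chosen dynamically).
SPLIT_POINTS = ["g", "n", "t"]
NODES = ["node-0", "node-1", "node-2", "node-3"]

def range_partition(key: str) -> str:
    """Route a key to the node that owns its lexicographic range."""
    return NODES[bisect.bisect_right(SPLIT_POINTS, key)]

def hash_partition(key: str) -> str:
    """Route a key by hash (a stand-in for a consistent-hash ring lookup)."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

# A prefix scan over 1,000 adjacent keys, e.g. everything under "user123".
keys = [f"user123:{i:04d}" for i in range(1000)]
print(len({range_partition(k) for k in keys}))  # 1  -> the whole prefix is co-located
print(len({hash_partition(k) for k in keys}))   # ~4 -> the scan fans out to every node
```

The exact counts depend on the chosen splits and hash, but the pattern holds: range partitioning answers the scan from one node, while any hash-based placement turns it into a cluster-wide scatter-gather.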
💡 Key Takeaways
Range queries require range partitioning: Scanning keys A to Z with consistent hashing hits all nodes; range partitioning (Bigtable, HBase) keeps adjacent keys co-located, enabling single-node scans
Small N simplicity: With 5 or fewer backends, a static table or modulo hashing is simpler; Maglev uses precomputed tables for O(1) packet-rate lookups instead of runtime consistent hashing
Non-uniform distributions: Geographic sharding (US East vs Europe clusters) or tenant-based sharding respects locality and compliance boundaries that hash functions ignore
High churn environments: Multiple nodes per second joining/leaving makes client-side hashing coordination overhead prohibitive; a central routing tier with health checks is more practical
Hybrid approaches: Use range partitioning within regions for scan efficiency and consistent hashing across regions for elasticity, combining the strengths of both strategies (see the sketch after this list)
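As a rough illustration of the hybrid takeaway above, the sketch below hashes tenants onto an elastic set of regions with a small consistent-hash ring, then uses range partitioning inside each region so per-tenant scans stay on one shard. The region names, virtual-node count, and split points are made-up assumptions:

```python
import bisect
import hashlib

REGIONS = ["us-east", "eu-west", "ap-south"]
VNODES = 64  # virtual nodes per region, smoothing load on the ring

def _h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

# Consistent-hash ring across regions: adding or removing a region only
# remaps roughly 1/N of tenants.
ring = sorted((_h(f"{region}#{v}"), region) for region in REGIONS for v in range(VNODES))
ring_points = [point for point, _ in ring]

def region_for(tenant: str) -> str:
    i = bisect.bisect(ring_points, _h(tenant)) % len(ring)
    return ring[i][1]

# Range partitioning within each region: lexicographically adjacent keys
# (e.g. one tenant's orders) land on the same shard, keeping scans local.
SPLITS = {region: ["h", "q"] for region in REGIONS}
SHARDS = {region: [f"{region}-shard-{i}" for i in range(3)] for region in REGIONS}

def shard_for(tenant: str, key: str) -> str:
    region = region_for(tenant)
    return SHARDS[region][bisect.bisect_right(SPLITS[region], key)]

print(shard_for("tenant-42", "order:2024-06-01"))  # one tenant's scan stays on one shard
```

A real system would persist and rebalance the per-region split points; the sketch only shows how the two placement strategies compose.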
📌 Examples
Google Bigtable: Range partitioning with tablets (key ranges); scanning 1M consecutive rows hits one tablet server vs 1000 nodes with consistent hashing
Pinterest shard routing: Users sharded by user ID modulo with manual migration windows; predictable data locality for graph traversals vs hash-based random placement (see the sketch after these examples)
Cloudflare Railgun: Static server pools with manual assignment for enterprise customers; customer-specific optimizations impossible with automatic hashing
Uber Schemaless: MySQL shards with range partitioning on timestamp for time-series queries; consistent hashing would scatter adjacent timestamps across all shards
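For the small-N, manually managed style referenced in the Pinterest example, a minimal sketch might look like the following. The shard names, migration table, and user IDs are hypothetical, not Pinterest's actual scheme; the point is that modulo placement plus an explicit override map is easy to reason about when membership changes are rare and planned:

```python
SHARDS = ["db-0", "db-1", "db-2", "db-3"]

# During a planned migration window, selected users are pinned to their new
# home here (possibly a shard outside the modulo set); everyone else keeps
# the stable modulo placement.
MIGRATED = {1007: "db-4"}

def shard_for_user(user_id: int) -> str:
    if user_id in MIGRATED:
        return MIGRATED[user_id]
    return SHARDS[user_id % len(SHARDS)]

print(shard_for_user(42))    # db-2: deterministic and trivially debuggable at small N
print(shard_for_user(1007))  # db-4: manually remapped during a migration window
```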