Partitioning & Sharding • Range-based PartitioningEasy⏱️ ~3 min
What Is Range-Based Partitioning and How Does It Work?
Definition
Range-based partitioning divides a key space into contiguous, non-overlapping ranges, with each range assigned to a specific shard. Keys within each partition are stored in sorted order. For example, partition 1 holds user IDs 0-999, partition 2 holds 1000-1999, and so on.
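The definition above can be sketched in a few lines. This is a minimal, hypothetical illustration (the boundary values and function name are assumptions, not any particular system's API): each partition owns a contiguous range defined by its lower bound, and a key maps to the highest bound it is greater than or equal to.

```python
# Illustrative lower bounds: partition i owns keys in [BOUNDS[i], BOUNDS[i+1]),
# with the last partition unbounded above. Values are made up for the example.
PARTITION_BOUNDS = [0, 1000, 2000, 3000]

def partition_for(user_id: int) -> int:
    """Return the index of the partition whose contiguous range contains user_id."""
    # Scan bounds from highest to lowest; the first bound the key meets
    # identifies the owning partition.
    for i in range(len(PARTITION_BOUNDS) - 1, -1, -1):
        if user_id >= PARTITION_BOUNDS[i]:
            return i
    raise ValueError("key below the minimum partition bound")

print(partition_for(1500))  # → 1 (the partition owning 1000-1999)
```

Because ranges are contiguous and sorted, a range query like "users 1200-1800" touches only partition 1 rather than scanning every shard.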
⚠️ Common Pitfall: Monotonic keys like timestamps or auto-increment IDs concentrate recent writes in the highest range, creating hotspots. Time-series workloads often prefix keys with hash buckets or reverse timestamps to spread load.
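The hash-bucket prefix mentioned in the pitfall can be sketched as follows. This is an assumed key layout for a generic time-series store (the bucket count, separator, and field names are illustrative): a stable hash of a non-monotonic attribute is prepended so consecutive timestamps land in different key ranges.

```python
import hashlib

# Number of hash buckets is an illustrative choice; more buckets spread
# write load further but make full time-range scans fan out more.
NUM_BUCKETS = 16

def bucketed_key(device_id: str, timestamp_ms: int) -> str:
    """Build a sortable key whose prefix spreads monotonic timestamps
    across NUM_BUCKETS contiguous key ranges instead of one hot range."""
    digest = hashlib.md5(device_id.encode()).hexdigest()
    bucket = int(digest, 16) % NUM_BUCKETS
    # Zero-padded bucket keeps lexicographic sorting intact within a bucket.
    return f"{bucket:02d}#{device_id}#{timestamp_ms}"
```

A query for one device's history still reads a single contiguous range (its bucket), while writes from many devices spread across all buckets.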
💡 Key Takeaways
✓Keys are stored in sorted order within each partition, enabling efficient range queries through sequential scans rather than random lookups across all partitions.
✓Routing uses a small boundary map that clients cache and search (typically via binary search) to determine which partition owns a given key range.
✓Range queries demonstrate dramatic performance improvements: a time-series table showed approximately 17x faster query execution (1.4 milliseconds versus 23.9 milliseconds median) on a 200,000-row dataset through partition pruning.
✓Google Bigtable tablets automatically split when they grow too large or too hot, typically targeting sizes from tens of megabytes to tens of gigabytes depending on workload, maintaining performance at petabyte scale.
✓The core tradeoff is locality versus uniformity: range partitioning optimizes for sequential access and related data colocation but increases hotspot risk compared to hash partitioning.
✓Operational complexity includes monitoring load and size per partition, dynamically splitting hot or large partitions, merging cold or small partitions, and continuously rebalancing to maintain even load distribution.
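The routing takeaway above — a small cached boundary map searched with binary search — can be sketched with Python's `bisect` module. The split points and shard names here are illustrative assumptions, not a real cluster's metadata:

```python
import bisect

# Cached boundary map: sorted upper-exclusive split points. Keys below "g"
# go to the first shard, keys in ["g", "n") to the second, and so on.
SPLIT_POINTS = ["g", "n", "t"]
SHARDS = ["shard-a", "shard-b", "shard-c", "shard-d"]  # len(SPLIT_POINTS) + 1

def route(key: str) -> str:
    """Binary-search the boundary map to find the shard owning this key."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, key)]

print(route("apple"))  # → shard-a
print(route("zebra"))  # → shard-d
```

In a real deployment a client would refresh this map when a server responds that a range has moved, as the Bigtable tip below describes; the lookup itself stays O(log n) in the number of ranges.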
📌 Interview Tips
1Google Bigtable partitions data into tablets using lexicographically sorted row keys. Each tablet represents a contiguous key range and splits automatically when exceeding size or load thresholds. Clients cache tablet boundaries and refresh on "moved" responses, supporting high-throughput services like web indexing with petabyte-scale tables.
2MongoDB range sharding uses chunks (contiguous key ranges) with a default size of approximately 64 megabytes. Hot chunks split automatically at midpoints of observed key distributions, and a balancer migrates chunks between shards when distributions diverge, enabling efficient range queries without scatter-gather.
3HBase regions represent contiguous key ranges that split when exceeding a target size (commonly around 10 gigabytes). Clusters run with tens of thousands of regions, each served by region servers. Range scans are efficient due to sorted storage and Bloom filters on store files, with automatic splitting and compactions maintaining manageable sizes.
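The midpoint-split behavior the tips above describe can be sketched abstractly. This is a hedged toy model, not MongoDB's or HBase's actual splitting code; the size threshold and the sampled-key list are illustrative assumptions:

```python
# Illustrative split target, echoing MongoDB's ~64 MB default chunk size.
MAX_CHUNK_BYTES = 64 * 1024 * 1024

def maybe_split(chunk_keys: list, chunk_bytes: int):
    """If a chunk exceeds the size target, split it at the midpoint of its
    observed (sorted) keys; otherwise return None to leave it intact."""
    if chunk_bytes <= MAX_CHUNK_BYTES or len(chunk_keys) < 2:
        return None
    mid = len(chunk_keys) // 2  # midpoint of the observed key distribution
    return chunk_keys[:mid], chunk_keys[mid:]
```

Splitting at the observed midpoint (rather than the arithmetic middle of the key range) keeps the two halves roughly equal in data volume even when keys are unevenly distributed; a balancer can then migrate one half to a less-loaded shard.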