Partitioning & Sharding • Range-based Partitioning
What Is Range-Based Partitioning and How Does It Work?
Range-based partitioning divides a key space into contiguous, non-overlapping ranges, each assigned to a specific partition or shard. Keys within each partition are stored in sorted order, and a partition's boundaries are defined by its start and end keys. A routing layer or client-side cache maps any given key to the responsible partition using a small, ordered boundary map. For example, if you partition user data by user ID, partition 1 might hold keys 0 to 999, partition 2 holds keys 1000 to 1999, and so on.
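To make the routing step concrete, here is a minimal sketch in Python assuming an illustrative four-partition layout; the partition names and boundary values are made up for the example, not taken from any particular system. Partition start keys live in a sorted list, and a binary search finds the partition responsible for a key.

```python
import bisect

# Illustrative boundary map: partition i owns keys in [start, next_start);
# the last partition is open-ended. Names and ranges are invented.
PARTITION_STARTS = [0, 1000, 2000, 3000]
PARTITION_NAMES = ["p1", "p2", "p3", "p4"]

def route(key: int) -> str:
    """Return the partition responsible for `key` via binary search
    over the sorted start-key boundaries."""
    idx = bisect.bisect_right(PARTITION_STARTS, key) - 1
    if idx < 0:
        raise KeyError(f"key {key} is below the lowest partition boundary")
    return PARTITION_NAMES[idx]

print(route(42))    # p1 (owns 0-999)
print(route(1500))  # p2 (owns 1000-1999)
```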
The fundamental advantage of range partitioning is efficient range queries. Because keys are sorted within a partition, queries like "get all events between timestamps t1 and t2" or "fetch all users with IDs between 5000 and 6000" can seek to the start of the range and perform a sequential scan. This provides excellent cache locality and minimizes random input/output (I/O) operations. Google Bigtable uses this approach extensively, storing rows in lexicographic order by row key and partitioning them into tablets (contiguous key ranges). Tablets split automatically when they grow beyond target sizes or become too hot, supporting petabyte-scale tables that power services like web indexing, Maps, and Analytics.
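As a rough illustration of why sorted storage helps, the sketch below models a single partition as rows kept in key order (here, by timestamp); a range query seeks to the first key in the range with a binary search, then reads sequentially. The keys and values are invented for the example.

```python
import bisect

# A single partition modeled as rows sorted by key (a timestamp here).
rows = [(10, "a"), (20, "b"), (35, "c"), (40, "d"), (55, "e")]
keys = [k for k, _ in rows]

def range_scan(t1: int, t2: int):
    """Seek to the first key >= t1, then scan sequentially until a key > t2."""
    start = bisect.bisect_left(keys, t1)   # O(log n) seek
    out = []
    for k, v in rows[start:]:              # sequential read, good locality
        if k > t2:
            break
        out.append((k, v))
    return out

print(range_scan(20, 40))  # [(20, 'b'), (35, 'c'), (40, 'd')]
```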
Range partitioning contrasts sharply with hash partitioning, which distributes keys uniformly but destroys natural ordering. Hash partitioning spreads load evenly but makes range scans inefficient, requiring a scatter-gather across all partitions. Range partitioning intentionally preserves order to accelerate scans and colocate related data, making it ideal for time-series data, analytics workloads, and applications where sequential access patterns dominate. The tradeoff is an increased risk of hotspots when keys are not uniformly distributed, particularly with monotonic keys like timestamps or auto-increment IDs that concentrate recent writes in the highest range.
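The difference shows up in how many partitions a range query has to touch. In the hypothetical sketch below, range partitioning prunes the query to the partitions whose boundaries overlap the requested range, while a hash scheme (modeled simply as "any uniform hash over a fixed partition count") must scatter-gather across every partition, because keys in the range can hash anywhere.

```python
import bisect

# Same illustrative boundaries as above: partition i owns [starts[i], starts[i+1]).
starts = [0, 1000, 2000, 3000]
NUM_PARTITIONS = len(starts)

def range_partitions_for_query(lo: int, hi: int) -> list[int]:
    """Range partitioning: only partitions whose ranges overlap [lo, hi]."""
    first = max(bisect.bisect_right(starts, lo) - 1, 0)
    last = max(bisect.bisect_right(starts, hi) - 1, 0)
    return list(range(first, last + 1))

def hash_partitions_for_query(lo: int, hi: int) -> list[int]:
    """Hash partitioning: keys in [lo, hi] may hash to any partition,
    so a range query must scatter-gather across all of them."""
    return list(range(NUM_PARTITIONS))

print(range_partitions_for_query(1500, 2500))  # [1, 2] -- pruned to two partitions
print(hash_partitions_for_query(1500, 2500))   # [0, 1, 2, 3] -- all partitions
```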
💡 Key Takeaways
• Keys are stored in sorted order within each partition, enabling efficient range queries through sequential scans rather than random lookups across all partitions.
• Routing uses a small boundary map that clients cache and search (typically via binary search) to determine which partition owns a given key range.
• Range queries demonstrate dramatic performance improvements: a time-series table showed approximately 17x faster query execution (1.4 milliseconds versus 23.9 milliseconds median) on a 200,000-row dataset through partition pruning.
• Google Bigtable tablets automatically split when they grow too large or too hot, typically targeting sizes from tens of megabytes to tens of gigabytes depending on workload, maintaining performance at petabyte scale.
• The core tradeoff is locality versus uniformity: range partitioning optimizes for sequential access and colocation of related data but increases hotspot risk compared to hash partitioning.
• Operational complexity includes monitoring load and size per partition, dynamically splitting hot or large partitions, merging cold or small partitions, and continuously rebalancing to maintain even load distribution; a minimal split/merge sketch follows this list.
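Here is that split/merge policy as a minimal sketch, with invented thresholds and a toy in-memory partition model; real systems such as Bigtable's tablet servers or MongoDB's balancer implement this very differently and drive it from continuously collected metrics rather than one-shot checks.

```python
# Illustrative thresholds; real systems tune these per workload.
MAX_SIZE_BYTES = 10 * 1024**3   # split partitions larger than ~10 GB...
MAX_QPS = 5000                  # ...or hotter than this
MIN_SIZE_BYTES = 64 * 1024**2   # merge adjacent partitions smaller than ~64 MB combined

class Partition:
    def __init__(self, start_key, end_key, size_bytes, qps, keys):
        self.start_key, self.end_key = start_key, end_key
        self.size_bytes, self.qps = size_bytes, qps
        self.keys = keys  # sorted keys observed in this partition

def maybe_split(p: Partition) -> list[Partition]:
    """Split an oversized or hot partition at the median observed key."""
    if p.size_bytes <= MAX_SIZE_BYTES and p.qps <= MAX_QPS:
        return [p]
    mid_idx = len(p.keys) // 2
    mid = p.keys[mid_idx]
    left = Partition(p.start_key, mid, p.size_bytes // 2, p.qps // 2, p.keys[:mid_idx])
    right = Partition(mid, p.end_key, p.size_bytes // 2, p.qps // 2, p.keys[mid_idx:])
    return [left, right]

def maybe_merge(a: Partition, b: Partition) -> list[Partition]:
    """Merge two small, adjacent, cold partitions into one."""
    if a.end_key == b.start_key and a.size_bytes + b.size_bytes < MIN_SIZE_BYTES:
        merged = Partition(a.start_key, b.end_key, a.size_bytes + b.size_bytes,
                           a.qps + b.qps, a.keys + b.keys)
        return [merged]
    return [a, b]
```

Splitting at the median observed key, rather than the midpoint of the key range, keeps the two halves roughly balanced even when the key distribution is skewed.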
📌 Examples
Google Bigtable partitions data into tablets using lexicographically sorted row keys. Each tablet represents a contiguous key range and splits automatically when it exceeds size or load thresholds. Clients cache tablet boundaries and refresh them on "moved" responses, supporting high-throughput services like web indexing with petabyte-scale tables.
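The boundary-caching behavior described above can be sketched roughly as follows. The fetch_boundaries and send_request callables and the moved flag are hypothetical stand-ins for whatever metadata service and RPC layer a real system exposes; this is not Bigtable's actual client API.

```python
import bisect

class RangeRouterClient:
    """Sketch of a client that caches partition boundaries and refreshes
    them when the server reports that the key range has moved."""

    def __init__(self, fetch_boundaries, send_request):
        # fetch_boundaries() -> sorted list of (start_key, server) pairs (hypothetical)
        # send_request(server, key) -> response object with a .moved flag (hypothetical)
        self._fetch = fetch_boundaries
        self._send = send_request
        self._boundaries = fetch_boundaries()

    def _lookup(self, key):
        starts = [s for s, _ in self._boundaries]
        idx = max(bisect.bisect_right(starts, key) - 1, 0)
        return self._boundaries[idx][1]

    def get(self, key, max_retries=3):
        for _ in range(max_retries):
            server = self._lookup(key)
            resp = self._send(server, key)
            if not resp.moved:                    # cache was fresh; done
                return resp
            self._boundaries = self._fetch()      # stale cache: refresh and retry
        raise RuntimeError("boundary cache kept going stale; giving up")
```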
MongoDB range sharding uses chunks (contiguous key ranges) with a default size of approximately 64 megabytes. Hot chunks split automatically at the midpoints of observed key distributions, and a balancer migrates chunks between shards when their distribution diverges, enabling efficient range queries without scatter-gather.
HBase regions represent contiguous key ranges that split when they exceed a target size (commonly around 10 gigabytes). Clusters run with tens of thousands of regions, each served by a region server. Range scans are efficient due to sorted storage and Bloom filters on store files, with automatic splitting and compaction keeping region sizes manageable.