Production Tradeoffs: When to Choose Range vs Hash Partitioning
Selecting between range-based and hash-based partitioning requires analyzing your workload's access patterns, consistency requirements, and operational maturity. Range partitioning excels when range queries or scans dominate your workload, when data naturally groups by ordered keys (time series, user sessions, geographic regions), or when you need to colocate related entities for transactional or join efficiency. The partition-pruning benefit is substantial: time-filtered queries on range-partitioned tables routinely show 10x to 20x speedups compared to full table scans, as demonstrated by a micro-benchmark showing a 17x improvement (1.4 milliseconds vs 23.9 milliseconds) on a modest 200,000-row dataset. Analytics workloads processing time windows, log aggregation systems, and applications with sequential access patterns gain major efficiency from sequential I/O and cache locality.
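To make partition pruning concrete, here is a minimal sketch, assuming hypothetical monthly partition boundaries held in a metadata map (the boundary values and helper name are invented for illustration), of how a router selects only the partitions that overlap a time-filtered query:

```python
import bisect
from datetime import datetime

# Hypothetical range-partition boundaries (start keys): partition i covers
# [boundaries[i], boundaries[i + 1]); the final partition extends to the end
# of the keyspace. Real systems keep these in a versioned metadata map.
boundaries = [
    datetime(2024, 1, 1),
    datetime(2024, 2, 1),
    datetime(2024, 3, 1),
    datetime(2024, 4, 1),
]

def prune_partitions(query_start, query_end):
    """Return indexes of partitions that overlap [query_start, query_end)."""
    # First partition whose range can contain query_start.
    first = max(bisect.bisect_right(boundaries, query_start) - 1, 0)
    # Last partition whose start key is below query_end.
    last = bisect.bisect_left(boundaries, query_end) - 1
    return list(range(first, last + 1))

# A one-week query touches a single partition instead of all of them.
print(prune_partitions(datetime(2024, 2, 10), datetime(2024, 2, 17)))  # [1]
```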
However, range partitioning demands significantly higher operational sophistication. You must design keys to avoid monotonic hotspots, implement robust auto-splitting and merging logic, maintain versioned boundary maps with client-side caching, and continuously monitor per-partition metrics to prevent load imbalance. Google Spanner's placement driver continuously tracks CPU, disk, and throughput per split, splitting and rebalancing to keep resource usage within target envelopes, but this requires substantial engineering investment. Write-heavy systems with skewed distributions are particularly challenging: a poorly designed key schema can concentrate 80+ percent of writes on a single hot partition despite dozens of underutilized partitions. The metadata layer becomes a critical path for availability, and rebalancing adds background I/O overhead and network traffic during migrations.
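A rough illustration of what that monitoring loop looks like; the thresholds, metric names, and split policy below are assumptions for the sketch, not any system's actual placement logic:

```python
from dataclasses import dataclass

@dataclass
class PartitionStats:
    """Per-partition metrics a monitoring/placement loop might track (illustrative)."""
    partition_id: str
    cpu_utilization: float  # 0.0 to 1.0
    write_qps: float
    size_gb: float

# Hypothetical target envelope; real systems tune these per deployment.
CPU_SPLIT_THRESHOLD = 0.70
QPS_SPLIT_THRESHOLD = 5_000
SIZE_SPLIT_THRESHOLD_GB = 8.0

def plan_splits(stats: list[PartitionStats]) -> list[str]:
    """Return the partitions that should be split or rebalanced this cycle."""
    return [
        p.partition_id
        for p in stats
        if (p.cpu_utilization > CPU_SPLIT_THRESHOLD
            or p.write_qps > QPS_SPLIT_THRESHOLD
            or p.size_gb > SIZE_SPLIT_THRESHOLD_GB)
    ]

# A monotonic timestamp key concentrates nearly all load on the newest partition.
stats = [
    PartitionStats("p-2024-01", 0.05, 20, 6.0),
    PartitionStats("p-2024-02", 0.04, 15, 6.2),
    PartitionStats("p-2024-03", 0.92, 18_000, 7.9),  # hot tail partition
]
print(plan_splits(stats))  # ['p-2024-03']
```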
Hash partitioning offers simpler operations and better write distribution at the cost of range query efficiency. Keys are distributed uniformly by a hash function, eliminating natural hotspots and minimizing rebalancing frequency. Point lookups remain efficient (single partition), but any range query requires scatter-gather across all partitions, multiplying latency and resource consumption. Choose hash partitioning when point lookups dominate (user profile lookups, session stores, key-value caches), keys are highly skewed or unpredictable, or you want to minimize operational complexity and hotspot risk. Hybrid approaches split the difference: hash-prefix the key (to distribute writes), then range-partition within each hash bucket, enabling bounded range scans per bucket. For example, "hash(user_id) % 100#timestamp" spreads writes across 100 buckets while preserving per-user time-range queries within a single bucket. This pattern appears throughout Google systems for write-intensive time-series workloads.
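A minimal sketch of that hybrid key layout, assuming a hypothetical format of "<bucket>#<user_id>#<timestamp>" (the bucket count matches the example above; the helper names are invented):

```python
import hashlib

NUM_BUCKETS = 100  # matches the "hash(user_id) % 100" prefix above (assumed)

def bucket_of(user_id: str) -> int:
    """Stable hash so every writer and reader derives the same bucket.
    (Python's built-in hash() is not stable across processes.)"""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def make_key(user_id: str, timestamp_iso: str) -> str:
    """Hybrid key: the hash prefix distributes writes across buckets,
    the ISO 8601 timestamp keeps each user's rows in time order."""
    return f"{bucket_of(user_id):03d}#{user_id}#{timestamp_iso}"

def user_time_range_bounds(user_id: str, start_iso: str, end_iso: str):
    """A per-user time-range query is a single-bucket range scan:
    read keys in [start_key, end_key) on one partition only."""
    bucket = bucket_of(user_id)
    return (f"{bucket:03d}#{user_id}#{start_iso}",
            f"{bucket:03d}#{user_id}#{end_iso}")

print(make_key("user42", "2024-03-01T10:30:00Z"))
print(user_time_range_bounds("user42",
                             "2024-03-01T10:30:00Z",
                             "2024-03-01T11:00:00Z"))
```

Because the bucket is derived from the user ID alone, every write and read for one user agrees on the same bucket, so a per-user time-range scan touches exactly one partition; only cross-user scans need to fan out.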
💡 Key Takeaways
• Range partitioning delivers 10x to 20x query speedups for time-filtered or range-bounded queries through partition pruning, but only when queries align with partition key boundaries.
• Operational complexity is significantly higher for range partitioning: it requires key design expertise, dynamic splitting and merging automation, versioned routing metadata, and continuous per-partition monitoring (CPU, QPS, size, latency).
• Hash partitioning eliminates hotspot risk and simplifies operations but destroys range query efficiency, requiring scatter-gather across all partitions and multiplying query latency by partition count.
• Hybrid hash-then-range patterns (for example, "hash(entity) % N#timestamp") blend write distribution with bounded range scans, trading perfect range efficiency for N-way parallelism and hotspot protection.
• Google production systems heavily use range partitioning with aggressive key design: reverse timestamp components, add entity prefixes for locality, or bucket monotonic keys to distribute hot writes while preserving per-entity range scans.
• The cost of poor key design in range partitioning is severe: systems with monotonic timestamp keys routinely see 80+ percent of write load concentrated on a single hot partition despite dozens of idle partitions, requiring emergency rekeying or salting to restore balance (see the salting sketch after this list).
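The salting fix referenced above can be sketched as follows; the bucket count, key format, and helper names are assumptions for illustration. A small, stable salt spreads a monotonic timestamp key across N partitions on the write path, and the read path becomes an N-way merge of sorted per-bucket scans:

```python
import hashlib
import heapq

NUM_SALTS = 16  # assumed: more salts spread writes wider but fan reads out further

def salted_key(row_id: str, timestamp_iso: str) -> str:
    """Prefix the monotonic timestamp key with a small, stable salt so
    concurrent writes land on NUM_SALTS partitions instead of one."""
    salt = int(hashlib.sha256(row_id.encode()).hexdigest(), 16) % NUM_SALTS
    return f"{salt:02d}#{timestamp_iso}#{row_id}"

def merged_time_scan(scan_bucket):
    """Read side of the trade-off: scan the time window in every salt
    bucket, then merge the sorted streams back into one time-ordered
    stream. `scan_bucket(salt)` stands in for a per-partition range scan
    that yields (timestamp, row) pairs in timestamp order."""
    streams = [scan_bucket(salt) for salt in range(NUM_SALTS)]
    return heapq.merge(*streams)

print(salted_key("order-7781", "2024-03-01T10:30:00Z"))  # e.g. "09#2024-03-01T10:30:00Z#order-7781"
```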
📌 Examples
A financial trading application stores tick data with composite keys "exchange#symbol#timestamp". Range partitioning enables efficient queries like "get all trades for NASDAQ AAPL between 10:30 and 11:00 AM", scanning only the relevant partition. Hash partitioning would require querying all partitions and filtering in memory, increasing query latency from 15 milliseconds to 200+ milliseconds with 20 partitions.
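A sketch of how that query maps onto the composite key space (the key format follows the example; the helper names and ISO 8601 encoding are assumptions): because all trades for one exchange and symbol are contiguous in key order, the scan covers a single bounded key range rather than every partition.

```python
def tick_key(exchange: str, symbol: str, timestamp_iso: str) -> str:
    """Composite key "exchange#symbol#timestamp"; ISO 8601 timestamps
    sort lexicographically, so byte order matches chronological order."""
    return f"{exchange}#{symbol}#{timestamp_iso}"

def trade_scan_bounds(exchange: str, symbol: str, start_iso: str, end_iso: str):
    """Return [start_key, end_key) for the range-partitioned scan.
    All matching trades are contiguous in key order, so only the
    partition(s) covering this key range are read."""
    return (tick_key(exchange, symbol, start_iso),
            tick_key(exchange, symbol, end_iso))

# "Get all trades for NASDAQ AAPL between 10:30 and 11:00 AM."
print(trade_scan_bounds("NASDAQ", "AAPL",
                        "2024-03-01T10:30:00", "2024-03-01T11:00:00"))
```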
A social media feed system using range-partitioned user timelines encountered severe hotspots: celebrity accounts with millions of followers concentrated writes on their partition. Switching to hash partitioning by user ID eliminated hotspots but broke follower range queries. Final hybrid solution: "hash(user_id) % 256#post_timestamp" distributed celebrity writes across 256 shards while enabling efficient recent-post queries per shard.
Google Cloud Spanner documentation recommends range-friendly key designs for tables with strong locality requirements (parent-child relationships via interleaved tables) and analytic workloads, but hash-prefixed or otherwise non-monotonic keys for write-heavy tables with unpredictable access patterns. Spanner's split management automatically adjusts boundaries, but even with automation, hotspot prevention depends on thoughtful key design during schema creation.