Sharding and Shard Key Selection at Scale
Why Sharding Matters
A single server has finite capacity. When data or traffic exceeds what one machine can handle, sharding distributes documents across multiple servers (shards). The shard key determines which shard stores each document. A good shard key distributes data evenly and routes queries efficiently. A bad shard key creates hotspots that bottleneck the entire system regardless of cluster size.
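The routing role of the shard key can be sketched in a few lines. This is a minimal illustration, not any particular database's implementation; `NUM_SHARDS` and `shard_for` are hypothetical names, and MD5 stands in for whatever hash the system actually uses:

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(shard_key_value: str) -> int:
    """Map a document's shard key to a shard by hashing it (illustrative)."""
    digest = hashlib.md5(shard_key_value.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same key always routes to the same shard; different keys
# may land on different shards:
print(shard_for("order-1001"), shard_for("order-1002"))
```

The key point is that placement is a pure function of the shard key: once chosen, the key alone decides where every document lives and which shards a query must visit.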
High Cardinality Requirement
The shard key must have high cardinality (many distinct values). A key with only 3 values (like status: pending, active, closed) limits you to 3 effective shards. Adding more shards wastes capacity since documents can only land in those 3 buckets.
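The cardinality ceiling is easy to demonstrate. In this sketch (hypothetical `bucket` helper, MD5 as a stand-in hash), a 3-value status key occupies at most 3 shards no matter how many the cluster has:

```python
import hashlib

def bucket(value: str, num_shards: int) -> int:
    """Hash a key value into one of num_shards buckets (illustrative)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % num_shards

statuses = ["pending", "active", "closed"]

# Even with 12 shards, a 3-value key can reach at most 3 of them;
# the other 9 never receive a document:
used = {bucket(s, 12) for s in statuses}
print(len(used))  # at most 3
```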
Monotonic keys like timestamps or auto-increment IDs create write hotspots. Every new document carries the latest timestamp, so all inserts route to the same shard (the one holding the newest range). The other shards sit idle while that one shard absorbs 100% of the writes, and during traffic bursts P99 latency (response time at the 99th percentile) can spike from around 10ms to 500ms.
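The hotspot behavior follows directly from range partitioning. In this sketch, `boundaries` is a hypothetical set of chunk boundaries; because a monotonic key only ever grows, every new insert falls past the last boundary and routes to the final shard:

```python
from bisect import bisect_right

# Hypothetical range partitioning: shard i holds keys below boundaries[i],
# and the last shard (index 3 here) holds everything >= 3000.
boundaries = [1000, 2000, 3000]

def shard_for_range(key: int) -> int:
    """Route a key to the shard whose range contains it."""
    return bisect_right(boundaries, key)

# A monotonically increasing key (e.g. a timestamp) always exceeds
# the highest boundary, so every new write hits the last shard:
new_keys = [5000, 5001, 5002]
print({shard_for_range(k) for k in new_keys})  # {3}
```

Splitting the hot chunk does not help: the split just creates a new "newest" range, and the hotspot moves there.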
Hashed vs Range Keys
Hashing the shard key distributes writes evenly: each shard receives approximately 1/N of traffic. The tradeoff is losing range query efficiency. Querying "all orders in the last hour" with a hashed timestamp key requires scatter-gather across all shards since those orders are distributed randomly.
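The scatter-gather cost can be seen by hashing consecutive timestamps. In this sketch (hypothetical `hashed_shard` helper, MD5 as a stand-in hash), the documents for one hour of orders end up spread across the shards, so a range query over that hour must be broadcast to all of them and the results merged:

```python
import hashlib

def hashed_shard(ts: int, num_shards: int = 4) -> int:
    """Route a timestamp key by its hash (illustrative)."""
    return int(hashlib.md5(str(ts).encode()).hexdigest(), 16) % num_shards

# Sixty consecutive second-granularity timestamps scatter across
# shards, so "all orders in the last hour" cannot target one shard:
recent = range(1_700_000_000, 1_700_000_060)
print(sorted({hashed_shard(t) for t in recent}))
```

This is the core tradeoff: the same randomness that evens out the write load destroys the locality that range queries rely on.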
Composite keys like (tenantId, hashedOrderId) balance both needs. Queries scoped to one tenant route only to that tenant's shards (efficient). Within a tenant, the hash component spreads writes evenly (no hotspot).
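A composite key can be sketched as a two-part routing function. The names here (`composite_shard`, `NUM_SHARDS_PER_TENANT_ZONE`) are hypothetical, and MD5 again stands in for the real hash; the point is that the tenant component scopes queries while the hash component spreads writes:

```python
import hashlib

NUM_SHARDS_PER_TENANT_ZONE = 4  # hypothetical shards per tenant zone

def composite_shard(tenant_id: str, order_id: str) -> tuple[str, int]:
    """Route by (tenantId, hash(orderId)): the tenant picks the zone,
    the hash picks a slot within it (illustrative sketch)."""
    h = int(hashlib.md5(order_id.encode()).hexdigest(), 16)
    return tenant_id, h % NUM_SHARDS_PER_TENANT_ZONE

# One tenant's writes spread over its zone's slots, but a query
# scoped to that tenant never leaves the zone:
shards = {composite_shard("acme", f"order-{i}") for i in range(100)}
print(len(shards))  # at most NUM_SHARDS_PER_TENANT_ZONE
```

Note the ordering matters: putting the tenant first keeps each tenant's data co-located for targeted queries, while hashing the order ID avoids the monotonic-key hotspot within that tenant.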
Resharding Challenges
Changing the shard key after deployment is expensive. Live resharding requires migrating data chunks between shards while serving traffic, consuming I/O and CPU. Plan the shard key for 3-5 years of growth. Consider future query patterns, not just current ones.