Hotspots and Skew: When One Shard Takes All the Heat
The Celebrity Problem:
You've carefully sharded your user database across 10 servers using hash(user_id) mod 10. Each shard should handle 10% of traffic. Then a celebrity with 50 million followers tweets. Suddenly, 80% of your read traffic hits whichever shard stores that celebrity's account. That one shard's latency spikes from 10 milliseconds (ms) to 500ms while the other nine shards sit mostly idle.
This is a hotspot, also called skew or imbalanced partitioning. It breaks the fundamental assumption of sharding: that load distributes evenly across partitions. When one partition receives disproportionate traffic, it becomes the bottleneck for the entire system regardless of total capacity.
Why Hash Partitioning Isn't Magic:
Hash based partitioning (hash of user ID determines shard) distributes keys evenly across shards in aggregate. If you have 100 million users across 10 shards, each shard stores approximately 10 million users. But access patterns aren't uniform. Some users generate 1,000× more traffic than others. A celebrity account, viral post, or popular product can concentrate requests on a single shard.
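A small simulation makes the distinction concrete (an illustrative sketch: the md5-based router, shard count, and traffic numbers are assumptions). Key placement comes out even, yet skewed access still piles load onto one shard:

```python
import hashlib
from collections import Counter

NUM_SHARDS = 10

def shard_for(user_id: int) -> int:
    # Stable hash so placement does not change between runs
    # (Python's built-in hash() is salted per process).
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Key placement is roughly even: 100,000 users, ~10,000 per shard.
placement = Counter(shard_for(uid) for uid in range(100_000))

# Access is not: simulate 80% of reads hitting one celebrity account.
requests = [12_345] * 80_000 + list(range(20_000))
load = Counter(shard_for(uid) for uid in requests)

hot_shard = shard_for(12_345)
print(f"celebrity lives on shard {hot_shard}")
print(f"hot shard serves {load[hot_shard]:,} of {len(requests):,} requests")
```

Storage is balanced to within a few percent, but one shard absorbs over 80% of the request load.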
Range based partitioning (user IDs 0 to 1 million on shard 1, 1 million to 2 million on shard 2, and so on) makes this worse. Because new user IDs are typically assigned sequentially, signups cluster in the highest ID range, so the last shard receives all write traffic for new accounts while early shards sit underutilized. Similarly, if your application accesses recent data more than old data (common in time series workloads), the newest partition becomes a perpetual hotspot.
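The write hotspot falls out directly from the routing rule (a minimal sketch; the boundary values are assumptions):

```python
import bisect

# Shard i owns user IDs from the previous bound up to UPPER_BOUNDS[i] (exclusive).
UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

def range_shard(user_id: int) -> int:
    # bisect_right finds the first boundary strictly greater than user_id.
    return bisect.bisect_right(UPPER_BOUNDS, user_id)

# Sequentially assigned IDs mean every new signup lands in the top range:
new_signups = range(2_950_000, 2_951_000)
print({range_shard(uid) for uid in new_signups})  # a single shard takes all writes
```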
❗ Cascading Impact: A single hot shard degrades the entire application. When that shard's p99 latency jumps from 10ms to 500ms, every user request that touches data on that shard (even if it also queries other shards) waits for the slowest shard. The system's effective performance drops to the worst shard's performance.
Production Mitigation Strategies:
Application level caching targets hot keys specifically. Detect when a user account or item is receiving abnormal traffic (requests per second exceeding some threshold like 1,000 queries per second) and cache that specific entity in an in memory tier like Redis with a short Time To Live (TTL). The cache absorbs the spike while the shard serves normal traffic. This is what Twitter does for viral tweets and celebrity profiles.
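The detect-then-cache loop can be sketched as follows (illustrative only: the threshold and TTL values come from the text, but the in-process dict standing in for Redis and the one-second QPS window are assumptions):

```python
import time
from collections import defaultdict

HOT_QPS_THRESHOLD = 1_000   # promote a key to the cache above this rate
CACHE_TTL_SECONDS = 5.0     # short TTL so stale celebrity data ages out quickly

class HotKeyCache:
    """In-process stand-in for a Redis-style cache tier (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._counts = defaultdict(int)   # requests per key in the current window
        self._window_start = clock()
        self._cache = {}                  # key -> (value, expires_at)

    def get(self, key, load_from_shard):
        now = self._clock()
        if now - self._window_start >= 1.0:   # reset the QPS window each second
            self._counts.clear()
            self._window_start = now
        self._counts[key] += 1

        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]               # cache hit: the shard is never touched

        value = load_from_shard(key)      # normal path: read the owning shard
        if self._counts[key] > HOT_QPS_THRESHOLD:
            self._cache[key] = (value, now + CACHE_TTL_SECONDS)
        return value
```

Once a key crosses the threshold, subsequent reads are absorbed by the cache and the shard sees no further traffic for that key until the TTL expires.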
Key salting splits a single hot key across multiple physical locations. Instead of storing user ID 12345 on one shard, store user:12345:replica_0, user:12345:replica_1, and user:12345:replica_2 on three different shards. Reads randomly pick a replica, spreading the load. Writes must update all replicas (adding complexity), but for read heavy celebrity accounts this tradeoff makes sense.
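A sketch of the salting scheme described above (the md5 router and the "base shard plus offset" placement are assumptions; the replica key names follow the text):

```python
import hashlib
import random

NUM_SHARDS = 10
NUM_REPLICAS = 3

def base_shard(user_id: int) -> int:
    return int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % NUM_SHARDS

def salted_placement(user_id: int) -> dict:
    # Each salted copy gets its own key and a distinct shard.
    base = base_shard(user_id)
    return {
        f"user:{user_id}:replica_{i}": (base + i) % NUM_SHARDS
        for i in range(NUM_REPLICAS)
    }

def read_shard(user_id: int, rng=random) -> int:
    # Reads pick one replica at random, splitting the hot key's load three ways.
    return rng.choice(list(salted_placement(user_id).values()))

def write_shards(user_id: int) -> list:
    # Writes must fan out to every replica to keep the copies consistent.
    return list(salted_placement(user_id).values())
```

Reads for the hot account now spread over three shards; the price is that every write becomes three writes.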
Adaptive Repartitioning:
Dynamic shard splitting detects hotspots and divides the hot partition. If shard 5 consistently receives 40% of traffic, split its key range in half and redistribute to two physical servers. This requires live data migration and careful orchestration to avoid downtime, making it a heavyweight operation reserved for persistent, predictable hotspots rather than temporary viral spikes.
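The routing-metadata side of a split can be sketched as follows (a simplification: a real split must also migrate the moved keys to the new server without downtime, which this omits):

```python
import bisect

class RangeRouter:
    """Range-partition router whose hot partitions can be split in place."""

    def __init__(self, upper_bounds):
        # Shard i owns keys in [previous bound, upper_bounds[i]); keys are
        # assumed non-negative and below the last bound.
        self.upper_bounds = sorted(upper_bounds)

    def route(self, key: int) -> int:
        # First boundary strictly greater than the key identifies the shard.
        return bisect.bisect_right(self.upper_bounds, key)

    def split(self, shard: int) -> None:
        # Halve the hot shard's key range; one half stays on the old server,
        # the other moves to a new one (data migration not shown).
        lo = 0 if shard == 0 else self.upper_bounds[shard - 1]
        hi = self.upper_bounds[shard]
        self.upper_bounds.insert(shard, (lo + hi) // 2)
```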
Load aware routing sends requests to the least loaded shard when dealing with sharded caches or read replicas. Instead of deterministically routing user ID 12345 to shard 5, maintain real time load metrics (queries per second, p95 latency, CPU utilization) for each shard and route to the least loaded. This helps with unpredictable hotspots but requires centralized coordination and adds routing latency.
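A minimal sketch of load aware routing, using in-flight request counts as the load signal (an assumption for brevity; production systems would feed in QPS, p95 latency, or CPU as the text describes):

```python
class LoadAwareRouter:
    """Pick the least loaded replica for each read (assumes every replica
    can serve the data, e.g. read replicas or a replicated cache tier)."""

    def __init__(self, replica_ids):
        self.inflight = {r: 0 for r in replica_ids}

    def pick(self) -> str:
        # Cheapest possible load metric: current in-flight requests.
        return min(self.inflight, key=self.inflight.get)

    def start(self, replica: str) -> None:
        self.inflight[replica] += 1

    def finish(self, replica: str) -> None:
        self.inflight[replica] -= 1
```

Three concurrent reads land on three different replicas instead of all hashing deterministically to the same one.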
The Monitoring Imperative:
You cannot fix hotspots you cannot see. Instrument per shard metrics: requests per second, bytes per second, p50/p95/p99 latency, error rate, CPU and disk utilization. Set alerts when any single shard exceeds 2× the average load or when load distribution shows high variance (standard deviation greater than 50% of mean). Hotspots often emerge gradually as usage patterns shift, making continuous monitoring essential.

💡 Key Takeaways
• A hotspot occurs when one shard receives disproportionate traffic (80% instead of the expected 10%), causing latency to spike from 10ms to 500ms on that shard while other shards remain underutilized
• Hash based partitioning distributes keys evenly but cannot prevent access skew; celebrity accounts or viral content can generate 1,000× more traffic than average users, concentrating load on a single shard
• Application level caching for hot keys: detect entities exceeding a threshold such as 1,000 queries per second (QPS) and cache them in Redis with a short Time To Live (TTL), absorbing traffic spikes before they hit the database shard
• Key salting replicates hot entities across multiple shards (user:12345:replica_0, user:12345:replica_1, user:12345:replica_2) and randomly picks a replica for reads, spreading load at the cost of write complexity from updating all replicas
• Monitor per shard metrics and alert when any shard exceeds 2× average load or distribution variance exceeds 50% of the mean; hotspots often emerge gradually as usage patterns shift over weeks or months
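The alert thresholds above (2× average load, standard deviation over 50% of the mean) can be sketched as a check over a per-shard QPS snapshot (illustrative; the function name and snapshot shape are assumptions):

```python
from statistics import mean, pstdev

def hotspot_alerts(shard_qps: dict):
    """Return (hot_shards, skew_alert) from a per-shard QPS snapshot."""
    avg = mean(shard_qps.values())
    hot_shards = [s for s, qps in shard_qps.items() if qps > 2 * avg]
    skew_alert = pstdev(shard_qps.values()) > 0.5 * avg
    return hot_shards, skew_alert

# Nine shards at ~500 QPS and one at 8,000 trips both alerts.
qps = {i: 500 for i in range(10)}
qps[5] = 8_000
print(hotspot_alerts(qps))
```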
📌 Examples
Twitter viral tweet from celebrity with 50 million followers: without caching, 80% of timeline read traffic concentrates on one shard storing that user's data, causing p99 latency degradation from 100ms to over 1 second
E-commerce flash sale on single product: item ID hashes to one shard which receives 10,000 queries per second while other shards handle 500 QPS; solution is aggressive local caching with 5 second TTL for hot product data
Time series database with range partitioning by timestamp: newest partition receives all writes (100% write traffic) while old partitions sit idle, requiring periodic rebalancing or compound partition keys mixing time and entity ID