Hotspot Detection & Handling

What Are Hotspots and Why Do They Matter?

Hotspots are disproportionate concentrations of load on a small subset of resources in a distributed system. A hotspot can manifest as a hot key in a key-value store receiving millions of reads per second, a hot partition in a distributed database consuming all available Input/Output Operations Per Second (IOPS), a celebrity user whose single action fans out to millions of followers, or a hot object being fetched simultaneously by thousands of clients. The critical characteristic of hotspots is that they cause local resource saturation (CPU, IOPS, network bandwidth, queue depth) even when the aggregate system has ample unused capacity across other partitions or nodes.

The business impact of hotspots is severe. They cause tail latency to spike from milliseconds to seconds, trigger throttling errors that degrade user experience, and create cascading failures where one saturated component brings down dependent services. For example, Amazon DynamoDB partitions historically provided up to 3,000 Read Capacity Units (RCU) or 1,000 Write Capacity Units (WCU) each. If a single hot key monopolizes one partition, it will throttle at those limits even if your table has provisioned capacity for 100,000 RCU across many other idle partitions. Similarly, in Amazon Kinesis Data Streams, one shard supports up to 1 MB/s or 1,000 records/s ingress; a hot partition key sending all traffic to one shard will cause throttling while spare capacity sits unused in other shards.

Detection requires surfacing skew at the right granularity: tracking per-key, per-partition, per-shard, and per-node metrics rather than just aggregate averages. Handling hotspots involves redistributing load through techniques like sharding, caching hot data closer to clients, rate-limiting individual keys, or changing how data is partitioned. The most robust production systems combine proactive design to avoid hotspot-prone access patterns, online detection of heavy hitters within seconds, dynamic mitigation strategies that activate automatically, and last-resort load shedding with fair isolation to prevent a few hot keys from degrading service for everyone else.
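One of the simplest mitigations for a single hot key in a partitioned store like DynamoDB or Kinesis is write sharding, sometimes called key salting: one logical key is spread across several physical partition keys so no single partition absorbs all of its traffic. The sketch below is illustrative only; the key format, shard count, and function names are assumptions rather than any specific database's API, and the trade-off shown is that reads must scatter-gather across every salted key.

```python
import random

# Write sharding ("key salting") sketch: spread one logical hot key across
# N physical partition keys. NUM_WRITE_SHARDS is an assumed value, sized so
# that (hot-key traffic / N) stays under the per-partition throughput limit.
NUM_WRITE_SHARDS = 10

def write_partition_key(logical_key: str) -> str:
    """Pick a random salted key for each write, e.g. 'order#123' -> 'order#123.7'."""
    return f"{logical_key}.{random.randrange(NUM_WRITE_SHARDS)}"

def read_partition_keys(logical_key: str) -> list[str]:
    """Readers scatter-gather across every salted key and merge the results."""
    return [f"{logical_key}.{i}" for i in range(NUM_WRITE_SHARDS)]

# Writes for the hot key now spread over 10 partitions...
print(write_partition_key("order#123"))   # e.g. "order#123.4"
# ...at the cost of 10 reads (or a periodic aggregation job) to reassemble it.
print(read_partition_keys("order#123"))
```

The design choice is a classic trade: write throughput scales roughly with the shard count, while read cost and consistency complexity grow with it, so the salt count should be only as large as the hot key's traffic requires.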
💡 Key Takeaways
Hotspots cause local saturation (CPU, IOPS, network, queue depth) even when aggregate system capacity is ample, leading to tail latency spikes from milliseconds to seconds and throttling errors
Amazon DynamoDB partitions provide roughly 3,000 RCU or 1,000 WCU each; a hot key on one partition throttles at those limits regardless of unused capacity elsewhere in the table
Amazon Kinesis shards support 1 MB/s or 1,000 records/s ingress and 2 MB/s egress; hot partition keys concentrate load on single shards while other shards remain idle
Detection requires per-key, per-partition, and per-node telemetry rather than aggregate metrics, since averages hide the skew that causes hotspots
Effective handling combines proactive design (well-distributed partition keys), online detection (streaming heavy-hitter algorithms, sketched below), dynamic mitigation (caching, rebalancing, splitting), and load shedding with fair isolation
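To make the online-detection point concrete, here is a minimal sketch of a streaming heavy-hitter detector based on the Space-Saving algorithm; the capacity, threshold, and key format are illustrative assumptions rather than a production configuration.

```python
# Space-Saving heavy-hitter sketch: track approximate counts for at most
# `capacity` keys so any key taking more than roughly 1/capacity of the
# traffic is surfaced quickly, with bounded memory.

class SpaceSaving:
    def __init__(self, capacity: int = 1000):
        self.capacity = capacity          # max keys tracked (memory bound)
        self.counts: dict[str, int] = {}  # key -> estimated count

    def record(self, key: str) -> None:
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
        else:
            # Evict the current minimum and inherit its count: the new key's
            # true count is overestimated by at most that minimum.
            min_key = min(self.counts, key=self.counts.get)
            min_count = self.counts.pop(min_key)
            self.counts[key] = min_count + 1

    def heavy_hitters(self, total: int, threshold: float = 0.01) -> list[tuple[str, int]]:
        """Keys whose estimated share of `total` requests exceeds `threshold`."""
        cutoff = total * threshold
        return sorted(
            ((k, c) for k, c in self.counts.items() if c >= cutoff),
            key=lambda kv: kv[1],
            reverse=True,
        )

# Usage: feed every request's partition key, then alert on or divert hot keys.
detector = SpaceSaving(capacity=1000)
requests = ["user:celebrity"] * 5000 + [f"user:{i}" for i in range(20000)]
for key in requests:
    detector.record(key)
print(detector.heavy_hitters(total=len(requests)))  # "user:celebrity" dominates
```

Production implementations typically replace the linear scan for the minimum with a stream-summary structure or min-heap, or use a count-min sketch plus a top-k heap, so each update is constant or logarithmic time.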
📌 Examples
Twitter's celebrity problem: accounts with millions of followers cause write storms during fan-out-on-write, requiring hybrid strategies where celebrities use fan-out-on-read to avoid concentrating writes
Google Cloud Bigtable with sequential timestamp keys sends most writes to the hottest tablet/region, saturating a single node at roughly 10,000 operations/s even when the cluster has spare capacity; the fix is to hash, salt, or reverse the timestamp component of the row key (see the row-key sketch after this list)
Amazon S3 applies request rate limits per key prefix (roughly 3,500 PUT/POST/DELETE and 5,500 GET requests per second per prefix); before automatic partition-scaling improvements, clustered key prefixes created hot partitions that forced developers to randomize prefixes
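To illustrate the Bigtable example above, the following is a hedged sketch of the row-key reshaping it describes: field promotion, salting, and timestamp reversal. The metric name, bucket count, and helper names are assumptions for illustration, not Bigtable's client API.

```python
import hashlib

def sequential_key(metric: str, timestamp_ms: int) -> str:
    # Anti-pattern: timestamp-first keys push every current write onto the
    # tablet holding the newest key range, creating a single-node hotspot.
    return f"{timestamp_ms}#{metric}"

def promoted_key(metric: str, timestamp_ms: int) -> str:
    # Field promotion: lead with the metric so different series spread out,
    # while each series still sorts by time for efficient range scans.
    return f"{metric}#{timestamp_ms}"

def salted_key(metric: str, timestamp_ms: int, buckets: int = 20) -> str:
    # Salting: a hash-derived prefix scatters even one hot series across
    # tablets; time-range reads must then fan out over all buckets and merge.
    salt = int(hashlib.md5(f"{metric}#{timestamp_ms}".encode()).hexdigest(), 16) % buckets
    return f"{salt:02d}#{metric}#{timestamp_ms}"

def reversed_timestamp_key(metric: str, timestamp_ms: int) -> str:
    # Reversing the timestamp digits makes consecutive writes diverge in key
    # space, another way to break the "always append at the end" pattern.
    return f"{metric}#{str(timestamp_ms)[::-1]}"

print(promoted_key("cpu.user", 1714000000000))            # "cpu.user#1714000000000"
print(salted_key("cpu.user", 1714000000000))              # e.g. "07#cpu.user#1714000000000"
print(reversed_timestamp_key("cpu.user", 1714000000000))  # "cpu.user#0000000004171"
```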