Hotspot Detection & Handling

What Are Hotspots and Why Do They Matter?

Hotspots are disproportionate concentrations of load on a small subset of resources in a distributed system. A hotspot can manifest as a hot key in a key-value store receiving millions of reads per second, a hot partition in a distributed database consuming all available Input/Output Operations Per Second (IOPS), a celebrity user whose single action fans out to millions of followers, or a hot object being fetched simultaneously by thousands of clients. The critical characteristic of hotspots is that they cause local resource saturation (CPU, IOPS, network bandwidth, queue depth) even when the aggregate system has ample unused capacity across other partitions or nodes.

The business impact of hotspots is severe. They cause tail latency to spike from milliseconds to seconds, trigger throttling errors that degrade user experience, and create cascading failures where one saturated component brings down dependent services. For example, Amazon DynamoDB partitions historically provided up to 3,000 Read Capacity Units (RCU) or 1,000 Write Capacity Units (WCU) each. If a single hot key monopolizes one partition, it will throttle at those limits even if your table has provisioned capacity for 100,000 RCU across many other idle partitions. Similarly, in Amazon Kinesis Data Streams, one shard supports up to 1 MB/s or 1,000 records/s ingress; a hot partition key sending all traffic to one shard will cause throttling while spare capacity sits unused in other shards.

Detection requires surfacing skew at the right granularity: tracking per-key, per-partition, per-shard, and per-node metrics rather than just aggregate averages. Handling hotspots involves redistributing load through techniques like sharding, caching hot data closer to clients, rate-limiting individual keys, or changing how data is partitioned. The most robust production systems combine proactive design to avoid hotspot-prone access patterns, online detection of heavy hitters within seconds, dynamic mitigation strategies that activate automatically, and last-resort load shedding with fair isolation to prevent a few hot keys from degrading service for everyone else.
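One of the simplest mitigations for a single hot key in a partitioned store like DynamoDB or Kinesis is write sharding, sometimes called key salting: one logical key is spread across several physical partition keys so no single partition absorbs all of its traffic. The sketch below is illustrative only; the key format, shard count, and function names are assumptions rather than any specific database's API, and the trade-off shown is that reads must scatter-gather across every salted key.

```python
import random

# Write sharding ("key salting") sketch: spread one logical hot key across
# N physical partition keys. NUM_WRITE_SHARDS is an assumed value, sized so
# that (hot-key traffic / N) stays under the per-partition throughput limit.
NUM_WRITE_SHARDS = 10

def write_partition_key(logical_key: str) -> str:
    """Pick a random salted key for each write, e.g. 'order#123' -> 'order#123.7'."""
    return f"{logical_key}.{random.randrange(NUM_WRITE_SHARDS)}"

def read_partition_keys(logical_key: str) -> list[str]:
    """Readers scatter-gather across every salted key and merge the results."""
    return [f"{logical_key}.{i}" for i in range(NUM_WRITE_SHARDS)]

# Writes for the hot key now spread over 10 partitions...
print(write_partition_key("order#123"))   # e.g. "order#123.4"
# ...at the cost of 10 reads (or a periodic aggregation job) to reassemble it.
print(read_partition_keys("order#123"))
```

The design choice is a classic trade: write throughput scales roughly with the shard count, while read cost and consistency complexity grow with it, so the salt count should be only as large as the hot key's traffic requires.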
💡 Key Takeaways
Hotspots cause local saturation (CPU, IOPS, network, queue depth) even when aggregate system capacity is ample, leading to tail latency spikes from milliseconds to seconds and throttling errors
Amazon DynamoDB partitions provide roughly 3,000 RCU or 1,000 WCU each; a hot key on one partition throttles at those limits regardless of unused capacity elsewhere in the table
Amazon Kinesis shards support 1 MB/s or 1,000 records/s ingress and 2 MB/s egress; hot partition keys concentrate load on single shards while other shards remain idle
Detection requires per-key, per-partition, and per-node telemetry rather than aggregate metrics, since averages hide the skew that causes hotspots
Effective handling combines proactive design (well-distributed partition keys), online detection (streaming heavy-hitter algorithms, sketched below), dynamic mitigation (caching, rebalancing, splitting), and load shedding with fair isolation
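To make the online-detection point concrete, here is a minimal sketch of a streaming heavy-hitter detector based on the Space-Saving algorithm; the capacity, threshold, and key format are illustrative assumptions rather than a production configuration.

```python
# Space-Saving heavy-hitter sketch: track approximate counts for at most
# `capacity` keys so any key taking more than roughly 1/capacity of the
# traffic is surfaced quickly, with bounded memory.

class SpaceSaving:
    def __init__(self, capacity: int = 1000):
        self.capacity = capacity          # max keys tracked (memory bound)
        self.counts: dict[str, int] = {}  # key -> estimated count

    def record(self, key: str) -> None:
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
        else:
            # Evict the current minimum and inherit its count: the new key's
            # true count is overestimated by at most that minimum.
            min_key = min(self.counts, key=self.counts.get)
            min_count = self.counts.pop(min_key)
            self.counts[key] = min_count + 1

    def heavy_hitters(self, total: int, threshold: float = 0.01) -> list[tuple[str, int]]:
        """Keys whose estimated share of `total` requests exceeds `threshold`."""
        cutoff = total * threshold
        return sorted(
            ((k, c) for k, c in self.counts.items() if c >= cutoff),
            key=lambda kv: kv[1],
            reverse=True,
        )

# Usage: feed every request's partition key, then alert on or divert hot keys.
detector = SpaceSaving(capacity=1000)
requests = ["user:celebrity"] * 5000 + [f"user:{i}" for i in range(20000)]
for key in requests:
    detector.record(key)
print(detector.heavy_hitters(total=len(requests)))  # "user:celebrity" dominates
```

Production implementations typically replace the linear scan for the minimum with a stream-summary structure or min-heap, or use a count-min sketch plus a top-k heap, so each update is constant or logarithmic time.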
📌 Examples
Twitter's celebrity problem: accounts with millions of followers cause write storms during fan-out-on-write, requiring hybrid strategies where celebrities use fan-out-on-read to avoid concentrating writes
Google Cloud Bigtable with sequential timestamp keys sends most writes to the hottest tablet/region, saturating a single node at roughly 10,000 operations/s even when the cluster has spare capacity; the fix is to hash, salt, or reverse the timestamp component of the row key (see the row-key sketch after this list)
Amazon S3 applies request rate limits per key prefix (roughly 3,500 PUT/POST/DELETE and 5,500 GET requests per second per prefix); before automatic partition-scaling improvements, clustered key prefixes created hot partitions that forced developers to randomize prefixes
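To illustrate the Bigtable example above, the following is a hedged sketch of the row-key reshaping it describes: field promotion, salting, and timestamp reversal. The metric name, bucket count, and helper names are assumptions for illustration, not Bigtable's client API.

```python
import hashlib

def sequential_key(metric: str, timestamp_ms: int) -> str:
    # Anti-pattern: timestamp-first keys push every current write onto the
    # tablet holding the newest key range, creating a single-node hotspot.
    return f"{timestamp_ms}#{metric}"

def promoted_key(metric: str, timestamp_ms: int) -> str:
    # Field promotion: lead with the metric so different series spread out,
    # while each series still sorts by time for efficient range scans.
    return f"{metric}#{timestamp_ms}"

def salted_key(metric: str, timestamp_ms: int, buckets: int = 20) -> str:
    # Salting: a hash-derived prefix scatters even one hot series across
    # tablets; time-range reads must then fan out over all buckets and merge.
    salt = int(hashlib.md5(f"{metric}#{timestamp_ms}".encode()).hexdigest(), 16) % buckets
    return f"{salt:02d}#{metric}#{timestamp_ms}"

def reversed_timestamp_key(metric: str, timestamp_ms: int) -> str:
    # Reversing the timestamp digits makes consecutive writes diverge in key
    # space, another way to break the "always append at the end" pattern.
    return f"{metric}#{str(timestamp_ms)[::-1]}"

print(promoted_key("cpu.user", 1714000000000))            # "cpu.user#1714000000000"
print(salted_key("cpu.user", 1714000000000))              # e.g. "07#cpu.user#1714000000000"
print(reversed_timestamp_key("cpu.user", 1714000000000))  # "cpu.user#0000000004171"
```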