
Detecting Hotspots with Per Key Telemetry and Heavy Hitter Algorithms

Aggregate metrics like average CPU utilization or total Queries Per Second (QPS) hide the skew that defines hotspots. You need granular telemetry at multiple layers: per key counters and timers, per partition latency distributions (p95, p99), per shard throttle rates and queue depths, and per node CPU and cache hit rates. Without this visibility, autoscaling and protection mechanisms trigger too late, after user experience has already degraded.

The challenge is doing this efficiently: tracking millions of unique keys individually is prohibitively expensive in memory and processing overhead. Streaming heavy hitter detection algorithms solve this by maintaining an approximate top K of hot keys in bounded memory. Algorithms like Count Min Sketch with conservative updates or SpaceSaving can track the hottest keys across billions of events using only tens of megabytes of memory with less than 1% error. These data structures update in constant time per event and can be queried in milliseconds to return the current top 100 or top 1000 keys. Production systems update these sketches every second and use the results to drive mitigation actions like adding cache replicas for hot read keys or applying per key rate limits for hot write keys.

Visualization and alerting strategies must surface the distribution of load, not just totals. Heatmaps showing tail latency by shard or partition reveal which specific resources are saturated. Service Level Objective (SLO) budgets should track metrics like the percentage of capacity consumed by the top 1% of keys, or any partition exceeding 80% sustained utilization for more than Y seconds (commonly 60 to 120 seconds, to filter transient spikes). These alerts trigger before complete saturation, giving time for automated or manual mitigation. For example, if heavy hitter detection shows one user ID consuming 40% of a partition's write capacity, the system can proactively split that partition or apply a per key rate limit before throttling affects other users.
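To make the detection side concrete, here is a minimal Python sketch of a Count Min Sketch with conservative updates feeding a small top K table. The class and parameter names (CountMinSketch, HeavyHitterTracker, width, depth) are illustrative, not from any particular library; a production implementation would use faster hashes and shared-memory counters.

```python
import random


class CountMinSketch:
    """Approximate per key counters in bounded memory.

    Uses width * depth integer cells; the estimate for a key
    overcounts by roughly total_events / width in the worst case.
    """

    def __init__(self, width=4096, depth=4, seed=42):
        self.width = width
        rng = random.Random(seed)
        self.seeds = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One hashed column per row, salted with a per row seed.
        return [(r, hash((s, key)) % self.width)
                for r, s in enumerate(self.seeds)]

    def add(self, key, count=1):
        # Conservative update: raise only the cells at the current
        # minimum, which tightens overestimates under collisions.
        cells = self._cells(key)
        target = min(self.table[r][c] for r, c in cells) + count
        for r, c in cells:
            if self.table[r][c] < target:
                self.table[r][c] = target
        return target  # new conservative estimate for this key

    def estimate(self, key):
        return min(self.table[r][c] for r, c in self._cells(key))


class HeavyHitterTracker:
    """Keeps an approximate top K alongside the sketch."""

    def __init__(self, k=100, **sketch_kwargs):
        self.k = k
        self.sketch = CountMinSketch(**sketch_kwargs)
        self.top = {}  # key -> last estimated count

    def observe(self, key, count=1):
        est = self.sketch.add(key, count)
        if key in self.top or len(self.top) < self.k:
            self.top[key] = est
            return
        # Evict the coldest tracked key if this one now exceeds it.
        victim = min(self.top, key=self.top.get)
        if est > self.top[victim]:
            del self.top[victim]
            self.top[key] = est

    def top_k(self):
        return sorted(self.top.items(), key=lambda kv: -kv[1])


if __name__ == "__main__":
    tracker = HeavyHitterTracker(k=10)
    rng = random.Random(0)
    for _ in range(200_000):
        # Pareto-skewed traffic: a handful of user IDs dominate.
        user = min(int(rng.paretovariate(1.2)), 99_999)
        tracker.observe(f"user-{user}")
    print(tracker.top_k()[:5])  # hottest keys with estimated counts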
💡 Key Takeaways
Aggregate metrics hide hotspots; you need per key counters, per partition p95/p99 latencies, per shard throttle rates, queue depths, and per node CPU and cache hit rates to surface skew
Count Min Sketch and SpaceSaving algorithms maintain top K hot keys across billions of events using only tens of MB memory with less than 1% error, updating in constant time per event
Production systems update heavy hitter sketches every second and use results to drive automatic mitigations like adding cache replicas or applying per key rate limits within seconds (see the rate limiter sketch after this list)
Heatmaps and SLO alerts should track distributional metrics, like more than X% of capacity consumed by the top 1% of keys, or any partition exceeding 80% sustained utilization for 60 to 120 seconds
Without per entity telemetry, autoscaling and circuit breakers trigger too late after user experience has degraded; early detection enables proactive splitting or rate limiting before saturation
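For the mitigation side, here is a hedged sketch of the per key rate limiting pattern, again in Python with illustrative names (TokenBucket, HotKeyLimiter); the 500 RPS limit is borrowed from the example below. Buckets exist only for keys the heavy hitter tracker currently flags, so the limiter's memory scales with the hot set rather than the full keyspace.

```python
import time


class TokenBucket:
    """Allows rate_rps sustained throughput with bursts up to capacity."""

    def __init__(self, rate_rps, capacity):
        self.rate = rate_rps
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost=1.0):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class HotKeyLimiter:
    """Rate limits only the keys flagged as heavy hitters."""

    def __init__(self, limit_rps=500, burst=1000):
        self.limit_rps = limit_rps
        self.burst = burst
        self.buckets = {}

    def refresh(self, hot_keys):
        # Called once per sketch interval (e.g. every second) with the
        # current top K; keeps live buckets, drops cooled-off keys.
        self.buckets = {
            k: self.buckets.get(k) or TokenBucket(self.limit_rps, self.burst)
            for k in set(hot_keys)
        }

    def admit(self, key):
        bucket = self.buckets.get(key)
        return bucket.allow() if bucket else True  # cold keys pass freely
```

Wiring the two pieces together: once per sketch interval call limiter.refresh(key for key, _ in tracker.top_k()), and on each request reject (for example with HTTP 429) whenever limiter.admit(request_key) returns False.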
📌 Examples
A streaming analytics service processes 10 billion events per day, using a 50 MB Count Min Sketch to track the top 1000 hot user IDs with updates every second, triggering cache warming for IDs exceeding 10,000 requests/s
DynamoDB's adaptive capacity monitors per partition throttle rates and p99 latencies; when a partition sustains more than 80% of its 3,000 RCU limit for 60 seconds, it automatically shifts throughput capacity from idle partitions
A social media API tracks per endpoint and per user request rates; when one user consumes 40% of a partition's write capacity, heavy hitter detection identifies them within 2 seconds and applies a per key rate limit of 500 RPS