Hot Keys, Hot Partitions, and Thundering Herd Mitigation
The Hot Key Problem
Skewed access patterns create hot keys that overload a single partition while others sit idle. Even with ample cluster capacity, a hot partition receiving 80% of traffic throttles requests. P99 latency (response time at 99th percentile, meaning 99% of requests are faster) can spike from 5ms to 500ms for hot key requests.
Hot Partition Causes
Viral content, celebrity users, popular products, or time-based keys (all todays events hitting one partition) create natural hot spots. Sequential keys like auto-increment IDs concentrate recent writes on one partition. The hash function distributes evenly across key space, but actual access patterns are rarely uniform.
Mitigation: Key Sharding
Spread hot keys across partitions by adding random suffixes. Instead of product:viral_item:count, use product:viral_item:shard_N:count with N ranging 0-99. Writes distribute across 100 partitions. Reads scatter-gather across all shards and aggregate. Trades simplicity for load distribution.
Mitigation: Caching
Application-level caching absorbs hot key traffic before reaching the database. Cache popular items in application memory or a dedicated cache layer. With 99% cache hit rate, the database sees 100x less traffic for hot keys.
Thundering Herd
When a cached hot keys TTL (Time-To-Live, the expiration duration) expires, hundreds of concurrent requests hit the database simultaneously. The database overloads, requests timeout, all retry, compounding the problem. Solutions: staggered expiration (add random jitter to TTL), probabilistic early refresh (refresh before expiration based on access rate), or request coalescing (one request refreshes while others wait on the result).