Database Design • Key-Value Stores (Redis, DynamoDB) • Hard • ⏱️ ~3 min
Hot Keys, Hot Partitions, and Thundering Herd Mitigation
Skewed key distributions create hot keys that overload a single shard or partition, causing throttling and elevated tail latencies even when overall cluster capacity remains ample. In provisioned-throughput systems like DynamoDB, per-partition limits (typically 1000 write capacity units and 3000 read capacity units per partition) trigger rate limiting when a celebrity user or viral post concentrates traffic. A hot partition receiving 80% of requests can spike p99 latency from 10 milliseconds to 500 milliseconds while other partitions sit idle.
Mitigation starts with sharded keys using random suffixes. Instead of storing a popular counter at the single key "video:viral:views", split it into "video:viral:views:0" through "video:viral:views:9" and aggregate reads across shards. This distributes writes across ten partitions at the cost of ten read requests to reconstruct the value. Uber uses sharded counters for rate limiting to avoid hot-key throttles, with per-cell isolation preventing cascading failures. Facebook memcache employs lease tokens: on a cache miss, one request gets a lease to fetch from origin while the others wait, collapsing thousands of simultaneous misses into a single origin request.
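As a concrete illustration, here is a minimal sketch of the sharded-counter pattern in Python using the redis-py client. The key names, the shard count of ten, and the increment_views/read_views helpers are illustrative assumptions, not Uber's or DynamoDB's actual implementation.

```python
import random

import redis  # assumed available: pip install redis

NUM_SHARDS = 10          # matches the ":0" through ":9" suffixes above
r = redis.Redis()        # hypothetical local Redis instance

def increment_views(base_key: str) -> None:
    """Write to one randomly chosen shard so no single key absorbs every write."""
    shard = random.randrange(NUM_SHARDS)
    r.incr(f"{base_key}:{shard}")

def read_views(base_key: str) -> int:
    """Aggregate across all shards; a single MGET avoids ten round trips."""
    keys = [f"{base_key}:{shard}" for shard in range(NUM_SHARDS)]
    return sum(int(v) for v in r.mget(keys) if v is not None)

# increment_views("video:viral:views")
# total = read_views("video:viral:views")
```

The trade-off is exactly the one described above: writes spread across ten keys that can land on different partitions, while reads pay the cost of aggregating all shards.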
Thundering herd occurs when popular keys expire simultaneously or systems cold start, causing origin overload. If 10,000 requests hit an expired cache key at the same time and all miss, they overwhelm the database, causing 30-second latency spikes. Mitigations include staggered time-to-live (TTL) values with jitter (a random offset of plus or minus 10% of the TTL), request coalescing (the first requester fetches while the others wait), negative caching (caching "not found" results for 60 seconds), and probabilistic early refresh (refreshing the cache before expiry with a probability that increases as expiry nears).
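A sketch combining two of these mitigations, TTL jitter and in-process request coalescing, is shown below. CoalescingCache, fetch_from_origin, and the constants are hypothetical and stand in for the general pattern; they do not reproduce Facebook's lease-token mechanism, which does the same coalescing at the memcache tier.

```python
import random
import threading
import time

TTL_JITTER = 0.10  # plus or minus 10% of the base TTL, as described above

def jittered_ttl(base_ttl: float) -> float:
    """Stagger expirations so a batch of popular keys does not expire at once."""
    return base_ttl * (1.0 + random.uniform(-TTL_JITTER, TTL_JITTER))

class CoalescingCache:
    """In-process request coalescing: the first miss fetches, concurrent misses wait."""

    def __init__(self) -> None:
        self._data = {}       # key -> (value, expires_at)
        self._inflight = {}   # key -> threading.Event guarding an in-flight fetch
        self._lock = threading.Lock()

    def get(self, key, fetch_from_origin, base_ttl: float = 300.0):
        now = time.monotonic()
        with self._lock:
            entry = self._data.get(key)
            if entry is not None and entry[1] > now:
                return entry[0]                    # fresh hit, no origin traffic
            event = self._inflight.get(key)
            if event is None:                      # we hold the "lease" for this key
                event = threading.Event()
                self._inflight[key] = event
                is_leader = True
            else:
                is_leader = False
        if is_leader:
            try:
                value = fetch_from_origin(key)     # the single origin request
                with self._lock:
                    self._data[key] = (value, time.monotonic() + jittered_ttl(base_ttl))
                return value
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
                event.set()                        # wake the waiting followers
        event.wait()                               # followers block until the leader finishes
        with self._lock:
            entry = self._data.get(key)
        return entry[0] if entry is not None else fetch_from_origin(key)
```

A production version would add a timeout on event.wait() and a negative-cache entry for "not found" origin results so misses on nonexistent keys do not keep hammering the database.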
DynamoDB adaptive capacity automatically redistributes throughput from cold to hot partitions within table-level limits, reducing but not eliminating hot-partition throttling. Systems must monitor top-K hot keys, partition utilization skew, and throttle rates (see the sketch below). Uber emphasizes per-cell isolation, where each geographic cell operates independently; hot keys in one cell do not cascade failures to other cells. Production runbooks include emergency key spreading and temporary over-provisioning.
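A toy sketch of top-K hot-key monitoring follows. HotKeyMonitor, its thresholds, and the alert hook are hypothetical; a production system would typically use a streaming sketch such as count-min rather than an exact in-memory Counter.

```python
from collections import Counter

class HotKeyMonitor:
    """Rolling tally of the most-accessed keys so operators can spot skew early."""

    def __init__(self, top_k: int = 10, alert_share: float = 0.5) -> None:
        self.counts = Counter()
        self.total = 0
        self.top_k = top_k
        self.alert_share = alert_share  # flag any key above this fraction of traffic

    def record(self, key: str) -> None:
        self.counts[key] += 1
        self.total += 1

    def hot_keys(self):
        """Return (key, share_of_traffic) pairs for the top K keys."""
        if self.total == 0:
            return []
        return [(k, c / self.total) for k, c in self.counts.most_common(self.top_k)]

    def alerts(self):
        """Keys whose traffic share exceeds the alert threshold."""
        return [k for k, share in self.hot_keys() if share >= self.alert_share]

# monitor = HotKeyMonitor()
# monitor.record("video:viral:views")   # call once per request on the hot path
# if monitor.alerts():
#     pass  # trigger emergency key spreading or temporary over-provisioning
```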
💡 Key Takeaways
• Hot partitions receiving 80% of traffic spike p99 latency from 10 milliseconds to 500 milliseconds due to per-partition throughput limits (1000 writes per second, 3000 reads per second in DynamoDB)
• Sharded keys with random suffixes distribute load: split "video:viral:views" into "video:viral:views:0" through "video:viral:views:9" at the cost of aggregating ten reads
• Thundering herd from simultaneous cache expirations sends 10,000 requests to origin, causing 30-second database spikes; mitigate with lease tokens, staggered TTL jitter, and request coalescing
• Facebook memcache lease tokens collapse cache misses: the first requester gets a lease to fetch from origin while the others wait, preventing miss storms at sub-millisecond coordination cost
• Uber per-cell isolation contains hot-key failures to one geographic region; monitoring top-K hot keys, partition skew, and throttle rates enables proactive spreading
📌 Examples
DynamoDB adaptive capacity redistributes throughput from cold to hot partitions, but the per-partition hard limit of 1000 write capacity units still throttles celebrity-user traffic spikes
Twitter Manhattan handles millions of requests per second with sharded counters and automatic partition rebalancing to prevent hot-partition cascade failures across the timeline service
Meta memcache uses lease tokens and request coalescing, achieving over 90% hit rates; on a cache miss only one request fetches from MySQL, preventing database overload