Cache Stampede and Hot Key Overload: Failure Modes at Scale
Cache stampede, also called the thundering herd problem, occurs when a popular key expires and thousands of concurrent requests simultaneously discover the cache miss and attempt to refetch from the backend database. At scale, this can spike origin queries per second (QPS) by 10,000 times or more within milliseconds, overwhelming the database and causing cascading failures. For example, if a celebrity profile cached at 50,000 reads per second expires at noon, and a cache lookup takes 1 ms while a database fetch takes 20 ms, then 50,000 requests per second × 0.020 s = 1,000 requests arrive during the 20 ms refill window; each misses the cache and independently queries the database, a 1,000 times amplification over the single query actually needed. Production systems use several mitigations:
•Request coalescing collapses multiple concurrent requests for the same key into one backend fetch (a minimal sketch follows this list).
•Lease-based refill grants one requester exclusive rights to refresh via a token while others wait or receive stale data.
•Soft TTL with background refresh serves slightly stale data while a background worker refreshes before the hard TTL expiry.
•Jittered TTLs randomize expiration times by plus or minus 10 to 20 percent to prevent synchronized expirations across many keys.
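Below is a minimal sketch of request coalescing (often called "singleflight") in Python. The Coalescer class, its fetch method, and load_fn are illustrative names, not any specific library's API: under concurrent misses for the same key, exactly one thread (the leader) queries the backend while the rest block and share its result.

```python
import threading

class Coalescer:
    """Collapses concurrent fetches of the same key into one backend call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (event, shared result holder)

    def fetch(self, key, load_fn):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
                leader = True
            else:
                leader = False
        event, holder = entry
        if leader:
            try:
                holder["value"] = load_fn(key)  # the one backend query
            except Exception as exc:
                holder["error"] = exc  # propagate failures to waiters too
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()  # wake every waiting follower
        else:
            event.wait()  # followers block until the leader finishes
        if "error" in holder:
            raise holder["error"]
        return holder["value"]
```

With this in place, 1,000 concurrent misses on the same key produce one database query instead of 1,000.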
Hot key overload is a related problem in which a small set of keys receives disproportionate traffic, saturating a single cache partition or node. In a typical distributed cache, consistent hashing maps each key to one primary node. If one key receives 100,000 requests per second while the average key receives 10, that single node must handle 10,000 times the traffic of its peers, quickly hitting CPU, network, or memory bandwidth limits. This is common in social networks (celebrity posts), e-commerce (flash sale items), and content platforms (viral videos). Mitigations include:
•Key replication, where the hot key is copied to multiple nodes and clients fan out reads across the replicas, spreading load horizontally (sketched below).
•Request-level rate limiting or admission control to cap per-key QPS and shed excess load.
•Key splitting, where semantics allow breaking one logical key into multiple physical keys with client-side aggregation.
Meta engineering talks describe replicating extremely hot keys across dozens of memcache servers and using mcrouter to distribute reads, while Netflix has discussed per-key throttling to protect cache nodes from single-key traffic spikes.
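A sketch of client-side key replication, assuming a cache client that exposes plain get and set methods; the REPLICAS count and the "#rN" suffix scheme are illustrative. Because each suffixed physical key hashes differently, consistent hashing places the replicas on different nodes, and random read fan-out divides the hot key's load by roughly the replica count.

```python
import random

REPLICAS = 10  # illustrative replica count for a known-hot key

def _replica_key(key, i):
    # Suffixing changes the hash, so each replica lands on a
    # different cache node under consistent hashing.
    return f"{key}#r{i}"

def write_hot_key(cache, key, value, ttl):
    # Write every replica so any one of them can serve a read.
    for i in range(REPLICAS):
        cache.set(_replica_key(key, i), value, ttl)

def read_hot_key(cache, key):
    # Each read picks a random replica, so per-node load drops to
    # roughly total_qps / REPLICAS.
    return cache.get(_replica_key(key, random.randrange(REPLICAS)))
```

The trade-off is write amplification: every update now touches REPLICAS nodes, which is why this is typically applied only to keys detected as hot.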
Both failure modes illustrate the importance of observability and guardrails at scale. Systems must track per-key request rates, per-node load distribution, cache miss storms (sudden spikes in miss rate), and backend query amplification. Automated circuit breakers should shed non-critical traffic or serve stale data when backend load exceeds safe thresholds (a minimal guard sketch follows). At Meta scale, even a 2 percent drop in hit rate due to stampedes or hot key evictions can translate into hundreds of thousands of extra database queries per second, so real-time alerting tied to origin load, rather than just hit ratio, is critical. Production runbooks often include pre-warming strategies (loading popular keys before TTL expiry) and gradual rollout of cache changes to detect issues before they affect full traffic.
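A minimal sketch of the serve-stale guardrail, assuming a plain dict cache of (value, expires_at) tuples; MAX_ORIGIN_QPS, the OriginLoad tracker, and guarded_get are illustrative assumptions, not a production system's API.

```python
import time

MAX_ORIGIN_QPS = 5_000  # assumed safe backend query budget

class OriginLoad:
    """Counts backend queries in the current one-second window."""

    def __init__(self):
        self.start, self.count = time.monotonic(), 0

    def record(self):
        if time.monotonic() - self.start >= 1.0:
            self.start, self.count = time.monotonic(), 0  # new window
        self.count += 1

    def qps(self):
        return 0 if time.monotonic() - self.start >= 1.0 else self.count

origin = OriginLoad()

def guarded_get(cache, key, load_fn, ttl=60):
    entry = cache.get(key)  # (value, expires_at) or None
    if entry and entry[1] > time.monotonic():
        return entry[0]  # fresh hit
    if entry and origin.qps() >= MAX_ORIGIN_QPS:
        return entry[0]  # backend over budget: stale beats overload
    origin.record()
    value = load_fn(key)  # the backend query counted above
    cache[key] = (value, time.monotonic() + ttl)
    return value
```

A real deployment would track load per backend shard and distinguish critical from sheddable traffic; this shows only the core decision.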
💡 Key Takeaways
•Cache stampede amplification scales with request rate times backend fetch latency: a key at 50,000 requests per second with a 20 ms database fetch accumulates 50,000 × 0.02 = 1,000 concurrent database queries during the refill window, spiking backend load by 1,000 times over the single query needed.
•Lease-based refill solves stampedes by granting one requester exclusive refresh rights via a token while others receive stale data or wait briefly. Meta memcache uses this pattern to protect MySQL from thundering herds on popular social graph keys (see the lease sketch in the Examples section).
•Soft TTL with background refresh decouples user latency from revalidation: serve slightly stale data (within seconds of expiry) while a background worker fetches fresh data before the hard TTL, so users never wait for slow backend queries (see the sketch after this list).
•Hot key overload saturates individual cache nodes when one key receives 10,000 times more traffic than average. Mitigations include replicating the hot key to multiple nodes and distributing reads, or rate limiting per-key QPS and shedding excess requests.
•At Meta scale, a 2 percent hit rate drop from stampedes or hot key evictions translates into hundreds of thousands of extra backend queries per second. Automated alerts tied to origin load rather than just hit ratio are critical for detecting issues.
•Jittered TTLs randomize expiration by plus or minus 10 to 20 percent to prevent synchronized expiration storms where thousands of keys expire simultaneously, spreading refill load over time instead of creating traffic spikes. The sketch after this list combines jitter with soft-TTL refresh.
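The following sketch combines the jittered-TTL and soft-TTL takeaways above, using a plain dict as the cache; SOFT_FRACTION, the base TTL, and the background thread are illustrative assumptions.

```python
import random
import threading
import time

SOFT_FRACTION = 0.8  # start background refresh at 80 percent of the TTL

def jittered_ttl(base_ttl, jitter=0.2):
    # Randomize plus or minus 20 percent so keys written together
    # do not all expire together.
    return base_ttl * random.uniform(1 - jitter, 1 + jitter)

def get_with_soft_ttl(cache, key, load_fn, base_ttl=300):
    now = time.monotonic()
    entry = cache.get(key)  # (value, stored_at, ttl) or None
    if entry is None or now - entry[1] > entry[2]:
        # Miss or hard expiry: the caller must wait for a fresh fetch.
        value = load_fn(key)
        cache[key] = (value, now, jittered_ttl(base_ttl))
        return value
    value, stored_at, ttl = entry
    if now - stored_at > SOFT_FRACTION * ttl:
        # Soft expiry: serve the slightly stale value immediately and
        # refresh in the background before the hard TTL is reached.
        def refresh():
            cache[key] = (load_fn(key), time.monotonic(),
                          jittered_ttl(base_ttl))
        threading.Thread(target=refresh, daemon=True).start()
    return value
```

In production the background refresh would itself be coalesced so that only one refresh runs per key at a time.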
📌 Examples
Meta memcache stampede protection: a viral celebrity post cached at 100,000 reads per second uses leases. When the key expires, memcache grants a single lease token to one request to refill from TAO while the remaining concurrent requests receive the stale cached version, preventing a spike of up to 100,000 queries per second against the MySQL backend. A minimal lease sketch follows below.
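A minimal lease sketch in the spirit of this pattern; LeaseCache, LEASE_TTL, and the token scheme are illustrative, not Meta's actual memcache interface. One caller receives a token and refills; everyone else gets the stale value.

```python
import threading
import time
import uuid

LEASE_TTL = 2.0  # seconds a refill lease stays exclusive (assumed)

class LeaseCache:
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}    # key -> (value, expires_at)
        self._leases = {}  # key -> (token, granted_at)

    def get(self, key):
        """Return (value, lease_token). The token is non-None only for
        the single caller that should refill; others get stale data."""
        with self._lock:
            now = time.monotonic()
            entry = self._data.get(key)
            if entry and entry[1] > now:
                return entry[0], None  # fresh hit, no refill needed
            lease = self._leases.get(key)
            if lease and now - lease[1] < LEASE_TTL:
                # Another caller holds the lease: serve stale (or None).
                return (entry[0] if entry else None), None
            token = uuid.uuid4().hex
            self._leases[key] = (token, now)
            return (entry[0] if entry else None), token

    def set(self, key, value, ttl, token):
        with self._lock:
            if self._leases.get(key, (None, 0))[0] != token:
                return False  # lease expired or superseded; drop the write
            self._data[key] = (value, time.monotonic() + ttl)
            del self._leases[key]
            return True
```

A caller checks the returned token: if it is non-None, it fetches from the backend and calls set with the token; otherwise it uses the possibly stale value as-is.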
Netflix EVCache hot key replication: during a popular show release, the metadata key receives 500,000 requests per second. EVCache replicates the key across 10 nodes in each of 3 availability zones (30 replicas total) and uses client-side load balancing to distribute reads, reducing per-node load to roughly 16,700 requests per second.
E-commerce flash sale: a product page key normally serves 100 requests per second but spikes to 50,000 during a sale. Without coalescing, the cache miss at sale start could trigger up to 50,000 database queries in the first second. Request coalescing collapses them into a single fetch, then broadcasts the result to all waiting requests.