Cache Stampede and Hot Key Overload: Failure Modes at Scale
Cache Stampede (Thundering Herd)
Occurs when a popular key expires and thousands of concurrent requests discover the miss at once, each attempting to refetch from the backend. At scale this can spike origin QPS by 1,000x or more within milliseconds. Example: a key served at 50,000 RPS expires; the cache lookup takes 1 ms and the database fetch takes 20 ms. During that 20 ms refill window, roughly 1,000 requests miss and independently query the database: 1,000x amplification.
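The amplification figure falls directly out of request rate times refill window, as a quick check shows:

```python
rps = 50_000              # steady request rate for the hot key
refill_window_s = 0.020   # 20 ms database fetch during the miss

# Every request arriving inside the refill window misses independently.
concurrent_misses = round(rps * refill_window_s)
print(concurrent_misses)  # 1000 backend queries instead of 1
```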
Stampede Mitigations
Request coalescing collapses concurrent requests for the same key into one backend fetch. Lease-based refill grants one requester exclusive refresh rights via a token while others wait or receive stale data. Soft TTL with background refresh serves slightly stale data while a worker refreshes the entry before the hard TTL, so users never wait on the slow backend. Jittered TTLs randomize expiration by ±10-20% to prevent synchronized expirations.
Hot Key Overload
A small set of keys receives a disproportionate share of traffic, saturating a single cache partition. With consistent hashing, each key maps to one primary node. If one key receives 100,000 RPS while the average key sees 10 RPS, that node handles 10,000x the typical load and hits CPU or network limits. Common in social networks (celebrity posts), e-commerce (flash sales), and content platforms (viral videos).
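The root cause is that hash-based placement is deterministic per key: no matter how many requests arrive, they all land on the same node. A toy placement function makes this concrete (simple modulo hashing over a hypothetical 16-node cluster; real consistent hashing adds virtual nodes but still sends any single key to one primary):

```python
import hashlib

NUM_NODES = 16  # hypothetical cluster size

def shard_for(key: str) -> int:
    """Deterministic placement: every request for `key` hits one node."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % NUM_NODES
```

All 100,000 RPS for a viral key like `"celebrity:post:42"` resolve to the same shard, while the other 15 nodes sit near idle.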
Hot Key Mitigations
Key replication: copy the hot key to multiple nodes and fan reads out across the replicas. Per-key rate limiting: cap QPS for a single key and shed the excess. Key splitting: break one logical key into multiple physical keys with client-side aggregation (where semantics allow). For detection, track per-key request rates, per-node load distribution, cache-miss storms, and backend query amplification.
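Key replication can be sketched with suffixed physical keys: writes fan out to every replica, and each read picks a random suffix so traffic spreads across nodes. The names (`REPLICAS`, `set_hot`, `get_hot`) and the plain-dict cache are illustrative assumptions; a real client would target a distributed store and pick the fan-out factor from observed key heat.

```python
import random

REPLICAS = 8  # hypothetical fan-out factor for a known-hot key

def replica_keys(key: str, n: int = REPLICAS) -> list[str]:
    """Physical keys for one logical hot key: 'feed:42#0' .. 'feed:42#7'."""
    return [f"{key}#{i}" for i in range(n)]

def set_hot(cache: dict, key: str, value) -> None:
    for rk in replica_keys(key):  # write fans out to every replica
        cache[rk] = value

def get_hot(cache: dict, key: str):
    rk = f"{key}#{random.randrange(REPLICAS)}"  # reads spread ~evenly
    return cache[rk]
```

Since the suffixed keys hash to different nodes, each replica absorbs roughly 1/REPLICAS of the read load; the cost is REPLICAS writes per update and a brief window of replica inconsistency.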