What is Cache Stampede and Why Does It Happen?
The Amplification Problem
When a key serving 10,000 requests per second expires and the backend takes 50ms to rebuild the value, roughly 500 requests will miss during that 50ms refill window and each will independently query the database: 500 origin queries where one would suffice. Even averaged over a full second, that burst is 500 QPS against an origin provisioned for 100 QPS, a 5x overload, and in practice all 500 queries land within the 50ms window. The resulting spike can drive up latency and trigger cascading failures.
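The arithmetic above can be checked with a quick back-of-the-envelope calculation (the rates and latency are the illustrative figures from the text, not measurements):

```python
# Back-of-the-envelope stampede sizing, using the numbers from the text.
request_rate = 10_000     # requests per second against the hot key
refill_latency = 0.050    # seconds the backend takes to rebuild the value
origin_capacity = 100     # QPS the origin database is provisioned for

# Requests that arrive (and miss) while the value is being refilled.
burst = int(request_rate * refill_latency)
print(burst)                    # 500 simultaneous origin queries
print(burst / origin_capacity)  # 5.0x the origin's rated QPS
```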
Why It Happens
Cache entries have finite TTL. All readers check TTL on access. When TTL expires, the next request must fetch from origin. In high-concurrency systems, many requests arrive within the fetch latency window. Without coordination, each request independently fetches, creating N origin requests where only one is needed. The problem compounds when multiple hot keys expire simultaneously (synchronized TTLs) or when a popular key is explicitly invalidated.
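The uncoordinated-fetch behavior can be reproduced with a minimal simulation. This is a hedged sketch, not a real cache client: the 50ms sleep stands in for origin latency, and the key names are made up. Every concurrent request that misses ends up calling the origin itself.

```python
import threading
import time

origin_calls = 0
counter_lock = threading.Lock()
cache = {}

def fetch_from_origin(key):
    # Stand-in for a slow database query (hypothetical 50 ms latency).
    global origin_calls
    with counter_lock:
        origin_calls += 1
    time.sleep(0.05)
    return f"value-for-{key}"

def get(key):
    # Naive read-through: every miss goes straight to the origin,
    # with no coordination between concurrent requests.
    if key not in cache:
        cache[key] = fetch_from_origin(key)
    return cache[key]

# Simulate 20 concurrent requests arriving just after the key expired.
threads = [threading.Thread(target=get, args=("hot-key",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(origin_calls)  # typically 20: each request fetched independently
```

Mitigations such as per-key locking or request coalescing (e.g. Go's singleflight pattern) collapse those N fetches into one, which is exactly the coordination the naive version lacks.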
Impact at Scale
At large scale, stampedes cause cascading failures. Database connection pools exhaust, queries queue, latency spikes to seconds, timeouts propagate, error rates climb, and users see a degraded experience. A 2% drop in hit rate at millions of QPS means hundreds of thousands of extra origin requests per second during stampede windows.
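To make the hit-rate figure concrete, here is the arithmetic with an assumed fleet-wide read rate (the 10M QPS number is a hypothetical; only the 2% drop comes from the text):

```python
# Hypothetical aggregate numbers; the 2% hit-rate drop is from the text above.
total_qps = 10_000_000   # assumed fleet-wide cache read rate
hit_rate_drop = 0.02     # stampede knocks the hit rate down by 2 points

# Every lost cache hit becomes an extra origin request.
extra_origin_qps = total_qps * hit_rate_drop
print(int(extra_origin_qps))  # 200000 extra origin requests per second
```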