Cache Stampede and Thundering Herd: When Everyone Asks at Once
The Cascading Failure Scenario:
Your homepage query result is cached with a 60 second Time To Live (TTL). At 10,000 requests per second, that cache serves 600,000 requests before expiring. When the TTL expires at exactly 12:00:00, the cache key disappears. The next 10,000 requests in that one second all see a cache miss simultaneously. All 10,000 requests hit your database at once.
Your database normally handles 500 queries per second comfortably at 20 millisecond (ms) p95 latency. Suddenly it receives 10,000 queries in one second, a 20× spike. Query queues explode, latency jumps to 5 seconds, and connection pools saturate. The database might crash entirely. This is a cache stampede, also called thundering herd or dog-piling.
Why Simple Solutions Fail:
Setting a longer TTL just delays the problem and increases staleness. A 10 minute TTL means the cache serves 6 million requests between expirations, but each expiration still dumps the same 10,000 simultaneous misses on the database; the stampede simply arrives every 10 minutes instead of every minute. Making the TTL shorter spreads stampedes out but increases the cache miss rate and the steady state database load.
Adding randomness to TTL helps but doesn't eliminate the risk. If TTLs are 60 seconds plus or minus 10 seconds, you spread the expirations across a 20 second window. Instead of 10,000 simultaneous misses, you get 500 misses per second for 20 seconds. That's still 500 queries per second, right at your database capacity, with no headroom for other traffic.
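As a sketch, jitter is a one-line change at write time (the cache client and key names below are illustrative, assuming a Redis-style `setex` call):

```python
import random

BASE_TTL = 60   # seconds
JITTER = 10     # spread expirations across a +/- 10 second window

def jittered_ttl() -> int:
    # Each write gets a slightly different lifetime, so expirations
    # land across a ~20 second window instead of a single instant.
    return BASE_TTL + random.randint(-JITTER, JITTER)

# e.g. cache.setex("homepage", jittered_ttl(), payload)
```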
⚠️ Microservice Amplification: In distributed systems, stampedes amplify across service boundaries. When your API cache expires, 100 API servers all call the backend service simultaneously. If that backend then has its own cache miss, those 100 requests each trigger 10 database queries, resulting in 1,000 queries hitting the database from a single cache expiration.
Production Grade Mitigation:
Request coalescing (also called request deduplication or single flight) is the fundamental solution. When the first request sees a cache miss, it sets a flag indicating "refresh in progress" and begins fetching from the database. The next 9,999 requests arriving in that same second see the flag and wait for the first request to complete, then all share the result. Only one database query executes regardless of concurrent request count.
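A minimal in-process sketch of single flight using only the standard library (the class and method names are my own; production systems typically reach for a library such as Go's singleflight, or the distributed lock described next):

```python
import threading

class SingleFlight:
    """Coalesce concurrent calls for the same key into one fetch."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight: dict[str, threading.Event] = {}
        self._results: dict[str, object] = {}

    def do(self, key, fetch):
        with self._lock:
            event = self._inflight.get(key)
            leader = event is None
            if leader:
                # First caller: mark the refresh as in progress.
                event = threading.Event()
                self._inflight[key] = event
        if leader:
            try:
                self._results[key] = fetch()  # the one database query
            finally:
                event.set()
                with self._lock:
                    del self._inflight[key]
        else:
            event.wait()  # followers block, then share the result
        return self._results.get(key)
```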
Implement this with a distributed lock or semaphore. In Redis, use SET with NX (not exists) and EX (expiration) flags. The first request that successfully sets the lock wins and becomes responsible for the refresh. Other requests check for the lock, find it exists, and either wait with a timeout or serve slightly stale data if available.
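A sketch of that lock protocol with the redis-py client (key names, TTLs, and the polling fallback are illustrative choices, and values are assumed string-serializable; a hardened version would store a unique token in the lock and verify it before deleting):

```python
import time
import redis  # redis-py client

r = redis.Redis()

def get_with_coalescing(key, fetch, ttl=60, lock_ttl=10):
    value = r.get(key)
    if value is not None:
        return value  # cache hit

    # SET NX EX is atomic: exactly one concurrent caller gets True
    # and becomes responsible for the refresh.
    if r.set(f"lock:{key}", "1", nx=True, ex=lock_ttl):
        try:
            value = fetch()              # the single database query
            r.set(key, value, ex=ttl)
            return value
        finally:
            r.delete(f"lock:{key}")

    # Everyone else waits briefly for the refreshed value,
    # falling back to a direct fetch if the leader times out.
    deadline = time.time() + lock_ttl
    while time.time() < deadline:
        time.sleep(0.05)
        value = r.get(key)
        if value is not None:
            return value
    return fetch()
```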
Probabilistic early expiration is a complementary technique. Instead of hard expiring at 60 seconds, start randomly refreshing when the key is 50 to 60 seconds old based on traffic rate. The formula: if (time_remaining < TTL × random(0,1) × log(requests_per_second)), refresh now. High traffic keys refresh slightly early, spreading the load and preventing synchronized expiration.
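Translating that formula directly into code (this is the article's heuristic, in the spirit of probabilistic early recomputation; `trigger_background_refresh` is a hypothetical helper, and the log term assumes requests_per_second > 1):

```python
import math
import random

def should_refresh_early(time_remaining: float, ttl: float,
                         requests_per_second: float) -> bool:
    # Hotter keys draw a larger random threshold, so they refresh
    # earlier on average and expirations de-synchronize.
    return time_remaining < ttl * random.random() * math.log(requests_per_second)

# On each cache hit near expiry:
# if should_refresh_early(remaining, 60, observed_rps):
#     trigger_background_refresh(key)   # hypothetical helper
```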
The Stale While Revalidate Pattern:
Serve stale data while asynchronously refreshing in the background. When a cache key expires, immediately return the stale value to the user (keeping response time at 5ms) and trigger a background job to fetch fresh data. The next request gets the updated value. This strategy sacrifices strict freshness for consistent latency and database protection.
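A single-process sketch of stale while revalidate (the class name and structure are illustrative; note that a cold miss here fetches synchronously under the lock, which a production version would avoid):

```python
import threading
import time

class SWRCache:
    """Serve stale entries instantly; refresh them in the background."""

    def __init__(self, fetch, ttl=60):
        self._fetch = fetch
        self._ttl = ttl
        self._data = {}           # key -> (value, expires_at)
        self._refreshing = set()  # keys with a refresh in flight
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            entry = self._data.get(key)
            if entry is None:
                # Cold miss: nothing stale to serve, fetch once.
                value = self._fetch(key)
                self._data[key] = (value, time.time() + self._ttl)
                return value
            value, expires_at = entry
            if time.time() >= expires_at and key not in self._refreshing:
                # Expired: hand the refresh to a background thread.
                self._refreshing.add(key)
                threading.Thread(target=self._refresh, args=(key,),
                                 daemon=True).start()
        return value  # stale or fresh, returned immediately

    def _refresh(self, key):
        try:
            value = self._fetch(key)
            with self._lock:
                self._data[key] = (value, time.time() + self._ttl)
        finally:
            with self._lock:
                self._refreshing.discard(key)
```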
Meta's cache architecture uses soft TTLs with background refresh. When a cache entry approaches expiration, a background process refreshes it before users ever see a miss. Cache hit ratios stay above 95%, database load remains steady, and stampedes are architecturally impossible because cache keys never truly expire under traffic.
💡 Key Takeaways
• Cache stampede occurs when a popular cache key expires and thousands of concurrent requests all miss simultaneously, overwhelming the backend; at 10,000 requests per second, expiration causes 10,000 queries in one second against a database that comfortably handles 500 queries per second
• Request coalescing (single flight pattern) ensures only one request fetches from the database on a cache miss while the others wait and share the result, reducing 10,000 simultaneous queries to exactly 1 query
• Implement coalescing with distributed locks in Redis using SET with NX (not exists) and EX (expiration) flags; the first request acquires the lock and refreshes, the others see the lock and wait or serve stale data
• Probabilistic early expiration formula: if time_remaining < TTL × random × log(requests_per_second), refresh now; this spreads high traffic key refreshes across time instead of synchronizing expirations
• Stale while revalidate serves expired cached data immediately (5ms response) while triggering an asynchronous background refresh, sacrificing strict freshness for consistent latency and database protection
📌 Examples
Homepage query cached with 60 second TTL at 10,000 requests per second: cache expiration triggers 10,000 simultaneous database queries causing latency spike from 20ms to 5 seconds and potential connection pool exhaustion
Meta's cache tier uses soft TTLs with background refresh processes that revalidate entries before expiration, maintaining 95%+ hit ratios and preventing stampedes by ensuring cache keys never expire under active traffic
E-commerce product page at 5,000 requests per second: without coalescing, cache miss causes 5,000 inventory queries in one second; with coalescing, only 1 inventory query executes and result shared across all 5,000 requests