
CDN Cache Stampede and Thundering Herd Mitigation

A cache stampede (also called a thundering herd) occurs when a popular cached object expires or is purged and thousands of concurrent requests discover it missing at the same moment. Without coordination, each request generates its own origin fetch, hitting the origin with a spike of identical requests. For a globally distributed CDN with 200 edge PoPs, a single hot object expiring can trigger 200 simultaneous origin connections within milliseconds. If each fetch takes 200 milliseconds round-trip time (RTT) and the object is 5 MB, the origin suddenly faces 1 GB of egress (200 × 5 MB) and hundreds of database queries in a 200 millisecond window. Origins provisioned for a steady-state load of 1000 requests per second can see 10x to 100x spikes, leading to CPU saturation, memory exhaustion, and cascading failures.

Request coalescing (also called request collapsing or deduplication) is the primary mitigation. When a CDN cache miss occurs, the system checks whether a fetch for the same cache key is already in flight. If so, the new request waits for that fetch to complete instead of starting a duplicate. This collapses N concurrent misses into a single origin request, and all N requesters receive the response once it arrives. Amazon CloudFront and other modern CDNs implement coalescing at the regional shield tier, so each shield location makes at most one origin fetch per key even if dozens of edges behind it miss simultaneously.

Jittered Time To Live (TTL) values spread the load further: copies of the same object across the CDN get slightly randomized expiration times (for example, 3600 seconds plus or minus 60 seconds), so expirations are spread over time instead of arriving in a synchronized burst.

Stale serving extensions provide additional defense. The Cache-Control directive stale-while-revalidate allows a CDN to serve an expired object immediately while refreshing it asynchronously in the background. For example, an object with a 60 second TTL and stale-while-revalidate=300 can be served stale for up to 5 minutes after it expires while the CDN fetches a fresh copy. Users get instant responses (no miss latency), and the origin sees smoothed revalidation traffic instead of spikes. The stale-if-error directive goes further: if the origin is unreachable or returns errors during revalidation, the CDN keeps serving stale content, for up to the number of seconds the directive specifies, rather than failing requests. Netflix and other high-availability systems combine stale extensions with prewarming (fetching hot objects into the shield tier before they expire) and circuit breakers (stopping fetches during sustained origin errors) to maintain service during origin incidents.
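The coalescing step can be sketched in a few lines of Python. This is a minimal, single-process illustration under stated assumptions, not how CloudFront or any particular CDN implements it: `CoalescingCache`, `_Call`, and the `fetch` callable are hypothetical names, and threads stand in for concurrent client requests. The first request to miss on a key becomes the leader and fetches from origin; later requests for that key wait on the leader's result instead of fetching again.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class _Call:
    """One in-flight origin fetch that concurrent requesters can wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: bytes = b""
        self.error: Optional[BaseException] = None


class CoalescingCache:
    """Hypothetical edge cache that collapses concurrent misses per cache key."""

    def __init__(self, fetch: Callable[[str], bytes], ttl: float) -> None:
        self._fetch = fetch  # origin fetch function (assumed, supplied by caller)
        self._ttl = ttl
        self._lock = threading.Lock()
        self._cache: Dict[str, Tuple[bytes, float]] = {}  # key -> (body, expires_at)
        self._inflight: Dict[str, _Call] = {}  # key -> fetch currently in progress

    def get(self, key: str) -> bytes:
        with self._lock:
            entry = self._cache.get(key)
            if entry is not None and entry[1] > time.monotonic():
                return entry[0]  # fresh hit: no origin traffic
            call = self._inflight.get(key)
            is_leader = call is None
            if call is None:
                call = _Call()  # first miss for this key becomes the leader
                self._inflight[key] = call

        if not is_leader:
            call.done.wait()  # follower: wait for the leader's single fetch
        else:
            try:
                call.value = self._fetch(key)  # the only origin request
            except Exception as exc:
                call.error = exc
            with self._lock:
                # Store the result and clear the in-flight marker atomically, so a
                # later request sees either the cached body or this in-flight call.
                if call.error is None:
                    self._cache[key] = (call.value, time.monotonic() + self._ttl)
                del self._inflight[key]
            call.done.set()  # wake all followers

        if call.error is not None:
            raise call.error  # followers share the leader's failure
        return call.value
```

A real CDN would also bound how long followers wait and would coalesce across machines (the shield tier); this in-process sketch only shows the per-key leader/follower idea.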
💡 Key Takeaways
Cache stampede occurs when a hot object expires across many PoPs simultaneously, generating 10x to 100x origin request spikes (for example, 200 PoPs triggering 200 concurrent fetches in under 200 milliseconds)
Request coalescing collapses concurrent misses for the same cache key into a single origin fetch; modern CDNs implement this at regional shield tiers, reducing origin requests by 90% to 95% during stampede scenarios
Jittered TTL adds randomness (for example, plus or minus 5% to 10%) to expiration times, spreading synchronized expiration load over minutes rather than all at once, reducing peak origin request rate by up to 80%
The stale-while-revalidate directive serves expired content immediately (zero miss latency) while refreshing it asynchronously in the background, eliminating user-facing latency spikes during revalidation
The stale-if-error directive continues serving stale cached content when origin returns errors or is unreachable, maintaining availability during origin outages at the cost of temporary staleness
Prewarming hot objects by fetching them into the shield tier before expiry (triggered by monitoring or scheduled jobs) prevents cold misses during peak traffic events; Netflix uses this approach for major release launches
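A jittered TTL can be computed independently for each cached copy. The sketch below assumes uniform jitter of plus or minus 5% (one point in the 5% to 10% range above) around a 3600 second base TTL, and a simplified scenario in which 150 PoPs all cached the object at t=0; `jittered_ttl` and `peak_expirations_per_second` are hypothetical helper names. It compares the worst one-second burst of expirations with and without jitter.

```python
import random
from collections import Counter
from typing import List


def jittered_ttl(base_ttl: float, jitter_fraction: float, rng: random.Random) -> float:
    """Return base_ttl randomized uniformly within +/- jitter_fraction of itself."""
    spread = base_ttl * jitter_fraction
    return base_ttl + rng.uniform(-spread, spread)


def peak_expirations_per_second(expiry_times: List[float]) -> int:
    """Largest number of cached copies expiring within the same 1-second bucket."""
    buckets = Counter(int(t) for t in expiry_times)
    return max(buckets.values())


rng = random.Random(42)  # fixed seed so the sketch is reproducible
pops = 150
base_ttl = 3600.0

# Without jitter, every PoP cached the object at t=0 and expires it at t=3600.
synchronized = [base_ttl for _ in range(pops)]
# With +/- 5% jitter, expirations spread over roughly 3420..3780 seconds.
jittered = [jittered_ttl(base_ttl, 0.05, rng) for _ in range(pops)]

print("peak expirations/s without jitter:", peak_expirations_per_second(synchronized))  # 150
print("peak expirations/s with jitter:   ", peak_expirations_per_second(jittered))
```

In practice PoPs fill at different times, so their expirations are already partly spread; jitter matters most for copies that were filled together, for example right after a purge or a prewarm.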
📌 Examples
A news site homepage with a 3600 second TTL expires at 09:00 UTC across 150 edge PoPs; without coalescing, 150 origin requests arrive within 100 milliseconds; with shield-tier coalescing, only 10 shield locations fetch, cutting origin requests from 150 to 10 (about 93%)
An e-commerce site sets stale-while-revalidate=300 on product API responses with a 60 second TTL; during a database slowdown (revalidation taking 5 seconds instead of 200 milliseconds), users still get 50 millisecond edge responses with slightly stale data
Netflix prewarms the shield tier with segments of a new season 1 hour before release by programmatically requesting all segments from a background job; when users flood in at launch, shields already have the content, preventing origin saturation
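The serving decision behind the stale-while-revalidate and stale-if-error examples can be written as a small function. This is a simplified reading of the two directives (defined in RFC 5861), not any CDN's actual logic: it ignores request directives, must-revalidate, and s-maxage, and `parse_policy`, `decide`, `Policy`, and `Action` are hypothetical names. The max-age=60 and stale-if-error=86400 values are assumptions; the e-commerce example only gives a 60 second TTL and stale-while-revalidate=300.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Action(Enum):
    SERVE_FRESH = "serve fresh copy"
    SERVE_STALE_AND_REVALIDATE = "serve stale copy, refresh in background"
    FETCH_FROM_ORIGIN = "wait for origin fetch"
    SERVE_STALE_ON_ERROR = "serve stale copy because origin failed"
    RETURN_ERROR = "return error to client"


@dataclass(frozen=True)
class Policy:
    max_age: int
    stale_while_revalidate: int = 0
    stale_if_error: int = 0


def parse_policy(header: str) -> Policy:
    """Parse the three directives used here from a Cache-Control header value."""
    values: Dict[str, int] = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if value.isdigit():
            values[name.lower()] = int(value)
    return Policy(
        max_age=values.get("max-age", 0),
        stale_while_revalidate=values.get("stale-while-revalidate", 0),
        stale_if_error=values.get("stale-if-error", 0),
    )


def decide(age: float, policy: Policy, origin_ok: bool) -> Action:
    """Choose how to answer a request for a cached object of `age` seconds.

    origin_ok says whether a synchronous origin fetch would succeed; it only
    matters once the object is past its stale-while-revalidate window.
    """
    staleness = age - policy.max_age
    if staleness < 0:
        return Action.SERVE_FRESH
    if staleness <= policy.stale_while_revalidate:
        return Action.SERVE_STALE_AND_REVALIDATE  # no user-facing miss latency
    if origin_ok:
        return Action.FETCH_FROM_ORIGIN
    if staleness <= policy.stale_if_error:
        return Action.SERVE_STALE_ON_ERROR
    return Action.RETURN_ERROR


policy = parse_policy("max-age=60, stale-while-revalidate=300, stale-if-error=86400")
print(decide(30, policy, origin_ok=True))    # Action.SERVE_FRESH
print(decide(120, policy, origin_ok=True))   # Action.SERVE_STALE_AND_REVALIDATE
print(decide(600, policy, origin_ok=False))  # Action.SERVE_STALE_ON_ERROR
```

In the e-commerce example, a request 120 seconds after caching falls inside the stale-while-revalidate window, so the edge answers from cache while the slow 5 second revalidation runs in the background.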
CDN Cache Stampede and Thundering Herd Mitigation | CDN Caching - System Overflow