
Stale While Revalidate (SWR) and Soft TTL vs Hard TTL

Stale While Revalidate (SWR) and its companion Stale If Error (SIE) are production-proven patterns deployed at internet scale across Content Delivery Network (CDN) edge caches and application caches. The core idea is dual expiration: each cache entry has a soft TTL (after which data is stale but still serveable) and a hard TTL (the absolute maximum time it may be served). Once the soft TTL expires, the cache immediately returns the stale content to the requesting client at cache hit latency (typically single digit milliseconds) while asynchronously triggering a background refresh. This decouples user facing latency from origin fetch latency. At CDN scale, a single edge location handles hundreds of thousands of RPS; SWR allows that edge to serve stale content at 5ms P50 latency during refresh instead of blocking on 200 to 500ms origin fetches.

The impact on availability and tail latency is substantial. Consider a hot key serving 50,000 RPS with a 5 minute soft TTL and a 10 minute hard TTL. At T = 300 seconds, the soft TTL expires. Without SWR, every request that arrives before the entry is repopulated misses the cache and waits for origin (P99 latency spikes from 5ms to 400ms). With SWR, all 50,000 RPS immediately receive stale data at 5ms while a single background refresh hits origin, provided concurrent refreshes for the same key are deduplicated. User visible P99 latency remains 5 to 10ms. Origin load for this key drops from a burst of up to 50,000 QPS to one refresh per soft TTL interval (one fetch every 5 minutes). SIE extends this: if the background refresh fails (origin timeout, 5xx error), the cache continues serving stale data until the hard TTL, preventing error propagation to users.

Tuning requires balancing freshness against resilience. The soft TTL controls the staleness window (how long users might see old data), typically 30 seconds to 5 minutes depending on the domain. The hard TTL provides error absorption: setting it to 2 to 3 times the soft TTL (e.g., 10 minute hard for 5 minute soft) allows the system to survive brief origin outages by serving stale data. During a 3 minute database incident, SIE keeps serving cached data instead of propagating 500 errors to users. The tradeoff is potential staleness: account balance or inventory caches should not use SWR because of their consistency requirements, but user profiles, recommendations, and content feeds tolerate minutes of staleness in exchange for large latency and availability improvements.
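The dual expiration logic described above can be sketched as a small in-process cache. This is a minimal illustration under stated assumptions, not a production implementation: the class name `SWRCache`, the `loader` callback (standing in for the origin fetch), and the injectable `clock` are all hypothetical names introduced here.

```python
import threading
import time
from typing import Any, Callable, Dict, Set, Tuple


class SWRCache:
    """Hypothetical in-process cache with a soft TTL and a hard TTL."""

    def __init__(self, loader: Callable[[str], Any], soft_ttl: float,
                 hard_ttl: float, clock: Callable[[], float] = time.monotonic):
        if hard_ttl < soft_ttl:
            raise ValueError("hard_ttl must be >= soft_ttl")
        self._loader = loader          # assumed origin fetch: key -> value
        self._soft_ttl = soft_ttl
        self._hard_ttl = hard_ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, stored_at)
        self._refreshing: Set[str] = set()  # keys with a refresh in flight

    def get(self, key: str) -> Any:
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, stored_at = entry
                age = now - stored_at
                if age < self._soft_ttl:
                    return value  # fresh hit
                if age < self._hard_ttl:
                    # Stale but serveable: start at most one refresh per key,
                    # then answer immediately with the stale value.
                    if key not in self._refreshing:
                        self._refreshing.add(key)
                        threading.Thread(target=self._refresh, args=(key,),
                                         daemon=True).start()
                    return value
        # Cold miss or past the hard TTL: the caller waits for the origin.
        value = self._loader(key)
        with self._lock:
            self._entries[key] = (value, self._clock())
        return value

    def _refresh(self, key: str) -> None:
        try:
            value = self._loader(key)
            with self._lock:
                self._entries[key] = (value, self._clock())
        except Exception:
            # Stale-if-error: keep the old entry; it stays serveable
            # until its hard TTL is reached.
            pass
        finally:
            with self._lock:
                self._refreshing.discard(key)
```

A caller would construct it with something like `SWRCache(fetch_profile, soft_ttl=300, hard_ttl=600)`, where `fetch_profile` is a placeholder for the real origin call. Note that this sketch only deduplicates background refreshes; concurrent cold misses and requests past the hard TTL still each call the loader, so a real deployment would pair it with request coalescing on that path.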
💡 Key Takeaways
CDN edge caches serve hundreds of thousands of RPS per location; SWR keeps latency at about 5ms P50 during refresh instead of a blocking 200 to 500ms origin fetch, a 40 to 100 times reduction for the affected requests
Hot key at 50,000 RPS: SWR serves all requests stale with a P99 of 5 to 10ms while a single deduplicated background refresh hits origin, so origin sees one request per soft TTL interval for that key instead of a burst of up to 50,000 QPS
Dual expiration tuning: soft TTL controls staleness window (30s to 5min typical), hard TTL set 2 to 3 times soft TTL (e.g., 10min hard for 5min soft) provides error absorption during origin incidents
Stale If Error (SIE) continues serving stale data during origin failures until the hard TTL; during a 3 minute database outage, it serves cached data instead of propagating 500 errors to users
Freshness tradeoff: account balances and inventory require strict consistency so cannot use SWR; user profiles, feeds, recommendations tolerate minutes of staleness for dramatic availability gains
Refresh load consideration: background refresh adds origin load of one request per soft TTL interval per hot key; for 1000 hot keys refreshing every 5 minutes, that is 1000/300 ≈ 3.3 QPS of baseline refresh load on origin
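The refresh load arithmetic in the last takeaway can be checked with a one-line helper. This is a sketch that assumes every hot key is refreshed exactly once per soft TTL interval and that refreshes are spread evenly over time; the function name `refresh_qps` is made up for illustration.

```python
def refresh_qps(hot_keys: int, soft_ttl_seconds: float) -> float:
    """Average origin refresh rate, assuming one refresh per key per soft TTL."""
    return hot_keys / soft_ttl_seconds


# 1000 hot keys with a 5 minute (300 second) soft TTL
print(round(refresh_qps(1000, 300), 2))  # 3.33
```

In practice refreshes are triggered by traffic rather than by a timer, so keys that were populated together can also go stale together; adding jitter to the soft TTL is a common way to smooth that out.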
📌 Examples
Netflix CDN: Popular video metadata (title, thumbnail, description) served with 2 minute soft TTL and 10 minute hard TTL. During 5 minute origin service degradation, edge continues serving stale metadata to millions of users with zero user visible errors.
Social media timeline: User home feed at 20,000 RPS uses a 1 minute soft TTL and a 5 minute hard TTL. A post published just after the entry was cached at T=0 appears once the refresh triggered at T=60s completes. Users see a feed that is up to roughly 1 minute stale, but P99 latency stays at 8ms instead of spiking to 300ms on a cache miss.
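HTTP Cache-Control header: 'Cache-Control: max-age=300, stale-while-revalidate=300, stale-if-error=600' sets a 5 minute soft TTL, allows up to 5 further minutes of stale serving while a refresh is in progress, and up to 10 further minutes of stale serving when the origin returns errors. Both stale windows are measured from the moment the response becomes stale, so they overlap rather than add up.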
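To make the three windows in that header concrete, here is a simplified sketch that parses the numeric directives and classifies a response by its age in seconds. The function names and the short state labels are illustrative assumptions, and the parser deliberately ignores everything else in Cache-Control (for example `no-cache`, `s-maxage`, or `must-revalidate`).

```python
from typing import Dict


def parse_cache_control(header: str) -> Dict[str, int]:
    """Extract the integer-valued directives from a Cache-Control value."""
    directives: Dict[str, int] = {}
    for part in header.split(","):
        name, sep, value = part.strip().partition("=")
        value = value.strip()
        if sep and value.isdigit():
            directives[name.strip().lower()] = int(value)
    return directives


def classify(age_seconds: int, directives: Dict[str, int]) -> str:
    """Return which serving window a response of the given age falls into."""
    max_age = directives.get("max-age", 0)
    swr = directives.get("stale-while-revalidate", 0)
    sie = directives.get("stale-if-error", 0)
    if age_seconds < max_age:
        return "fresh"      # serve directly from cache
    if age_seconds < max_age + swr:
        return "swr"        # serve stale, revalidate in the background
    if age_seconds < max_age + sie:
        return "sie-only"   # serve stale only if the origin errors
    return "expired"        # must go to origin


policy = parse_cache_control(
    "max-age=300, stale-while-revalidate=300, stale-if-error=600")
for age in (100, 450, 700, 900):
    print(age, classify(age, policy))
```

With the header from the example, ages 100, 450, 700, and 900 seconds fall into the fresh, SWR, SIE-only, and expired windows respectively.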