
Production Implementation: Combining Leases, SWR, and Observability

Production-grade cache stampede prevention requires layering multiple techniques with comprehensive observability. A robust implementation combines per-key request collapsing via leases; soft and hard Time To Live (TTL) with Stale-While-Revalidate (SWR) and Stale-If-Error (SIE); probabilistic early refresh with TTL jitter; backpressure mechanisms; and hot-key detection with specialized handling. The interaction between these components is critical: leases prevent duplicate refreshes, SWR maintains low latency during refresh, early refresh and jitter spread load over time, and backpressure protects the origin when things go wrong. Meta combines lease-based request collapsing in Memcache with application-level SWR logic to handle millions of cache operations per second across thousands of hot keys. (A minimal sketch of this combined get path follows below.)

Configuration requires careful tuning based on workload characteristics. For hot keys (above 10,000 Requests Per Second (RPS)), use a short soft TTL (30 to 60 seconds) to keep data relatively fresh, a long hard TTL (5 to 10 minutes) for error absorption, and a 500 to 800ms lock TTL (2 to 3x P99 refresh latency). For medium-temperature keys (100 to 1,000 RPS), use a longer soft TTL (2 to 5 minutes) to reduce refresh frequency and write amplification. For cold tail keys (under 10 RPS), use the longest TTL (10+ minutes), since refresh cost is amortized over few requests. Apply 10 to 20 percent TTL jitter universally and enable probabilistic early refresh with beta = 1.5 (see the second sketch below). Set the per-key concurrency limit to 1 (only one refresh in flight per key) and cap global refresh concurrency based on origin capacity: for example, if the database handles 1,000 Queries Per Second (QPS), cap total cache refresh concurrency at 500 to leave headroom for direct queries.

Observability is non-negotiable. Track per-key metrics including hit rate, soft hit rate (served stale via SWR), miss rate, refresh latency (P50, P95, P99), lock contention rate (percentage of lease acquisition failures), refresh error rate, and staleness duration (time served stale before a successful refresh). Aggregate metrics across key temperature tiers: hot, warm, cold. Alert on anomalies such as a sudden drop in hit rate (mass invalidation or TTL misconfiguration), a spike in lock timeouts (lease holders crashing or hanging), elevated refresh latency (origin degradation), or prolonged staleness (refresh failures). Implement distributed tracing to correlate user requests with cache operations and origin calls. During incidents, detailed metrics enable rapid diagnosis: if the lock timeout rate spikes, investigate lease holder health; if the refresh error rate climbs, check the origin service; if staleness duration exceeds hard TTL minus soft TTL, validate the SIE fallback logic.
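The get path below is a minimal sketch of how these pieces compose: soft/hard TTL with SWR, a per-key lease for request collapsing, and SIE fallback on refresh failure. It assumes a Redis-backed cache where a SET NX PX lock stands in for Memcache's native leases; the names (`get_with_swr`, `fetch_origin`) and the TTL constants are illustrative, not any specific library's API.

```python
import json
import time

import redis

r = redis.Redis()

SOFT_TTL = 45          # seconds until a value is considered stale (hot-key tier)
HARD_TTL = 480         # seconds until the value is evicted entirely (8 min)
LEASE_TTL_MS = 600     # lock TTL ~ 2-3x P99 refresh latency


def get_with_swr(key, fetch_origin):
    """Fresh hit, stale-while-revalidate, or collapsed miss, in that order."""
    raw = r.get(key)
    now = time.time()

    if raw is not None:
        entry = json.loads(raw)
        if now < entry["soft_expiry"]:
            return entry["value"]                       # fresh hit
        # Stale but inside hard TTL: one caller wins the lease and refreshes;
        # everyone else is served the stale value at cache-hit latency.
        if r.set(f"lease:{key}", "1", nx=True, px=LEASE_TTL_MS):
            try:
                return _refresh(key, fetch_origin)
            except Exception:
                return entry["value"]                   # SIE: stale on origin error
            finally:
                r.delete(f"lease:{key}")
        return entry["value"]                           # follower path

    # Hard miss: collapse concurrent fetches through the same lease.
    if r.set(f"lease:{key}", "1", nx=True, px=LEASE_TTL_MS):
        try:
            return _refresh(key, fetch_origin)
        finally:
            r.delete(f"lease:{key}")
    time.sleep(LEASE_TTL_MS / 1000)                     # crude: wait out the lease
    return get_with_swr(key, fetch_origin)


def _refresh(key, fetch_origin):
    value = fetch_origin(key)                           # single origin call
    entry = {"value": value, "soft_expiry": time.time() + SOFT_TTL}
    r.set(key, json.dumps(entry), ex=HARD_TTL)
    return value
```

Note that followers never wait on a refresh when a stale value exists; only on a true hard miss do they block, and then only for at most one lease TTL.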
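Probabilistic early refresh and jitter are small enough to show in full. This sketch uses the standard XFetch-style formula with the beta = 1.5 and 15 percent jitter figures from this section; `delta`, the key's observed refresh latency, is assumed to be tracked elsewhere.

```python
import math
import random
import time

BETA = 1.5


def jittered_ttl(ttl, jitter=0.15):
    # +/-15% spread so keys written together don't expire together.
    return ttl * random.uniform(1 - jitter, 1 + jitter)


def should_refresh_early(soft_expiry, delta, beta=BETA):
    # Refresh when: now - delta * beta * ln(rand) >= soft_expiry.
    # ln(rand) is negative, so "now" is pushed forward by a random amount
    # proportional to the key's refresh cost (delta); expensive keys start
    # refreshing earlier, and concurrent callers are unlikely to fire at
    # the same instant.
    rand = random.random() or 1e-12                     # guard ln(0)
    return time.time() - delta * beta * math.log(rand) >= soft_expiry
```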
💡 Key Takeaways
Meta's Memcache combines lease-based collapsing with application-level SWR to serve millions of cache ops per second; only one refresh runs per hot key while followers receive stale data at cache-hit latency
Hot-key config (above 10k RPS): 30 to 60s soft TTL, 5 to 10min hard TTL, 500 to 800ms lock TTL (2 to 3x P99 refresh), beta = 1.5 early refresh, 15 percent jitter, per-key concurrency = 1 (see the tier sketch after this list)
Global refresh concurrency cap is essential: if the origin handles 1,000 QPS, cap cache refresh to 500 concurrent to leave headroom for direct traffic and prevent origin saturation
Per-key observability tracks hit rate, soft hit rate (SWR serving stale), refresh latency P99, lock contention rate, and staleness duration; alert when staleness exceeds (hard TTL minus soft TTL), as in the alert sketch below
Tiered configuration by key temperature: hot keys (10k+ RPS) get a short soft TTL for freshness; the cold tail (under 10 RPS) gets a long TTL (10min+) to minimize write amplification
Failure correlation via tracing: a spike in lock timeout rate plus elevated refresh latency indicates lease holders waiting on a slow origin; a spike in refresh error rate with normal latency indicates the origin returning 5xx errors
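The tier table and global cap, sketched from the numbers above. The hot-tier values come from the text; the warm/cold hard TTLs are assumed for illustration, and the in-process semaphore stands in for the distributed limiter a real deployment would use.

```python
import threading

# (soft_ttl_s, hard_ttl_s, lock_ttl_ms) per temperature tier. Hot values
# are from the text; warm/cold hard TTLs are illustrative assumptions.
TIER_CONFIG = {
    "hot":  (45,   480,  600),   # >10k RPS: short soft TTL, lock ~2-3x P99
    "warm": (180,  900,  600),   # 100-1k RPS: longer soft TTL
    "cold": (900,  1800, 600),   # <10 RPS: longest TTL, refresh cost amortized
}

# Origin handles ~1,000 QPS, so cap refreshes at 500 to leave headroom
# for direct queries. Process-local here to keep the sketch self-contained.
REFRESH_SLOTS = threading.BoundedSemaphore(500)


def refresh_with_backpressure(key, do_refresh):
    # Non-blocking acquire: when the origin is already saturated with
    # refreshes, keep serving stale rather than queueing more load.
    if not REFRESH_SLOTS.acquire(blocking=False):
        return None                                    # caller falls back to stale
    try:
        return do_refresh(key)
    finally:
        REFRESH_SLOTS.release()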
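The staleness alert from the takeaways reduces to a one-line predicate. A minimal sketch, assuming the P99 staleness value comes from the metrics pipeline:

```python
HARD_TTL = 480   # hot-key tier, seconds
SOFT_TTL = 45


def staleness_breached(staleness_p99_s):
    # Past this budget, traffic is being carried by SIE rather than SWR:
    # refreshes have failed for the entire stale window, so page someone.
    return staleness_p99_s > (HARD_TTL - SOFT_TTL)
```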
📌 Examples
Social media platform config: a user profile cache (hot key, 50k RPS) uses 45s soft TTL, 8min hard TTL, 600ms lock TTL. On soft expiry, the lease holder refreshes in 180ms P99; the other 49,999 requests are served stale at 6ms P99. Origin load for this key: 1 refresh per 45s = 0.022 QPS instead of 50,000 QPS.
Observability dashboard shows: 'user:profile:12345' hit_rate 99.2%, soft_hit_rate 0.6% (served stale), miss_rate 0.2%, refresh_latency_p99 180ms, lock_contention_rate 0.05%, staleness_duration_p99 220ms. The metrics confirm SWR is working: 0.6% of requests were served stale during the 180ms refresh, with lock contention near zero.
E-commerce product catalog: 5,000 products, temperature varying from 10 to 30,000 RPS. Hot products (top 100, above 5k RPS) are configured with a 60s soft TTL; warm products (next 900, 100 to 5k RPS) with a 4min soft TTL; the cold tail (4,000 products, under 100 RPS) with a 15min soft TTL. Total refresh load: 100×(1/60) + 900×(1/240) + 4000×(1/900) = 1.67 + 3.75 + 4.44 = 9.86 QPS, well within origin capacity (checked in the sketch below).
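As referenced in the catalog example, the refresh-load figure is just the sum over tiers of key count divided by soft TTL; a few lines verify it:

```python
# (key_count, soft_ttl_seconds) per tier, from the example above.
tiers = [(100, 60), (900, 240), (4000, 900)]

total_refresh_qps = sum(keys / ttl for keys, ttl in tiers)
print(f"{total_refresh_qps:.2f} QPS")   # -> 9.86 QPS
```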