Production Implementation: Combining Leases, SWR, and Observability

Layered Defense Architecture
Production-grade cache stampede prevention layers several techniques and backs them with comprehensive observability. A robust implementation combines per-key request collapsing via leases; soft and hard Time To Live (TTL) values with Stale While Revalidate (SWR) and Stale If Error (SIE); probabilistic early refresh with TTL jitter; backpressure mechanisms; and hot key detection. The interaction between these components is critical: leases prevent duplicate refreshes, SWR keeps latency low during a refresh, early refresh and jitter spread load over time, and backpressure protects the origin when things go wrong. Large-scale caching systems combine lease-based request collapsing with application-level SWR logic to handle millions of cache operations per second across thousands of hot keys.
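As a rough illustration of how leases, SWR, and SIE interact, here is a minimal in-process sketch. The class name `SWRCache`, its parameters, and the injected `clock` are hypothetical names chosen for this example; a real deployment would keep entries and leases in a shared cache rather than in local dictionaries, and would typically add the jitter, early refresh, and backpressure layers on top.

```python
import threading
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Entry:
    value: Any
    soft_expiry: float  # past this: stale but servable, refresh wanted
    hard_expiry: float  # past this: must not be served


class SWRCache:
    """In-process sketch of per-key leases with soft/hard TTL (SWR + SIE)."""

    def __init__(self, loader: Callable[[str], Any], soft_ttl: float,
                 hard_ttl: float, lock_ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._soft_ttl = soft_ttl
        self._hard_ttl = hard_ttl
        self._lock_ttl = lock_ttl
        self._clock = clock
        self._entries: Dict[str, Entry] = {}
        self._leases: Dict[str, float] = {}  # key -> lease expiry time
        self._mutex = threading.Lock()

    def get(self, key: str) -> Any:
        while True:
            now = self._clock()
            lease_expiry = now + self._lock_ttl
            with self._mutex:
                entry = self._entries.get(key)
                if entry is not None and now < entry.soft_expiry:
                    return entry.value  # fresh hit
                # A lease is free if absent or expired (holder crashed or hung).
                got_lease = self._leases.get(key, float("-inf")) <= now
                if got_lease:
                    self._leases[key] = lease_expiry
            servable = entry is not None and now < entry.hard_expiry
            if got_lease:
                try:
                    return self._refresh(key)  # only the lease holder hits origin
                except Exception:
                    if servable:
                        return entry.value  # Stale If Error
                    raise
                finally:
                    with self._mutex:
                        # Release only our own lease, not a successor's.
                        if self._leases.get(key) == lease_expiry:
                            del self._leases[key]
            if servable:
                return entry.value  # Stale While Revalidate
            time.sleep(0.005)  # hard miss, lease held elsewhere: wait and retry

    def _refresh(self, key: str) -> Any:
        value = self._loader(key)
        now = self._clock()
        with self._mutex:
            self._entries[key] = Entry(value, now + self._soft_ttl,
                                       now + self._hard_ttl)
        return value
```

In this sketch the lease holder refreshes synchronously while other callers are served the stale value; moving the refresh to a background worker so that the holder is also served stale is a common variation.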
Configuration By Key Temperature
Tune configuration to workload characteristics. For hot keys (above 10,000 requests per second, RPS), use a short soft TTL (30 to 60 seconds) to keep data relatively fresh, a long hard TTL (5 to 10 minutes) for error absorption, and a 500 to 800ms lock TTL (2 to 3 times the P99 refresh latency). For medium-temperature keys (100 to 1,000 RPS), use a longer soft TTL (2 to 5 minutes) to reduce refresh frequency and write amplification. For cold tail keys (under 10 RPS), use the longest TTL (10+ minutes), since each refresh is amortized over only a few requests. Apply 10 to 20 percent TTL jitter universally and enable probabilistic early refresh with beta = 1.5. Set the per-key refresh concurrency limit to 1, and set the global refresh concurrency cap based on origin capacity (e.g., if the database handles 1,000 QPS, cap total cache refresh concurrency at 500 to leave headroom for direct queries).
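The tuning knobs above can be sketched as a few small helpers. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the jitter is assumed to be symmetric around the base TTL, and the early refresh rule is assumed to be the widely used exponential form (refresh when `now - cost * beta * ln(u) >= expiry` for a uniform random `u`), which the text does not name explicitly.

```python
import math
import random


def jittered_ttl(base_ttl_s: float, jitter: float = 0.15, rng=random) -> float:
    """Scale the TTL by a random factor in [1 - jitter, 1 + jitter]
    so that keys written together do not all expire together."""
    return base_ttl_s * (1.0 + rng.uniform(-jitter, jitter))


def lock_ttl_s(p99_refresh_s: float, multiplier: float = 2.5) -> float:
    """Lease duration as a multiple (2 to 3x in the text) of P99 refresh latency."""
    return p99_refresh_s * multiplier


def should_refresh_early(now_s: float, expiry_s: float, refresh_cost_s: float,
                         beta: float = 1.5, rng=random) -> bool:
    """Probabilistic early refresh (assumed exponential form).

    The closer now_s is to expiry_s, and the larger beta or the refresh
    cost, the more likely a caller is to refresh before the entry expires.
    """
    u = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    return now_s - refresh_cost_s * beta * math.log(u) >= expiry_s


def refresh_concurrency_cap(origin_capacity: int, headroom: float = 0.5) -> int:
    """Global refresh cap: reserve a fraction of origin capacity for direct traffic."""
    return int(origin_capacity * (1.0 - headroom))
```

With these definitions, `refresh_concurrency_cap(1000)` gives 500, matching the example in the text, and `lock_ttl_s(0.25, 2.0)` gives a 0.5 second lease for a 250ms P99 refresh.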
Essential Observability
Observability is non-negotiable. Track per-key metrics including hit rate, soft hit rate (served stale via SWR), miss rate, refresh latency (P50, P95, P99), lock contention rate (percentage of lease acquisition failures), refresh error rate, and staleness duration (time served stale before a successful refresh). Aggregate metrics across key temperature tiers: hot, warm, cold. Alert on anomalies such as a sudden drop in hit rate (indicates mass invalidation or TTL misconfiguration), a spike in lock timeouts (lease holders crashing or hanging), elevated refresh latency (origin degradation), or prolonged staleness (refresh failures). Implement distributed tracing to correlate user requests with cache operations and origin calls.
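A minimal sketch of a per-key metrics recorder for some of the signals listed above follows. The class `KeyMetrics` and its method names are hypothetical; in practice these counters would usually be exported through whatever metrics library the service already uses, and staleness duration would be tracked alongside them.

```python
from collections import Counter
from typing import List


class KeyMetrics:
    """Counters for one key (or one temperature tier)."""

    OUTCOMES = ("hit", "soft_hit", "miss")

    def __init__(self) -> None:
        self.requests = Counter()  # outcome -> count
        self.lease_attempts = 0
        self.lease_failures = 0
        self.refresh_errors = 0
        self.refresh_latencies_ms: List[float] = []

    def record_request(self, outcome: str) -> None:
        if outcome not in self.OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        self.requests[outcome] += 1

    def record_lease(self, acquired: bool) -> None:
        self.lease_attempts += 1
        if not acquired:
            self.lease_failures += 1

    def record_refresh(self, latency_ms: float, ok: bool) -> None:
        self.refresh_latencies_ms.append(latency_ms)
        if not ok:
            self.refresh_errors += 1

    def rate(self, outcome: str) -> float:
        """Fraction of requests with the given outcome (0.0 if no requests)."""
        total = sum(self.requests.values())
        return self.requests[outcome] / total if total else 0.0

    def lock_contention_rate(self) -> float:
        """Fraction of lease acquisition attempts that failed."""
        if not self.lease_attempts:
            return 0.0
        return self.lease_failures / self.lease_attempts

    def refresh_percentile(self, p: int) -> float:
        """Nearest-rank percentile of refresh latency, p an integer in 1..100."""
        if not self.refresh_latencies_ms:
            raise ValueError("no refresh samples recorded")
        ordered = sorted(self.refresh_latencies_ms)
        rank = max(1, (p * len(ordered) + 99) // 100)  # ceil(p * n / 100)
        return ordered[rank - 1]
```

Keeping every raw latency sample is only reasonable for a sketch; a production recorder would more likely use a histogram or a streaming quantile estimate.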
Incident Diagnosis
During incidents, detailed metrics enable rapid diagnosis: if the lock timeout rate spikes, investigate lease holder health; if the refresh error rate climbs, check the origin service; if staleness duration exceeds hard TTL minus soft TTL, validate the SIE fallback logic. Correlation between metrics reveals the root cause: a lock timeout spike plus elevated refresh latency indicates lease holders waiting on a slow origin; a refresh error rate spike with normal latency indicates the origin returning 5xx errors.
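The diagnosis rules above can be written down as a small decision function. This is only a sketch: the `Snapshot` fields and the boolean "spike" and "elevated" flags are hypothetical, and how a spike is detected (thresholds, baselines) is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    lock_timeout_spike: bool
    refresh_latency_elevated: bool
    refresh_error_spike: bool
    staleness_s: float
    soft_ttl_s: float
    hard_ttl_s: float


def diagnose(s: Snapshot) -> List[str]:
    """Map metric correlations to the likely causes described in the text."""
    findings: List[str] = []
    if s.lock_timeout_spike and s.refresh_latency_elevated:
        findings.append("lease holders waiting on slow origin")
    elif s.lock_timeout_spike:
        findings.append("investigate lease holder health (crash or hang)")
    if s.refresh_error_spike and not s.refresh_latency_elevated:
        findings.append("origin returning 5xx errors")
    elif s.refresh_error_spike:
        findings.append("check origin service")
    # Stale service should never outlast the window between soft and hard expiry.
    if s.staleness_s > s.hard_ttl_s - s.soft_ttl_s:
        findings.append("validate SIE fallback logic")
    return findings
```

For example, a snapshot with a lock timeout spike and elevated refresh latency, but no error spike and staleness inside the allowed window, yields the single finding "lease holders waiting on slow origin".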

💡 Key Takeaways

✓ Layered defense combines leases (prevent duplicate refresh), SWR (maintain latency), jitter plus early refresh (spread load), and backpressure (protect origin) working together

✓ Hot key config (above 10k RPS): 30 to 60s soft TTL, 5 to 10min hard TTL, 500 to 800ms lock TTL (2 to 3x P99 refresh), beta equals 1.5 early refresh, 15 percent jitter, per key concurrency equals 1

✓ Global refresh concurrency cap essential: if origin handles 1,000 QPS, cap cache refresh to 500 concurrent to leave headroom for direct traffic and prevent origin saturation

✓ Per key observability tracks hit rate, soft hit rate (SWR serving stale), refresh latency P99, lock contention rate, and staleness duration; alert when staleness exceeds (hard TTL minus soft TTL)

✓ Failure correlation: lock timeout spike plus elevated refresh latency indicates slow origin; refresh error spike with normal latency indicates origin 5xx errors

📌 Interview Tips

1. Social media platform config: User profile cache (hot key, 50k RPS) uses 45s soft TTL, 8min hard TTL, 600ms lock TTL. On soft expiry, the lease holder refreshes in 180ms P99; every other request arriving during that refresh (about 9,000 at 50k RPS) is served stale at 6ms P99. Origin load for this key: 1 refresh per 45s, or about 0.022 QPS instead of 50,000 QPS.

2. Observability dashboard shows: user_profile_12345 hit_rate 99.2%, soft_hit_rate 0.6% (served stale), miss_rate 0.2%, refresh_latency_p99 180ms, lock_contention_rate 0.05%, staleness_duration_p99 220ms. Metrics confirm SWR is working: 0.6% of requests are served stale during the 180ms refresh.

3. E-commerce product catalog: 5,000 products with temperature from 10 to 30,000 RPS. Hot products (top 100) get 60s soft TTL; warm products (next 900) get 4min soft TTL; cold tail (4,000 products) gets 15min soft TTL. Total refresh load is approximately 10 QPS across all products, well within origin capacity.
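The refresh load figures in these examples follow from one refresh per key per soft TTL. A short check, assuming steady state and ignoring jitter, early refresh, and failed refreshes (the helper name `refresh_qps` is made up for this sketch):

```python
def refresh_qps(num_keys: int, soft_ttl_s: float) -> float:
    """Origin refreshes per second when each key refreshes once per soft TTL."""
    return num_keys / soft_ttl_s


# Example 1: a single hot profile key with a 45s soft TTL.
profile_qps = refresh_qps(1, 45)

# Example 3: 100 hot, 900 warm, and 4,000 cold products.
catalog_qps = (
    refresh_qps(100, 60)          # hot tier, 60s soft TTL
    + refresh_qps(900, 4 * 60)    # warm tier, 4min soft TTL
    + refresh_qps(4000, 15 * 60)  # cold tier, 15min soft TTL
)

print(f"{profile_qps:.3f}")  # 0.022
print(f"{catalog_qps:.2f}")  # 9.86
```

The catalog total breaks down to roughly 1.67 QPS for the hot tier, 3.75 QPS for the warm tier, and 4.44 QPS for the cold tail, so in this example the numerous cold keys account for the largest share of refresh load despite their long TTL.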
