Production Implementation: Combining Leases, SWR, and Observability
Layered Defense Architecture
Production-grade cache stampede prevention requires layering multiple techniques and backing them with comprehensive observability. A robust implementation combines per-key request collapsing via leases, soft and hard Time To Live (TTL) values with Stale While Revalidate (SWR) and Stale If Error (SIE), probabilistic early refresh with TTL jitter, backpressure mechanisms, and hot key detection. The interaction between these components is critical: leases prevent duplicate refreshes, SWR keeps latency low during refresh, early refresh and jitter spread load over time, and backpressure protects the origin when things go wrong. Large-scale caching systems combine lease-based request collapsing with application-level SWR logic to handle millions of cache operations per second across thousands of hot keys.
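As a rough illustration of how leases, SWR, and SIE interact, the following is a minimal in-process sketch, not a production implementation. The class name SwrCache, the loader callback, and the injectable clock are all hypothetical; a real deployment would keep entries and leases in a shared cache rather than in process memory, and would usually run the refresh in the background instead of inline in the lease holder's request.

```python
import threading
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Entry:
    value: Any
    soft_expiry: float  # after this the value is stale but still servable
    hard_expiry: float  # after this the value must not be served


class SwrCache:
    """Hypothetical in-process stand-in for a shared cache with leases."""

    def __init__(self, soft_ttl: float, hard_ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.soft_ttl, self.hard_ttl, self.clock = soft_ttl, hard_ttl, clock
        self._entries: Dict[str, Entry] = {}
        self._leases: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()

    def _lease(self, key: str) -> threading.Lock:
        # One lock per key plays the role of the per-key lease.
        with self._guard:
            return self._leases.setdefault(key, threading.Lock())

    def _store(self, key: str, value: Any) -> Any:
        now = self.clock()
        self._entries[key] = Entry(value, now + self.soft_ttl, now + self.hard_ttl)
        return value

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry.soft_expiry:
            return entry.value  # fresh hit
        lease = self._lease(key)
        if entry is not None and now < entry.hard_expiry:
            # Stale but servable: only the lease holder calls the origin.
            if lease.acquire(blocking=False):
                try:
                    return self._store(key, loader())
                except Exception:
                    return entry.value  # SIE: refresh failed, serve stale
                finally:
                    lease.release()
            return entry.value  # SWR: another caller is already refreshing
        # Nothing servable: wait for the lease, then re-check before loading.
        with lease:
            entry = self._entries.get(key)
            if entry is not None and self.clock() < entry.soft_expiry:
                return entry.value  # a previous lease holder just refreshed
            return self._store(key, loader())
```

In this sketch the stale path never blocks: callers that lose the lease race return the stale value immediately, and only the hard-miss path waits on the lease.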
Configuration By Key Temperature
Configuration requires careful tuning based on workload characteristics. For hot keys (above 10,000 requests per second, RPS), use a short soft TTL (30 to 60 seconds) to keep data relatively fresh, a long hard TTL (5 to 10 minutes) for error absorption, and a lock TTL of 500 to 800 ms (2 to 3 times the P99 refresh latency). For medium-temperature keys (100 to 1,000 RPS), use a longer soft TTL (2 to 5 minutes) to reduce refresh frequency and write amplification. For cold tail keys (under 10 RPS), use the longest TTL (10+ minutes), since the cost of each refresh is amortized over only a few requests. Apply 10 to 20 percent TTL jitter universally and enable probabilistic early refresh with beta = 1.5. Set the per-key concurrency limit to 1 and a global refresh concurrency cap based on origin capacity (e.g., if the database handles 1,000 QPS, cap total cache refresh concurrency at 500 to leave headroom for direct queries).
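The tuning rules above can be sketched as a few small helpers. This is an illustration under stated assumptions: the function names are hypothetical, the soft TTL values are simply picked from inside the ranges given, the jitter is assumed to be symmetric around the base TTL, and the early-refresh test assumes that "beta" refers to the XFetch-style rule, in which a request refreshes early when now - cost * beta * ln(u) reaches the expiry time for a uniform random u.

```python
import math
import random

# Soft TTLs in seconds, picked from inside the ranges given above (assumption).
SOFT_TTL_S = {"hot": 45.0, "medium": 180.0, "cold": 600.0}


def jittered_ttl(base_s, jitter=0.15, rng=random):
    """Spread expirations by +/- `jitter` (assumed symmetric) around base_s."""
    return base_s * (1.0 + rng.uniform(-jitter, jitter))


def should_refresh_early(now_s, expiry_s, refresh_cost_s, beta=1.5, rng=random):
    """XFetch-style check (assumed): higher beta or cost refreshes earlier."""
    u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
    return now_s - refresh_cost_s * beta * math.log(u) >= expiry_s


def lock_ttl_s(p99_refresh_s, multiplier=2.5):
    """Lock TTL as a multiple (2 to 3 in the text) of P99 refresh latency."""
    return p99_refresh_s * multiplier


def refresh_cap(origin_qps, headroom=0.5):
    """Global refresh cap that leaves `headroom` of origin capacity free."""
    return int(origin_qps * (1.0 - headroom))


if __name__ == "__main__":
    rng = random.Random(42)
    print("hot soft TTL with jitter:", jittered_ttl(SOFT_TTL_S["hot"], rng=rng))
    print("lock TTL for 250 ms P99:", lock_ttl_s(0.25), "s")
    print("refresh cap for 1,000 QPS origin:", refresh_cap(1000))
```

Because ln(u) is never positive, the early-refresh check always fires once the entry has expired, and fires with growing probability as expiry approaches.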
Essential Observability
Observability is non-negotiable. Track per-key metrics including hit rate, soft hit rate (served stale via SWR), miss rate, refresh latency (P50, P95, P99), lock contention rate (percentage of lease acquisition failures), refresh error rate, and staleness duration (time served stale before a successful refresh). Aggregate metrics across key temperature tiers: hot, warm, and cold. Alert on anomalies such as a sudden drop in hit rate (indicating mass invalidation or TTL misconfiguration), a spike in lock timeouts (lease holders crashing or hanging), elevated refresh latency (origin degradation), or prolonged staleness (refresh failures). Implement distributed tracing to correlate user requests with cache operations and origin calls.
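One possible shape for the per-key counters is sketched below. The class KeyMetrics, its method names, and the outcome labels are hypothetical. The sketch covers only the rates and latency percentiles, not staleness duration or tier aggregation, and a real system would more likely export these through its existing metrics library than keep raw latency lists in memory.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


def percentile(sorted_values: List[float], p: int) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = -(-p * len(sorted_values) // 100)  # integer ceil(p * n / 100)
    return sorted_values[max(rank, 1) - 1]


def ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else 0.0


@dataclass
class KeyMetrics:
    counts: Counter = field(default_factory=Counter)
    refresh_ms: List[float] = field(default_factory=list)

    def record_lookup(self, outcome: str) -> None:
        self.counts[outcome] += 1  # expected: "hit", "soft_hit", or "miss"

    def record_lease(self, acquired: bool) -> None:
        self.counts["lease_ok" if acquired else "lease_failed"] += 1

    def record_refresh(self, latency_ms: float, ok: bool) -> None:
        self.refresh_ms.append(latency_ms)
        self.counts["refresh_ok" if ok else "refresh_error"] += 1

    def snapshot(self) -> Dict[str, float]:
        c = self.counts
        lookups = c["hit"] + c["soft_hit"] + c["miss"]
        leases = c["lease_ok"] + c["lease_failed"]
        refreshes = c["refresh_ok"] + c["refresh_error"]
        snap = {
            "hit_rate": ratio(c["hit"], lookups),
            "soft_hit_rate": ratio(c["soft_hit"], lookups),
            "miss_rate": ratio(c["miss"], lookups),
            "lock_contention_rate": ratio(c["lease_failed"], leases),
            "refresh_error_rate": ratio(c["refresh_error"], refreshes),
        }
        if self.refresh_ms:
            ordered = sorted(self.refresh_ms)
            for p in (50, 95, 99):
                snap[f"refresh_p{p}_ms"] = percentile(ordered, p)
        return snap
```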
Incident Diagnosis
During incidents, detailed metrics enable rapid diagnosis: if the lock timeout rate spikes, investigate lease holder health; if the refresh error rate climbs, check the origin service; if staleness duration exceeds hard TTL minus soft TTL, validate the SIE fallback logic. Correlation between metrics reveals the root cause: a lock timeout spike plus elevated refresh latency indicates lease holders waiting on a slow origin; a refresh error rate spike with normal latency indicates the origin returning 5xx errors.
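The diagnosis rules above can be written down as a small decision function, which is sometimes useful as a runbook aid. This is only a sketch: the Signals fields and the diagnose function are hypothetical, and what counts as a "spike" or "elevated" is assumed to be decided upstream by the alerting system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signals:
    lock_timeout_spike: bool
    refresh_latency_elevated: bool
    refresh_error_spike: bool
    staleness_s: float
    soft_ttl_s: float
    hard_ttl_s: float


def diagnose(s: Signals) -> List[str]:
    """Map metric signals to the hypotheses described in the text."""
    findings = []
    if s.lock_timeout_spike and s.refresh_latency_elevated:
        findings.append("lease holders waiting on slow origin")
    elif s.lock_timeout_spike:
        findings.append("investigate lease holder health")
    if s.refresh_error_spike and not s.refresh_latency_elevated:
        findings.append("origin returning 5xx errors")
    elif s.refresh_error_spike:
        findings.append("check origin service")
    # Stale serving should stop once the hard TTL is reached, so staleness
    # beyond (hard TTL - soft TTL) points at the fallback logic.
    if s.staleness_s > s.hard_ttl_s - s.soft_ttl_s:
        findings.append("validate SIE fallback logic")
    return findings
```

For example, a lock timeout spike together with elevated refresh latency yields the single finding "lease holders waiting on slow origin", while a lock timeout spike on its own points at lease holder health instead.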