Caching » Cache Stampede Problem · Medium · ⏱️ ~3 min

Per-Key Request Collapsing with Leases and Mutexes

Per-key request collapsing, also called lease-based or mutex-based cache refresh, ensures that only one request recomputes a given cache key at any moment while other concurrent requests either wait or receive stale data. When the first request detects a cache miss or expired entry, it acquires a lease: a temporary exclusive lock identified by a token. The lease holder becomes the sole authorized recomputer for that key; all other requests arriving during recomputation either block with a timeout and backoff, or immediately receive stale content if stale-while-revalidate (SWR) is enabled. Meta (Facebook) deployed this pattern at massive scale in its distributed cache tier (Memcache), handling millions of cache operations per second and explicitly preventing hot-key stampedes.

Implementation requires careful lock TTL tuning: the lock TTL must exceed typical recomputation time but remain short enough to recover from crashed lease holders. For example, if a cache refresh typically completes in 200 milliseconds at the 99th percentile (P99), a lock TTL of 500 to 600 milliseconds provides a safety margin while limiting how long a stuck lock can block refresh. If a lease holder crashes, the lock expires after 500ms and a new requester can acquire it. Fencing tokens prevent the late-arrival problem: each lease includes a monotonically increasing version number, and the cache only accepts writes from the current lease holder, rejecting stale writes from timed-out or slow previous holders.

The production impact is dramatic. For a hot key serving 20,000 requests per second (RPS), naive caching creates 20,000 concurrent origin requests on expiry. With per-key request collapsing, only 1 origin request executes while the remaining 19,999 requests either wait (typically under 200ms) or immediately receive stale content at cache-hit latency (single-digit milliseconds). Origin load for that key drops from a 20,000-request spike to a single request. At Meta scale, with keys serving millions of RPS, this pattern is the difference between stable operation and an immediate outage.
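To make the flow concrete, here is a minimal Go sketch of the lease-holder/follower split, assuming a Redis-backed cache accessed through the go-redis v9 client. The key prefixes (cache:, lock:, stale:), TTL values, and function names are illustrative assumptions, not Meta's actual implementation; the fencing-token check is deferred to the sketch after the examples.

```go
package cachecollapse

import (
	"context"
	"errors"
	"time"

	"github.com/redis/go-redis/v9"
)

const (
	lockTTL  = 500 * time.Millisecond // ~2-3x the P99 recompute latency
	cacheTTL = 60 * time.Second       // freshness window for the primary copy
)

// ErrLeaseHeld signals that another request is already recomputing this key.
var ErrLeaseHeld = errors.New("lease held by another request: back off or serve stale")

// GetOrRefresh returns the value for key, collapsing concurrent refreshes so
// that only one caller recomputes on a miss; other callers get a stale copy
// (if one exists) or ErrLeaseHeld.
func GetOrRefresh(ctx context.Context, rdb *redis.Client, key string,
	recompute func(context.Context) (string, error)) (string, error) {

	// Fast path: fresh cache hit.
	val, err := rdb.Get(ctx, "cache:"+key).Result()
	if err == nil {
		return val, nil
	}
	if !errors.Is(err, redis.Nil) {
		return "", err // cache unavailable, not just a miss
	}

	// Miss: try to acquire the per-key lease atomically (SET ... NX PX lockTTL).
	acquired, err := rdb.SetNX(ctx, "lock:"+key, "1", lockTTL).Result()
	if err != nil {
		return "", err
	}

	if acquired {
		// We are the sole authorized recomputer for this key.
		fresh, err := recompute(ctx)
		if err != nil {
			rdb.Del(ctx, "lock:"+key) // release early so another caller can retry
			return "", err
		}
		// Write the fresh value plus a longer-lived stale copy for SWR readers.
		rdb.Set(ctx, "cache:"+key, fresh, cacheTTL)
		rdb.Set(ctx, "stale:"+key, fresh, 10*cacheTTL)
		rdb.Del(ctx, "lock:"+key)
		return fresh, nil
	}

	// Another request holds the lease: serve stale content immediately if possible.
	if stale, err := rdb.Get(ctx, "stale:"+key).Result(); err == nil {
		return stale, nil
	}
	return "", ErrLeaseHeld // caller should back off with jitter and retry
}
```

A caller that gets ErrLeaseHeld (no stale copy yet) falls through to the backoff loop sketched after the key takeaways below.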
💡 Key Takeaways
Meta's Memcache implementation uses leases to collapse millions of requests per second on hot keys, reducing origin fanout from thousands of concurrent requests to exactly one
Lock TTL tuning is critical: set it to 2 to 3 times the P99 recompute latency (e.g., a 500ms lock for a 200ms P99 refresh) to balance recovery speed against duplicate work
Fencing tokens prevent the lost-update problem: monotonic version numbers ensure that only the current lease holder can write, rejecting late arrivals from timed-out or slow previous holders (see the fenced-write sketch after the examples)
For a 20,000 RPS hot key, request collapsing drops the origin spike from 20,000 concurrent requests to 1 while followers wait ~200ms or receive stale data in single-digit milliseconds
Lease acquisition failure handling is essential: followers must implement exponential backoff with jitter (e.g., 10ms, 20ms, 40ms, 80ms delays) to prevent thundering retry storms, as in the sketch after this list
Lock leakage risk: if the lease holder crashes before releasing the lock, refresh is blocked until the lock TTL expires; too long a TTL prolongs that blockage, too short a TTL causes duplicate recomputation and write conflicts
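A follower-side retry loop with the jittered delays above might look like the following sketch, continuing the same assumed package and go-redis client; the attempt count and the 50% jitter factor are illustrative.

```go
package cachecollapse

import (
	"context"
	"fmt"
	"math/rand"
	"time"

	"github.com/redis/go-redis/v9"
)

// waitWithBackoff is the follower path when the lease is already held: poll
// the cache while the lease holder recomputes, backing off exponentially with
// random jitter so thousands of followers do not retry in lockstep.
func waitWithBackoff(ctx context.Context, rdb *redis.Client, key string) (string, error) {
	delay := 10 * time.Millisecond
	for attempt := 0; attempt < 5; attempt++ {
		// Base delay plus up to 50% jitter: roughly 10ms, 20ms, 40ms, 80ms, 160ms.
		jitter := time.Duration(rand.Int63n(int64(delay)/2 + 1))
		select {
		case <-time.After(delay + jitter):
		case <-ctx.Done():
			return "", ctx.Err()
		}
		if val, err := rdb.Get(ctx, "cache:"+key).Result(); err == nil {
			return val, nil // lease holder finished; fresh value is back in cache
		}
		delay *= 2
	}
	return "", fmt.Errorf("cache for key %q not refreshed after backoff", key)
}
```

A request that exhausts its retries can fall back to the stale copy or surface the error, depending on how much staleness the endpoint tolerates.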
📌 Examples
Meta production: hot user profile key at 500,000 RPS. On expiry, a single lease holder queries the database while 499,999 followers receive the stale profile from cache at 2ms P99 latency instead of the 150ms database query latency.
Microservice implementation in Go using Redis: a single SET with the NX (set if not exists) option and a millisecond expiry acquires the lease atomically. The first writer sets 'refresh:user:12345' with a 500ms TTL; followers check for the key and, if it exists, either sleep 50ms and retry or read stale data from the 'stale:user:12345' backup key.
E-commerce product cache: 30,000 RPS product detail page. The lease holder refreshes inventory, pricing, and reviews (3 database queries, 180ms total). During the refresh, 5,400 follower requests (30,000 RPS × 0.18s) either wait or receive stale product data, preventing 16,200 database queries (3 queries × 5,400 requests).
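The fencing-token check mentioned in the takeaways can be layered onto the same Redis-based sketch. The counter key, token key, and Lua script below are illustrative assumptions (again using go-redis v9): the lease holder takes a token when it acquires the lease, and the write-back is accepted only if that token is newer than the last accepted one.

```go
package cachecollapse

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// fencedSet accepts a cache write only if the writer's fencing token is newer
// than the token recorded for the last accepted write, so a slow or timed-out
// previous lease holder cannot overwrite fresher data.
// KEYS[1] = value key, KEYS[2] = last-accepted-token key
// ARGV[1] = value, ARGV[2] = writer's token, ARGV[3] = TTL in milliseconds
var fencedSet = redis.NewScript(`
local current = tonumber(redis.call('GET', KEYS[2]) or '0')
if tonumber(ARGV[2]) > current then
  redis.call('SET', KEYS[1], ARGV[1], 'PX', ARGV[3])
  redis.call('SET', KEYS[2], ARGV[2], 'PX', ARGV[3])
  return 1
end
return 0
`)

// NextFenceToken hands out a monotonically increasing token for a key; the
// lease holder fetches one at the moment it acquires the lease.
func NextFenceToken(ctx context.Context, rdb *redis.Client, key string) (int64, error) {
	return rdb.Incr(ctx, "fence:"+key).Result()
}

// FencedWrite attempts the write-back; it returns false if a newer lease
// holder has already written, meaning this result is stale and must be dropped.
func FencedWrite(ctx context.Context, rdb *redis.Client, key, value string,
	token int64, ttl time.Duration) (bool, error) {
	ok, err := fencedSet.Run(ctx, rdb,
		[]string{"cache:" + key, "fencetoken:" + key},
		value, token, ttl.Milliseconds()).Int()
	return ok == 1, err
}
```

A timed-out holder that finishes late carries an older token, so its FencedWrite returns false and its result is discarded instead of clobbering fresher data.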