Per Key Request Collapsing with Leases and Mutexes
Request Collapsing Concept
Instead of N concurrent requests each fetching from origin, collapse them into 1 fetch. One requester becomes the leader and fetches; others wait for the result and share it. This reduces origin load from N to 1 during stampede windows, completely eliminating the amplification problem. Implementation requires per key coordination mechanism to elect a single leader and hold all other waiters until the fetch completes.
Lease Based Collapsing
On cache miss, cache server issues a short lived lease token (typically 10-60 seconds) to the first requester, granting exclusive right to refill that key. Other requesters receive indication that refill is in progress and either wait for completion or receive stale data if available. When the leader writes to cache, it includes the lease token; cache validates the token matches before accepting the write. This prevents stale data from racing in after a later valid refill completes. If lease expires without successful write, the next requester gets a new lease.
Mutex/Lock Based Collapsing
Application acquires distributed lock using cache itself or separate coordination service before fetching from origin. Command: SET key:lock NX EX 30 acquires lock atomically with 30 second TTL. If lock acquired (returns OK), fetch from origin and write to cache. If not acquired (returns nil), either wait with exponential backoff and retry, or return stale data immediately. Lock TTL must exceed expected fetch latency with margin for retries and network variability.
Trade offs
Leases require cache server support for token tracking and validation. Mutex based works with any cache but requires separate lock key namespace and careful TTL tuning. Both approaches add latency for waiters (50-200ms typical). If leader fails mid fetch, waiters may timeout waiting. Set waiter timeout shorter than lock TTL to bound worst case latency while still giving leader time to complete.