Failure Modes: Lock Leakage, Cold Cache, and Hot Key Attacks
Lock Leakage
Production cache-stampede mitigations fail in predictable ways that require explicit defenses. Lock leakage occurs when a lease holder crashes, hangs, or is partitioned from the network after acquiring a per-key lock but before completing the refresh and releasing the lock. If the lock time-to-live (TTL) is set too conservatively (e.g., 5 seconds for a typical 200 ms refresh), every subsequent request blocks for up to 5 seconds waiting on a lock that the crashed holder will never release. If the lock TTL is too aggressive (e.g., 150 ms for a 200 ms P99 refresh), the lock expires before the refresh completes, allowing a second requester to acquire the lock and duplicate the work. The solution is fencing tokens: each lease carries a monotonically increasing version number, and the cache accepts writes only from the current lease holder, rejecting late writes from expired leases. Additionally, implement backup-writer logic: followers wait out the lock TTL and then acquire a new lease if the original holder has failed, using jittered retry backoff to prevent thundering retry herds.
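A minimal sketch of the fencing-token idea, in Python with hypothetical names (`FencedLockManager`, `write_if_current` are illustrative, not a real library API): each `acquire` issues a lease with a strictly increasing token, and a write is applied only if its token still matches the live lease, so a late write from a stalled or expired holder is rejected.

```python
import itertools
import threading


class FencedLockManager:
    """Per-key leases with monotonically increasing fencing tokens (sketch)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._counter = itertools.count(1)  # global monotonic token source
        self._current = {}                  # key -> token of the live lease

    def acquire(self, key):
        # Issue a new lease; any prior lease for this key is implicitly
        # superseded (e.g., a backup writer taking over after lock TTL).
        with self._mutex:
            token = next(self._counter)
            self._current[key] = token
            return token

    def write_if_current(self, cache, key, value, token):
        # Accept the write only if this token is still the live lease;
        # late writes from expired leases are rejected.
        with self._mutex:
            if self._current.get(key) != token:
                return False
            cache[key] = value
            return True
```

In the lock-leakage scenario: the original holder acquires token 1 and stalls, a backup writer acquires token 2 and refreshes the key, and when the stalled holder finally attempts its write with token 1, the cache refuses it, so stale data never overwrites the fresh value.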
Cold Cache Scenarios
Cold-cache scenarios after restarts, deployments, or failures create massive synchronized stampedes because all keys are missing simultaneously instead of expiring individually over time. A fleet restart brings up 1,000 empty cache instances; the first request wave misses on every single key, potentially generating millions of origin requests in seconds. Mitigation requires a multi-layered defense. First, implement cache warming: before serving traffic, preload critical hot keys from origin or a backup cache tier. Second, use progressive traffic ramping: gradually increase the request rate over 5 to 10 minutes rather than admitting full traffic instantly. Third, deploy request shedding with quality-of-service (QoS) prioritization: under extreme origin load, drop low-priority requests (e.g., background analytics) while preserving high-priority, user-facing requests. Fourth, combine with stale-while-revalidate (SWR): maintain a secondary persistent cache (Redis, disk) that survives restarts and serves as a stale fallback during cold start.
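The progressive-ramping step can be sketched as a probabilistic admission gate (the helper name `ramp_admit` and the linear ramp shape are illustrative assumptions, not a specific library's behavior): during the first N seconds after startup, admit a fraction of requests proportional to elapsed time and shed the rest, which callers would answer from a stale fallback tier rather than the origin.

```python
import random
import time


def ramp_admit(start_time, ramp_seconds=600.0, now=None, rng=random.random):
    """Admit a growing fraction of requests during warm-up (sketch).

    Hypothetical helper: for the first `ramp_seconds` after `start_time`,
    admit with probability (elapsed / ramp_seconds); after the window,
    admit everything. Shed requests should be served stale/fallback
    content, never forwarded to the origin.
    """
    now = time.time() if now is None else now
    fraction = min(1.0, max(0.0, (now - start_time) / ramp_seconds))
    return rng() < fraction
```

Injecting `now` and `rng` keeps the gate deterministic in tests; in production the defaults (wall clock, uniform random) apply, and the 5-to-10-minute window from the text maps to `ramp_seconds` of 300 to 600.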
Hot Key Attacks
Hot-key attacks, whether malicious or organic (viral content, breaking news), can overwhelm even well-designed systems. An attacker repeatedly requests a specific key with a low TTL or uses cache-busting parameters to force misses; a viral post can spike from 100 RPS to 100,000 RPS in seconds. Defenses include per-key concurrency caps (e.g., at most one in-flight refresh per key regardless of request volume), token-bucket rate limiting of refresh operations at the origin (e.g., cap total refresh QPS at 1,000 even if 10,000 keys need refreshing), and serving stale or default fallback content under extreme pressure. Implement hot-key detection: track per-key request rates and automatically extend TTLs or switch to dedicated refresh workers for keys exceeding a threshold (e.g., above 10,000 RPS). For truly critical keys, use replication: cache multiple copies across nodes to increase hit parallelism without increasing origin load.
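The global refresh-rate cap above can be sketched as a standard token bucket (the class and its `rate`/`capacity` parameters are illustrative assumptions, not a particular library's API): the bucket refills at `rate` tokens per second up to a `capacity` burst, a refresh proceeds only if it can take a token, and refreshes that cannot should fall back to serving stale content instead of hitting the origin.

```python
import time


class TokenBucket:
    """Token bucket limiting total origin refresh QPS (sketch)."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = float(rate)          # tokens replenished per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)    # start full
        self._now = now                  # injectable clock for testing
        self._last = now()

    def try_acquire(self):
        # Refill based on elapsed time, capped at capacity.
        t = self._now()
        self.tokens = min(self.capacity,
                          self.tokens + (t - self._last) * self.rate)
        self._last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # refresh may hit the origin
        return False      # shed: serve stale or fallback content
```

With `rate=1000` this enforces the paragraph's example cap of 1,000 refresh QPS no matter how many hot keys are simultaneously demanding refresh; the injectable clock makes the refill logic testable without sleeping.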