Cache Pattern Failure Modes: What Breaks in Production
Thundering Herd: Synchronized Stampede
Thundering herd (also called cache stampede) occurs when many clients simultaneously experience a cache miss on the same key and all attempt to fetch the backing data at once. This happens when a TTL expires, during mass invalidation events, or after a cache cold start. Consider a hot key serving 10,000 requests per second. When its TTL expires, every application server simultaneously detects the miss and issues a database query. The database, designed to handle perhaps 100 queries per second with the cache absorbing the rest, suddenly receives 10,000 concurrent requests. Response times degrade from milliseconds to seconds. Connection pools exhaust. Cascading timeouts propagate upstream.
Mitigating Thundering Herds
Mitigation requires breaking the synchronization. Lease tokens ensure only one requester fetches while the others wait for the result: on a miss, acquire a per-key lease with a short TTL (5-10 seconds); if the lease is already held, wait briefly and then retry the cache get. Probabilistic early refresh triggers background updates before TTL expiry, with randomized timing. TTL jitter (randomizing expiry by 10-20%) prevents fleet-wide synchronization, where every server expires the same key at the same moment.
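The lease and jitter mechanics can be sketched with an in-memory cache. This is a minimal illustration, not a production implementation: the class and method names (`LeaseCache`, `get_or_fetch`) are invented for this sketch, and a real deployment would hold the lease in a shared store (e.g. a Redis key set with an expiry) rather than a local lock-protected dict.

```python
import random
import threading
import time

class LeaseCache:
    """Illustrative in-memory cache applying the lease + TTL-jitter pattern."""

    def __init__(self, lease_ttl=5.0, base_ttl=60.0, jitter=0.2):
        self._data = {}          # key -> (value, expires_at)
        self._leases = {}        # key -> lease expiry time
        self._lock = threading.Lock()
        self.lease_ttl = lease_ttl
        self.base_ttl = base_ttl
        self.jitter = jitter

    def _jittered_ttl(self):
        # Randomize the TTL by +/- jitter fraction so a fleet of servers
        # does not expire the same key at the same moment.
        return self.base_ttl * (1 + random.uniform(-self.jitter, self.jitter))

    def get_or_fetch(self, key, fetch, retry_delay=0.05, max_wait=5.0):
        deadline = time.monotonic() + max_wait
        while True:
            now = time.monotonic()
            with self._lock:
                entry = self._data.get(key)
                if entry and entry[1] > now:
                    return entry[0]                      # cache hit
                lease = self._leases.get(key)
                if lease is None or lease <= now:
                    self._leases[key] = now + self.lease_ttl  # acquire lease
                    acquired = True
                else:
                    acquired = False
            if acquired:
                value = fetch()                          # only this caller fetches
                with self._lock:
                    self._data[key] = (value, time.monotonic() + self._jittered_ttl())
                    self._leases.pop(key, None)
                return value
            if time.monotonic() + retry_delay > deadline:
                return fetch()                           # waited too long: fetch directly
            time.sleep(retry_delay)                      # wait briefly, then retry the get
```

Under concurrent misses on one key, only the lease holder calls `fetch`; the other callers sleep and then find the value in the cache.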
Stale Read Race Condition
The naive update-cache-on-write pattern creates a subtle race condition. Thread A reads stale data from the database during a slow query taking 100 ms. While Thread A waits, Thread B writes new data to both the database and the cache. Thread A completes and overwrites the cache with its stale result; the cache now serves incorrect data until TTL expiry. This is why production systems use delete-on-write: write to the database, then delete the cache key, forcing the next reader to fetch fresh data. Combined with version stamps (each cached value carries a version number), the cache can reject writes of older versions.
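The version-stamp defense can be shown with a small sketch. The names (`VersionedCache`, `set_if_newer`) are illustrative; a real store would implement the compare in an atomic server-side operation (e.g. a Lua script in Redis) rather than a plain Python dict.

```python
class VersionedCache:
    """Illustrative cache that rejects writes carrying an older version."""

    def __init__(self):
        self._store = {}   # key -> (version, value)

    def get(self, key):
        entry = self._store.get(key)
        return entry[1] if entry else None

    def set_if_newer(self, key, version, value):
        # A slow reader replaying an old read cannot overwrite fresher data:
        # the write is accepted only if its version is newer than the cached one.
        current = self._store.get(key)
        if current is None or version > current[0]:
            self._store[key] = (version, value)
            return True
        return False

    def delete(self, key):
        # Delete-on-write: the writer removes the key after the database write,
        # forcing the next reader to fetch fresh data.
        self._store.pop(key, None)

# Simulated race: Thread B's write of version 2 lands while Thread A's
# slow read of version 1 is still in flight.
cache = VersionedCache()
cache.set_if_newer("user:42", 2, "fresh")   # Thread B finishes first
cache.set_if_newer("user:42", 1, "stale")   # Thread A's late write is rejected
```

After the race, the cache still holds the fresh value, which is exactly the failure the naive pattern cannot prevent.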
Write-Back Durability Failure
In write-back systems, the cache node holds uncommitted writes in a buffer. If the node crashes before flushing to the database, those writes vanish. A 5-second flush interval means up to 5 seconds of data loss per node failure; for a counter receiving 1,000 increments per second, that is 5,000 lost increments. Mitigations: replicate the write buffer across nodes before acknowledging, persist to a local write-ahead log (WAL) that survives crashes, and use idempotent writes so replay is safe.
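The WAL-plus-idempotent-replay combination can be sketched as follows. This is a simplified model under stated assumptions: `WriteBackBuffer` is an invented name, the "database" is a dict, and each write carries a unique id so that replaying the log twice is harmless.

```python
import json
import os

class WriteBackBuffer:
    """Illustrative write-back buffer: every write is appended to a local
    write-ahead log before being acknowledged, so a crash loses nothing
    that was acked; replay is safe because applies are keyed by write id."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.buffer = {}       # write_id -> (key, value)

    def write(self, write_id, key, value):
        # Durability first: append to the WAL and fsync before acknowledging.
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps({"id": write_id, "key": key, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self.buffer[write_id] = (key, value)

    def flush(self, database, applied_ids):
        # Idempotent apply: skip write ids the database has already seen,
        # so replaying the same WAL after a crash does not double-apply.
        for write_id, (key, value) in list(self.buffer.items()):
            if write_id not in applied_ids:
                database[key] = value
                applied_ids.add(write_id)
            del self.buffer[write_id]

    def recover(self):
        # After a crash, rebuild the in-memory buffer from the WAL.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path) as wal:
            for line in wal:
                rec = json.loads(line)
                self.buffer[rec["id"]] = (rec["key"], rec["value"])
```

A restarted node calls `recover()` and then `flush()`; because applies are keyed by write id, a WAL replayed more than once leaves the database unchanged.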
Cold Cache Cascade
After a deployment, restart, or cache failure, the cache is empty. Every request misses and hits the database. If the database cannot handle full load without the cache (often true, since the cache absorbs 80-95% of reads), it overloads and the system fails to recover. Solutions: cache warming (preloading critical hot keys before serving traffic), gradual traffic ramp-up over 5-10 minutes, and request shedding (dropping low-priority requests under extreme load while preserving high-priority ones).
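Gradual ramp-up combined with priority-aware shedding can be sketched as a single admission check. This is illustrative only: the function name `admit` and the deterministic per-request bucketing are assumptions for the sketch, and real systems typically enforce this at the load balancer or admission-control layer.

```python
def admit(priority, request_id, start_time, now, ramp_seconds=300.0):
    """Illustrative admission check during cold-cache recovery.

    Over ramp_seconds (5 minutes here), the admitted fraction of
    low-priority traffic grows linearly from 0 to 1. High-priority
    requests are never shed. Bucketing by request_id makes admission
    deterministic, so retries of the same request behave consistently.
    """
    fraction = min(1.0, max(0.0, (now - start_time) / ramp_seconds))
    if priority == "high":
        return True                     # always preserve high-priority requests
    # Low-priority buckets fill in as the ramp progresses: bucket 0 is
    # admitted almost immediately, bucket 99 only once the ramp completes.
    return (request_id % 100) < fraction * 100
```

At the start of the ramp nearly all low-priority traffic is shed, keeping the database load at a level it can serve while the cache fills; by the end of the window all traffic is admitted.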