
Failure Modes and Edge Cases in Cache Eviction

Scan Pollution

A sequential scan or analytics job that touches a huge keyset once can evict the entire working set under pure LRU, collapsing hit rate from 80% to near zero. The scan brings each key to the MRU (most recently used) position, pushing hot items out. Mitigations: admission filters (TinyLFU rejects low-frequency scan keys), segmented designs (scan keys flow through a probation segment without disturbing the protected segment), or routing scan traffic to a separate tier.
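The segmented design can be sketched as follows. This is a minimal illustration in the spirit of SLRU (class and method names are hypothetical): new keys enter a small probation segment, and only keys hit a second time are promoted to protected, so a one-pass scan cannot displace the hot set.

```python
from collections import OrderedDict

class SegmentedLRU:
    """Minimal SLRU-style sketch: scans churn probation, not protected."""
    def __init__(self, probation_size, protected_size):
        self.probation = OrderedDict()
        self.protected = OrderedDict()
        self.probation_size = probation_size
        self.protected_size = protected_size

    def access(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)        # refresh MRU position
        elif key in self.probation:
            del self.probation[key]                # second hit: promote
            self.protected[key] = True
            if len(self.protected) > self.protected_size:
                # demote the protected LRU victim back to probation
                victim, _ = self.protected.popitem(last=False)
                self._admit_probation(victim)
        else:
            self._admit_probation(key)             # first touch: probation only

    def _admit_probation(self, key):
        self.probation[key] = True
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)     # evict probation LRU

cache = SegmentedLRU(probation_size=10, protected_size=100)
for k in range(100):      # build a hot working set
    cache.access(k)
    cache.access(k)       # second hit promotes to protected
for k in range(100_000, 101_000):   # one-pass scan of 1,000 cold keys
    cache.access(k)
# the scan only churned the 10-entry probation segment:
assert all(k in cache.protected for k in range(100))
```

Under plain LRU the same 1,000-key scan would have evicted all 100 hot keys; here the scan's damage is bounded by the probation segment's size.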

Thundering Herd on Expiration

When many keys expire simultaneously (synchronized TTL values from a batch load or startup), all clients detect misses concurrently and issue backend queries, overwhelming the database. Adding jitter spreads expiration: TTL = base + random(0, 0.1 * base). With 100,000 keys and a 1 hour TTL, synchronized expiration causes 100,000 queries in under 1 second; spreading expirations over a 5 minute jitter window reduces that to roughly 333 queries per second.
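The jitter formula can be sketched as a one-liner (function name is illustrative):

```python
import random

def jittered_ttl(base, jitter_frac=0.1):
    """TTL = base + random(0, jitter_frac * base), spreading expirations."""
    return base + random.uniform(0.0, jitter_frac * base)

# 100,000 keys with a 1 hour base TTL: instead of all expiring in the
# same instant, expirations spread uniformly across the jitter window
# (a 5 minute window yields 100,000 / 300 s ~ 333 queries per second).
ttls = [jittered_ttl(3600) for _ in range(100_000)]
```

The 10% default is a common rule of thumb; the right fraction trades a small reduction in effective freshness for a proportional reduction in peak backend load.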

Slab Fragmentation

Per-size-class allocators carve memory into fixed-size slabs (64 byte, 128 byte, 256 byte). Hot growth in one class triggers aggressive eviction while free memory is stranded elsewhere: the 128 byte class evicts at 95% utilization while the 256 byte class sits at 30%. Rebalancing daemons help but lag traffic shifts, causing elevated evictions and tail latency spikes. Monitor per-slab utilization and eviction rates.
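A monitoring check for this condition can be sketched as follows (the stats layout and thresholds are hypothetical): flag any class that is both near full and evicting while another class has stranded free memory.

```python
# size class -> (used_chunks, total_chunks, evictions_per_min)
slab_stats = {
    64:  (400, 1000, 0),
    128: (950, 1000, 1200),   # 95% full and evicting hard
    256: (300, 1000, 0),      # 30% full: free memory stranded here
}

def fragmentation_alerts(stats, util_high=0.9, util_low=0.5, evict_min=100):
    """Pair each hot (full + evicting) class with the cold classes
    whose stranded memory a rebalancer could reclaim."""
    hot = [s for s, (u, t, e) in stats.items()
           if u / t >= util_high and e >= evict_min]
    cold = [s for s, (u, t, e) in stats.items() if u / t <= util_low]
    return [(h, cold) for h in hot] if cold else []

print(fragmentation_alerts(slab_stats))  # [(128, [64, 256])]
```

An alert firing here is the signal to trigger slab rebalancing (or to question whether object sizes have shifted enough to warrant different class boundaries).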

Sketch Sizing and Stale Frequencies

In Count-Min Sketch based admission, an undersized sketch causes hash collisions that inflate frequency estimates, leading to false admissions. A width of 10,000 with depth 4 limits estimation error to under 2%. Without periodic resets, old frequencies dominate and the policy stops adapting. In LFU without decay, once-popular items retain high counts indefinitely (stale heavies), resisting eviction after access patterns change.
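Both the sizing and the reset can be sketched together. This is a minimal illustration (hash choice and reset trigger are assumptions): a width-10,000, depth-4 table, with all counters halved after a fixed number of increments, which is the aging trick TinyLFU uses to forget stale frequencies.

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min Sketch with periodic halving to age out
    stale frequencies (the 'reset' the text describes)."""
    def __init__(self, width=10_000, depth=4, reset_every=100_000):
        self.width, self.depth = width, depth
        self.reset_every = reset_every
        self.table = [[0] * width for _ in range(depth)]
        self.increments = 0

    def _index(self, key, row):
        # one independent hash per row, derived by salting with the row id
        h = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def add(self, key):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += 1
        self.increments += 1
        if self.increments >= self.reset_every:
            # halve every counter: old heavy hitters decay unless re-accessed
            self.table = [[c // 2 for c in row] for row in self.table]
            self.increments //= 2

    def estimate(self, key):
        # min over rows bounds the collision-driven overestimate
        return min(self.table[row][self._index(key, row)]
                   for row in range(self.depth))
```

Taking the minimum across rows is why depth matters: a key's estimate is only inflated if it collides in every row, so collisions (and false admissions) fall off sharply as depth grows.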

💡 Key Takeaways
Scan pollution under pure LRU: sequential scans evict the entire working set, collapsing hit rate from 80% to near zero. Mitigate with admission filters or segmented designs.
Thundering herd on synchronized TTL: 100,000 keys with the same expiration cause a 100,000-query spike in under a second. Adding 5 minute jitter spreads load to roughly 333 QPS.
Slab fragmentation: hot growth in one size class triggers eviction while memory is stranded in others. Monitor per slab utilization.
Undersized Count-Min sketches cause hash collisions that inflate frequency estimates. Width 10,000 with depth 4 keeps error under 2%.
LFU without decay creates stale heavies: once popular items retain counts indefinitely, resisting eviction after patterns change.
📌 Interview Tips
1. Explain scan pollution: scans push all keys to MRU position, evicting the working set. Mention admission filters and segmented caches as mitigations.
2. For thundering herd, give the math: 100,000 synchronized keys at 1 hour TTL cause a 100,000 QPS spike. Jitter formula: TTL = base + random(0, 0.1 * base).
3. Know slab fragmentation symptoms: high eviction in one class while others are underutilized. Solution: rebalancing daemons or dynamic slab allocation.