
Cache Eviction Policies, Admission Control, and Capacity Planning

LRU and Its Limitations

LRU (Least Recently Used) evicts the key that was accessed least recently, favoring temporal locality. It works well for many workloads but is vulnerable to cache pollution from scan traffic and one-hit wonders: a burst of unique keys, each accessed once and never again, can evict valuable hot keys, dropping hit rate by 10-20% in production.
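A minimal sketch makes the pollution failure mode concrete. This toy LRU (built on `OrderedDict`; capacity and key names are illustrative) shows a one-pass scan evicting the entire hot working set:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently accessed key."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=4)
for k in ["hot1", "hot2", "hot3", "hot4"]:
    cache.put(k, "value")

# A scan of one-hit-wonder keys pushes out every hot key.
for i in range(4):
    cache.put(f"scan-{i}", "value")

print(cache.get("hot1"))  # None: hot key was evicted by scan traffic
```

Pure recency has no way to distinguish a key read a thousand times from one read once, which is exactly the gap admission policies close.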

TinyLFU Admission Policy

TinyLFU (Tiny Least Frequently Used) tracks key access frequency in a compact probabilistic counting sketch. A new key is admitted only if it is accessed more frequently than the eviction victim, which keeps cold keys from polluting the cache. Combined with a segmented LRU, it achieves near-optimal hit rates with minimal memory overhead. At hundreds of GB of cache, even a 5% hit rate improvement reduces backend load by tens of thousands of QPS.
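The admission decision can be sketched with a count-min sketch. This is a simplified illustration, not a production TinyLFU: real implementations use tiny (e.g. 4-bit) counters with periodic halving to age out stale frequencies, and faster hashes than the SHA-256 used here for brevity:

```python
import hashlib

class CountMinSketch:
    """Compact frequency estimator: depth hash rows, width counters each.
    Estimates never undercount; min across rows limits overcounting."""
    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.tables = [[0] * width for _ in range(depth)]

    def _indexes(self, key):
        for i in range(self.depth):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield i, int.from_bytes(h[:8], "big") % self.width

    def add(self, key):
        for i, idx in self._indexes(key):
            self.tables[i][idx] += 1

    def estimate(self, key):
        return min(self.tables[i][idx] for i, idx in self._indexes(key))

def admit(sketch, candidate_key, victim_key):
    """TinyLFU admission: let the new key in only if it is estimated
    to be accessed more often than the eviction victim."""
    return sketch.estimate(candidate_key) > sketch.estimate(victim_key)

sketch = CountMinSketch()
for _ in range(10):
    sketch.add("hot-key")        # accessed frequently
sketch.add("one-hit-wonder")     # accessed once

print(admit(sketch, "one-hit-wonder", "hot-key"))  # False: cold key rejected
```

The sketch costs a few bytes per tracked key rather than a full per-key counter, which is what makes frequency tracking affordable at hundreds of GB of cache.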

Negative Caching and Size Awareness

Negative caching stores the fact that a key does not exist (caching the miss) with a very short TTL (under 10 seconds) to prevent repeated backend queries for nonexistent data; longer TTLs risk masking newly created data. Size-aware eviction accounts for object size: policies like GDSF (Greedy Dual Size Frequency) balance size, frequency, and recency. Maximum object size limits (1-10 MB) prevent large objects from dominating the cache and evicting many small hot keys.
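A minimal negative-caching sketch, assuming a plain dict cache and a hypothetical `fetch_from_db` callback; the sentinel distinguishes "cached miss" from "not cached at all":

```python
import time

NEGATIVE_TTL = 5.0   # seconds; kept short so newly created data appears quickly
POSITIVE_TTL = 60.0  # illustrative TTL for real data
MISS = object()      # sentinel marking a cached "does not exist" result

cache = {}  # key -> (value, expires_at)

def get_user(key, fetch_from_db):
    now = time.monotonic()
    entry = cache.get(key)
    if entry is not None and entry[1] > now:
        value = entry[0]
        return None if value is MISS else value
    value = fetch_from_db(key)
    if value is None:
        # Negative cache: remember the miss briefly so repeated
        # lookups for nonexistent keys skip the backend.
        cache[key] = (MISS, now + NEGATIVE_TTL)
        return None
    cache[key] = (value, now + POSITIVE_TTL)
    return value

calls = []
def fake_db(key):
    calls.append(key)
    return None  # user does not exist

get_user("user:404", fake_db)
get_user("user:404", fake_db)  # served from the negative cache
print(len(calls))  # 1: backend queried only once
```

Without the MISS entry, every lookup of a nonexistent key would fall through to the backend, turning a popular 404 into a hot path against the database.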

Capacity Planning

Estimate working set from traffic traces: measure p95/p99 of unique keys over sliding windows (1-24 hours), multiply by average object size plus 20-30% overhead for metadata and fragmentation. Example: 200K RPS with 1 KB average values targeting 99% hit rate needs 400-600 GB after replication. Maintain 20-30% free headroom to limit eviction churn under traffic spikes.
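The arithmetic above can be made explicit. The unique-key count below is an assumed figure chosen to be consistent with the 200K RPS / 1 KB example; in practice it comes from your own traffic traces:

```python
# Back-of-envelope cache capacity estimate (illustrative inputs).
unique_keys_p99 = 150_000_000   # assumed p99 unique keys in the window
avg_object_bytes = 1024          # 1 KB average value
overhead = 1.25                  # 20-30% metadata + fragmentation
replication_factor = 3           # assumed replica count

raw_gb = unique_keys_p99 * avg_object_bytes / 1e9
total_gb = raw_gb * overhead * replication_factor
print(f"{total_gb:.0f} GB")  # ~576 GB, inside the 400-600 GB range
```

On top of this, the 20-30% free headroom means provisioning roughly 700-750 GB of raw memory for the same workload.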

💡 Key Takeaways
LRU is vulnerable to scan pollution: one-hit wonders evict hot keys, dropping hit rate by 10-20%.
TinyLFU admission tracks frequency with a counting sketch, only admitting keys more popular than eviction victims. A 5% hit rate gain = tens of thousands fewer QPS to the backend.
Negative caching with a short TTL (under 10s) prevents repeated queries for nonexistent keys; longer TTLs risk masking new data.
Capacity planning: p95/p99 unique keys × average size × 1.2-1.3 overhead. Maintain 20-30% free headroom.
📌 Interview Tips
1. Scan pollution: a batch job reads 1M keys once, evicting the working set. TinyLFU rejects low-frequency keys, protecting hot data.
2. Negative cache: the API returns 404 for a nonexistent user; cache the miss for 5s to prevent hammering the database.
3. Capacity math: 200K RPS, 1 KB values, 99% target hit rate, working set ~400-600 GB including replication and overhead.