Cache Eviction Policies, Admission Control, and Capacity Planning
LRU and Its Limitations
LRU (Least Recently Used) evicts the least recently accessed key, favoring temporal locality. It works well for many workloads but is vulnerable to cache pollution from scan traffic and one-hit wonders: a burst of unique keys accessed once and never again can evict valuable hot keys, dropping the hit rate by 10-20% in production.
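A minimal sketch of the pollution problem, using a toy OrderedDict-based LRU (the class and key names are illustrative, not from any particular library):

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: evicts the least recently used key on overflow."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the LRU entry

cache = LRUCache(3)
for k in ("hot1", "hot2", "hot3"):
    cache.put(k, k)          # frequently used keys fill the cache

# A scan of one-hit-wonder keys evicts every hot key:
for k in ("scan1", "scan2", "scan3"):
    cache.put(k, k)

print(cache.get("hot1"))     # None -- hot key evicted by scan traffic
```

Without an admission filter, plain LRU treats each scan key as equally worth keeping, which is exactly the weakness TinyLFU addresses.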
TinyLFU Admission Policy
TinyLFU tracks key access frequency using a compact counting sketch (a probabilistic data structure). A new key is admitted only if its estimated frequency exceeds that of the eviction victim, which keeps cold keys from polluting the cache. Combined with a segmented LRU, it achieves near-optimal hit rates with minimal memory overhead. At hundreds of gigabytes of cache, even a 5% hit-rate improvement reduces backend load by tens of thousands of QPS.
Negative Caching and Size Awareness
Negative caching stores the fact that a key does not exist (caching the miss) with a very short TTL (under 10 seconds) to prevent repeated backend queries for nonexistent data; longer TTLs risk masking newly created data. Size-aware eviction accounts for object size: policies like GDSF (Greedy-Dual Size Frequency) balance size, frequency, and recency. A maximum object size limit (1-10 MB) prevents large objects from dominating the cache and evicting many hot small keys.
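A minimal negative-cache sketch, assuming an in-process dict as the store and a hypothetical `backend_get` callable (both are illustrative names, not a real API):

```python
import time

NEGATIVE_TTL = 5.0   # seconds; short so newly created keys surface quickly

_negative = {}       # key -> expiry timestamp (hypothetical in-process store)

def lookup(key, backend_get):
    """Consult the negative cache before the backend; cache misses briefly."""
    expiry = _negative.get(key)
    if expiry is not None:
        if time.monotonic() < expiry:
            return None           # known-missing: skip the backend entirely
        del _negative[key]        # negative entry expired; retry the backend
    value = backend_get(key)
    if value is None:
        _negative[key] = time.monotonic() + NEGATIVE_TTL
    return value

calls = []
def backend_get(key):
    calls.append(key)
    return None                   # simulate a key that does not exist

lookup("missing", backend_get)
lookup("missing", backend_get)    # served from the negative cache
print(len(calls))                 # 1 -- only one backend query
```

The same structure applies with a shared cache like Redis by storing a sentinel value under the key with a short TTL instead of a local dict.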
Capacity Planning
Estimate the working set from traffic traces: measure the p95/p99 count of unique keys over sliding windows (1-24 hours), then multiply by average object size plus 20-30% overhead for metadata and fragmentation. Example: 200K RPS with 1 KB average values targeting a 99% hit rate needs roughly 400-600 GB after replication. Maintain 20-30% free headroom to limit eviction churn under traffic spikes.
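The sizing arithmetic can be sketched as a one-line formula; the unique-key count below is an assumed input (what the trace measurement would produce), chosen only to land in the example's range:

```python
def cache_size_bytes(unique_keys, avg_object_bytes, overhead=0.25, replicas=1):
    """Working-set sizing: unique keys x average object size,
    plus metadata/fragmentation overhead, times the replication factor."""
    return unique_keys * avg_object_bytes * (1 + overhead) * replicas

# Assumed inputs echoing the example: ~200M unique 1 KB objects observed
# over the sizing window, 25% overhead, replicated twice.
size = cache_size_bytes(unique_keys=200_000_000,
                        avg_object_bytes=1024,
                        overhead=0.25,
                        replicas=2)
print(f"{size / 1e9:.0f} GB")   # 512 GB -- within the 400-600 GB estimate
```

Provisioned capacity should then be the result divided by the free-headroom target (e.g. size / 0.75 for 25% headroom), so that spikes evict into slack rather than into hot keys.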