Admission Policies and W-TinyLFU: Filtering Cache Pollution
Admission policies act as a gatekeeper, deciding whether a candidate should enter the cache at all, even when space is available or can be created via eviction. Without admission control, one-hit wonders and scan traffic pollute the cache, evicting valuable residents in favor of items unlikely to be accessed again. An effective admission policy tracks approximate access frequency over a sliding time window and admits a candidate only if it appears hotter than the current eviction victim. This filtering dramatically reduces cache churn and improves hit rate under workloads with many cold or scanned items.
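The gatekeeper idea can be captured in a small interface. The sketch below is illustrative only (the names AdmissionPolicy, recordAccess, and shouldAdmit are hypothetical, not taken from any particular library): the cache consults the policy on every miss before inserting a new entry.

```java
// Hypothetical admission-policy interface (names are illustrative).
interface AdmissionPolicy<K> {
    // Record that a key was accessed (hit or miss) so frequency state stays fresh.
    void recordAccess(K key);

    // Decide whether the candidate should displace the eviction victim.
    // Returning false leaves the cache unchanged and drops the candidate.
    boolean shouldAdmit(K candidate, K victim);
}
```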
W-TinyLFU (Window Tiny Least Frequently Used) is a widely adopted admission and eviction design that combines three components. First, a Count-Min Sketch (CMS) tracks approximate access frequencies for all recently seen keys in a space-efficient probabilistic data structure. A CMS with width 10,000 and depth 4 uses roughly 40 KB to track frequencies for millions of keys with low error rates (under 1% to 2% relative frequency inflation due to hash collisions). The sketch is periodically reset or aged (for example, by halving all counters) to bound stale history and keep the window recent. Second, a small window segment (1% to 2% of cache capacity) uses LRU to absorb bursty traffic without requiring frequency buildup. Third, the main cache is segmented into probation (20%) and protected (80%) regions, both using LRU ordering, to retain proven hot items and resist scans.
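A minimal Count-Min Sketch might look like the following. This is a simplified sketch, not Caffeine's internal implementation: it uses 4 rows of 10,000 one-byte counters (about 40 KB, matching the figure above), caps counters at 15 in the spirit of TinyLFU's small counters, and halves everything after a configurable number of additions to age out stale history.

```java
import java.util.Random;

// Simplified Count-Min Sketch for frequency estimation (illustrative only).
final class CountMinSketch {
    private static final int WIDTH = 10_000;   // counters per row
    private static final int DEPTH = 4;        // independent hash rows
    private static final int MAX_COUNT = 15;   // small cap, TinyLFU-style

    private final byte[][] counters = new byte[DEPTH][WIDTH];
    private final int[] seeds = new int[DEPTH];
    private final int resetThreshold;          // additions before aging
    private int additions = 0;

    CountMinSketch(int resetThreshold) {
        this.resetThreshold = resetThreshold;
        Random r = new Random(42);
        for (int i = 0; i < DEPTH; i++) seeds[i] = r.nextInt();
    }

    // Increment the approximate frequency of a key.
    void increment(Object key) {
        for (int row = 0; row < DEPTH; row++) {
            int col = index(key, row);
            if (counters[row][col] < MAX_COUNT) counters[row][col]++;
        }
        if (++additions >= resetThreshold) age();
    }

    // Estimate frequency: taking the minimum across rows bounds collision inflation.
    int estimate(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < DEPTH; row++) {
            min = Math.min(min, counters[row][index(key, row)]);
        }
        return min;
    }

    // Aging: halve every counter so old popularity decays and the window stays recent.
    private void age() {
        for (byte[] row : counters) {
            for (int i = 0; i < row.length; i++) row[i] >>= 1;
        }
        additions = 0;
    }

    private int index(Object key, int row) {
        int h = key.hashCode() ^ seeds[row];
        h ^= (h >>> 16);
        return Math.floorMod(h, WIDTH);
    }
}
```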
When a cache miss occurs, the candidate competes for admission. W-TinyLFU compares the candidate's estimated frequency from the CMS to the frequency of the eviction victim (the tail item in probation or the window). If the candidate is hotter, it is admitted and the victim is evicted; otherwise, the candidate is rejected and the cache is unchanged. This approach reduces pollution from sequential scans and one-hit wonders because such items have low frequency and lose the admission competition. In practice, W-TinyLFU improves hit rate by 10% to 30% over plain LRU at the same memory budget under heavy-tailed workloads, which translates to large backend savings at scale. The metadata cost is small: the CMS is typically a few MB even when tracking tens of millions of candidates, and per-entry overhead remains similar to segmented LRU.
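The admission decision itself reduces to a frequency comparison. The helper below is a simplified sketch (not Caffeine's actual code) that assumes the CountMinSketch class shown earlier; real implementations also add tie-breaking and randomization to resist adversarial traffic.

```java
// Illustrative W-TinyLFU-style admission check (simplified sketch).
final class TinyLfuAdmitter {
    // Age the sketch roughly every 100,000 accesses (an assumed tuning value).
    private final CountMinSketch sketch = new CountMinSketch(100_000);

    // Record every access, hit or miss, so the sketch reflects recent popularity.
    void recordAccess(Object key) {
        sketch.increment(key);
    }

    // On a miss with a full cache: admit the candidate only if it is estimated
    // to be hotter than the eviction victim (tail of probation or window).
    boolean admit(Object candidate, Object victim) {
        return sketch.estimate(candidate) > sketch.estimate(victim);
    }
}
```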
💡 Key Takeaways
•Admission policies prevent cache pollution by rejecting candidates unless they are hotter than the eviction victim. Without admission, one-hit wonders and scans evict valuable residents, increasing miss rate and backend load.
•Count-Min Sketch (CMS) approximates access frequencies for millions of keys in a space-efficient probabilistic structure. A sketch with width 10,000 and depth 4 uses approximately 40 KB and provides frequency estimates with under 1% to 2% relative error due to hash collisions.
•W-TinyLFU combines three segments: a tiny LRU window (1% to 2% of capacity) that absorbs bursty admissions, plus a main cache split into probation (20%) and protected (80%) regions. Candidates compete against victims using CMS frequency; hotter candidates are admitted, colder ones rejected.
•The CMS is periodically reset or aged (for example, halving all counters every N accesses) to bound stale history and adapt to workload shifts. This prevents old frequencies from dominating and ensures the admission decision reflects recent behavior.
•Published evaluations show W-TinyLFU improves hit rate by 10% to 30% over plain LRU under heavy-tailed workloads at the same memory budget. At 100,000 QPS, a 10 percentage point hit rate improvement saves 10,000 backend requests per second.
•Metadata cost is low: the CMS is a few MB for tens of millions of tracked candidates, and per-entry overhead remains 24 to 48 bytes, similar to segmented LRU. Total memory increase is under 1% for typical cache sizes.
📌 Examples
Caffeine, a high-performance Java caching library, implements W-TinyLFU and is widely used in production at companies including Google, Netflix, and Amazon. Trace-driven simulations on real workloads show consistent hit rate improvements of 10% to 30% over LRU and 5% to 15% over plain SLRU.
A production service at 100,000 queries per second with a 10 millisecond backend miss penalty and a 20% miss rate incurs 20,000 backend requests per second and 200 seconds of aggregate backend latency per second of wall-clock time. Deploying W-TinyLFU to reduce the miss rate to 15% cuts backend load to 15,000 requests per second, saving 5,000 requests per second and 50 seconds of backend latency.
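The figures above follow from simple arithmetic; a back-of-envelope check using the same assumed numbers (100,000 QPS, 10 ms miss penalty, 20% miss rate before, 15% after) is shown below.

```java
// Back-of-envelope check of the backend savings in the example above.
public class BackendSavings {
    public static void main(String[] args) {
        double qps = 100_000;        // total query rate (assumed, from the example)
        double missPenaltyMs = 10;   // backend latency per miss
        double missBefore = 0.20;    // miss rate with plain LRU
        double missAfter = 0.15;     // miss rate with W-TinyLFU

        double backendBefore = qps * missBefore;                       // 20,000 req/s
        double backendAfter = qps * missAfter;                         // 15,000 req/s
        double savedRequests = backendBefore - backendAfter;           // 5,000 req/s
        double savedLatencySec = savedRequests * missPenaltyMs / 1000; // 50 s of backend time per second

        System.out.printf("Backend load: %.0f -> %.0f req/s (saving %.0f req/s, %.0f s latency/s)%n",
                backendBefore, backendAfter, savedRequests, savedLatencySec);
    }
}
```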