Approximate Eviction and Implementation at Scale
Exact eviction policies like true LRU or LFU require global ordering structures (doubly linked lists, frequency buckets) that become concurrency bottlenecks at high request rates. A single global LRU list must be locked on every access to reorder nodes, and at millions of queries per second that lock contention dominates CPU cost and inflates tail latency. Approximate eviction algorithms trade a small accuracy loss for large performance gains by using sampling, sharding, and probabilistic data structures to avoid expensive global synchronization. Redis exemplifies this approach: on eviction, it samples a small random set of candidates (typically 5 to 10 keys) from the eligible keyset and evicts the least recently or least frequently used among them. This sampled LRU or LFU avoids maintaining a global ordering and yields hit ratios within a few percentage points of the exact algorithms under typical power-law workloads.
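To make the sampling idea concrete, here is a minimal sketch of a Redis-style sampled-LRU store in Go. The names (`sampledLRU`, `evictOne`) and the use of Go's pseudo-randomized map iteration as a stand-in for true random sampling are illustrative assumptions, not Redis internals; a single shard is shown, with locking deferred to the sharding layer discussed next.

```go
// sampled_lru.go — sketch of sampled LRU eviction (illustrative, not Redis code).
package cache

import "math"

type entry struct {
	value      []byte
	lastAccess int64 // logical clock tick recorded on each access
}

type sampledLRU struct {
	entries    map[string]*entry
	capacity   int
	sampleSize int // Redis samples 5 by default; ~10 tracks exact LRU closely
	clock      int64
}

func newSampledLRU(capacity, sampleSize int) *sampledLRU {
	return &sampledLRU{
		entries:    make(map[string]*entry),
		capacity:   capacity,
		sampleSize: sampleSize,
	}
}

// get touches only the entry's own timestamp; no shared list is reordered.
func (c *sampledLRU) get(key string) ([]byte, bool) {
	e, ok := c.entries[key]
	if !ok {
		return nil, false
	}
	c.clock++
	e.lastAccess = c.clock
	return e.value, true
}

func (c *sampledLRU) put(key string, value []byte) {
	if _, exists := c.entries[key]; !exists && len(c.entries) >= c.capacity {
		c.evictOne()
	}
	c.clock++
	c.entries[key] = &entry{value: value, lastAccess: c.clock}
}

// evictOne inspects up to sampleSize entries and removes the least recently
// used among them. Go map iteration starts at a pseudo-random bucket, which
// serves here as a cheap approximation of uniform random sampling.
func (c *sampledLRU) evictOne() {
	victimKey := ""
	victimAge := int64(math.MaxInt64)
	seen := 0
	for k, e := range c.entries {
		if e.lastAccess < victimAge {
			victimKey, victimAge = k, e.lastAccess
		}
		seen++
		if seen >= c.sampleSize {
			break
		}
	}
	if victimKey != "" {
		delete(c.entries, victimKey)
	}
}
```

The key property is that a hit costs one timestamp write and an eviction costs a constant-size scan, independent of cache size.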
Sharding the cache by key hash is another critical technique for scalability. Instead of a single global cache structure, the keyspace is partitioned into independent shards (for example, 64 or 128), each with its own policy metadata and locks, so concurrent requests to different shards proceed in parallel without contention. Tuning the shard count balances parallelism against metadata overhead: too few shards create lock contention, too many inflate memory cost. Per-shard lock queues should remain short at peak request rates; if p99 lock wait exceeds a few microseconds, increase the shard count. For global admission structures like TinyLFU's Count-Min Sketch, use striped sketches (per-shard sketches with periodic merges) to avoid hot locks on the shared sketch.
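A minimal sharding layer might look like the following sketch, which reuses the `sampledLRU` type from above (any per-shard policy would do). The shard count, FNV hashing, and plain mutexes are assumptions for illustration; production systems often use striped read-write locks or lock-free structures instead.

```go
// sharded_cache.go — sketch of hash-sharded access with per-shard locks.
package cache

import (
	"hash/fnv"
	"sync"
)

const numShards = 64 // power of two so the shard index is a cheap bit mask

type shard struct {
	mu    sync.Mutex
	local *sampledLRU // per-shard policy metadata; no global structure
}

type shardedCache struct {
	shards [numShards]shard
}

func newShardedCache(capacityPerShard, sampleSize int) *shardedCache {
	c := &shardedCache{}
	for i := range c.shards {
		c.shards[i].local = newSampledLRU(capacityPerShard, sampleSize)
	}
	return c
}

// shardFor hashes the key so traffic spreads across independent locks.
func (c *shardedCache) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &c.shards[h.Sum32()&(numShards-1)]
}

func (c *shardedCache) Get(key string) ([]byte, bool) {
	s := c.shardFor(key)
	s.mu.Lock() // contends only with keys that hash to this shard
	defer s.mu.Unlock()
	return s.local.get(key)
}

func (c *shardedCache) Put(key string, value []byte) {
	s := c.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.local.put(key, value)
}
```

With 64 shards, two requests collide on a lock only when their keys hash to the same shard, which is what keeps per-shard lock queues short.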
Approximate eviction is essential in high-throughput production systems. Memcached as deployed at Meta uses per-slab-class approximate LRU with background LRU crawler threads that scan and expire idle or old items asynchronously, decoupling eviction cost from the critical request path. Redis's approximate LRU/LFU with sampling keeps eviction CPU cost predictable at millions of operations per second. The Count-Min Sketch in TinyLFU approximates frequencies with bounded error (under 1% to 2%) in a few MB, enabling admission filtering without per-key frequency counters. When evaluating approximate policies, measure the hit rate delta against exact algorithms using trace replays of production traffic, then quantify the backend load savings. A 1% hit rate loss that reduces eviction CPU by 50% is often a favorable trade, especially if the backend can absorb the incremental load. Operational monitoring should track eviction CPU time, lock contention (lock wait percentiles), hit ratio by time window, and miss penalty to detect when approximation error becomes material.
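The following sketch shows the Count-Min Sketch machinery behind TinyLFU-style admission, under illustrative assumptions: FNV-based double hashing and full `uint32` counters, with no counter aging or reset (which real TinyLFU implementations add to keep estimates fresh).

```go
// cms.go — sketch of a Count-Min Sketch for TinyLFU-style admission.
package cache

import "hash/fnv"

type countMinSketch struct {
	depth, width uint32
	counters     [][]uint32
}

func newCountMinSketch(depth, width uint32) *countMinSketch {
	c := &countMinSketch{
		depth:    depth,
		width:    width,
		counters: make([][]uint32, depth),
	}
	for i := range c.counters {
		c.counters[i] = make([]uint32, width)
	}
	return c
}

// bucket derives the row's index from two halves of one 64-bit hash
// (the standard double-hashing trick: h1 + row*h2).
func (c *countMinSketch) bucket(key string, row uint32) uint32 {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	return (h1 + row*h2) % c.width
}

// Increment bumps one counter per row on every cache access.
func (c *countMinSketch) Increment(key string) {
	for row := uint32(0); row < c.depth; row++ {
		c.counters[row][c.bucket(key, row)]++
	}
}

// Estimate takes the minimum across rows; hash collisions can only
// inflate counters, so the estimate never undercounts the true frequency.
func (c *countMinSketch) Estimate(key string) uint32 {
	min := ^uint32(0)
	for row := uint32(0); row < c.depth; row++ {
		if v := c.counters[row][c.bucket(key, row)]; v < min {
			min = v
		}
	}
	return min
}

// admit is the TinyLFU admission test: the incoming candidate displaces
// the eviction victim only if it is estimated to be more frequent.
func (c *countMinSketch) admit(candidate, victim string) bool {
	return c.Estimate(candidate) > c.Estimate(victim)
}
```

A sketch of depth 4 and width around one million uses roughly 16 MB of `uint32` counters; narrower 4-bit counters, as many TinyLFU implementations use, shrink this to a few MB at the same error bound.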
💡 Key Takeaways
• Exact LRU/LFU require global ordering structures that become concurrency bottlenecks at high request rates. Locking a global LRU list on every access dominates CPU and increases tail latency at millions of queries per second.
• Approximate eviction via sampling: on eviction, sample a small random set of candidates (5 to 10 keys) and evict the worst by policy. Redis uses this for LRU/LFU, yielding hit ratios within a few percent of exact algorithms under power-law workloads.
• Sharding by key hash partitions the cache into independent shards (64 or 128 are typical), each with separate policy metadata and locks, so concurrent requests to different shards proceed without contention. Tune the shard count so per-shard lock queues remain short (p99 lock wait under a few microseconds).
• A Count-Min Sketch for TinyLFU approximates frequencies with bounded error (under 1% to 2%) in a few MB of memory, enabling admission filtering without per-key counters. Use striped sketches (per-shard) to avoid hot locks on a shared global sketch.
• Memcached at Meta decouples eviction from the request path using background LRU crawler threads that asynchronously scan and expire idle items, keeping eviction cost off the critical latency path and predictable under load.
• Evaluate approximate policies via trace replays: measure the hit rate delta versus exact algorithms and quantify the backend load savings (a minimal replay harness is sketched after this list). A 1% hit rate loss that cuts eviction CPU by 50% is often favorable if the backend can absorb the incremental load.
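A trace-replay harness for comparing policies can be as simple as the following sketch; `cacheUnderTest` is a hypothetical interface matching the sketches above, and `newExactLRU` stands in for whatever exact baseline is maintained for evaluation.

```go
// replay.go — sketch of trace-replay evaluation of cache policies.
package cache

type cacheUnderTest interface {
	Get(key string) ([]byte, bool)
	Put(key string, value []byte)
}

// hitRatio replays a recorded key trace, modeling a read-through cache:
// every miss is followed by a fill.
func hitRatio(c cacheUnderTest, trace []string) float64 {
	hits := 0
	for _, key := range trace {
		if _, ok := c.Get(key); ok {
			hits++
		} else {
			c.Put(key, []byte("x")) // payload is irrelevant to hit ratio
		}
	}
	return float64(hits) / float64(len(trace))
}

// Example comparison (newExactLRU is a hypothetical exact baseline):
//
//	exact := hitRatio(newExactLRU(capacity), trace)
//	approx := hitRatio(newShardedCache(capacity/numShards, 10), trace)
//	delta := exact - approx // weigh this against eviction CPU saved
```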
📌 Examples
Redis approximate LRU with 10 sample candidates achieves hit ratios within 2% of exact LRU under Zipfian workloads, while reducing eviction CPU cost by over 90% and enabling sustained millions of operations per second with sub-millisecond p99 latency.
A production service shards its cache into 128 partitions by key hash, each maintaining independent SLRU metadata. Lock contention drops from a p99 lock wait of 50 microseconds with a single global lock to 2 microseconds with sharding, enabling a 10x throughput increase on the same hardware.