Approximate Eviction and Implementation at Scale
The Concurrency Problem
Exact eviction policies like true LRU require global ordering structures (doubly linked lists, frequency buckets) that become bottlenecks at high request rates. A single global LRU list requires locking on every access to reorder nodes. At millions of queries per second, lock contention dominates CPU cost and increases tail latency. Approximate algorithms trade small accuracy loss for massive performance gains.
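To make the bottleneck concrete, here is a minimal Python sketch of an exact LRU cache behind one global lock (the class name `GlobalLRU` is illustrative). Note that even a read must take the lock, because every hit reorders the recency structure:

```python
import threading
from collections import OrderedDict

class GlobalLRU:
    """Exact LRU: one global ordering structure, one lock, touched on every access."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lock = threading.Lock()   # single global lock: the contention point
        self.items = OrderedDict()     # insertion order doubles as recency order

    def get(self, key):
        with self.lock:                # even reads must lock, to reorder the list
            if key not in self.items:
                return None
            self.items.move_to_end(key)  # reorder on every hit
            return self.items[key]

    def put(self, key, value):
        with self.lock:
            if key in self.items:
                self.items.move_to_end(key)
            elif len(self.items) >= self.capacity:
                self.items.popitem(last=False)  # evict the true least-recently-used
            self.items[key] = value

cache = GlobalLRU(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # touch "a", so "b" becomes least recently used
cache.put("c", 3)     # evicts "b"
```

Every `get` and `put` serializes on `self.lock`; at millions of requests per second, all cores queue on that one lock, which is exactly the contention the approximate techniques below avoid.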
Sampled Eviction
On eviction, sample a small random set of candidates (5-10 keys) and evict the worst by policy (least recently used or least frequently used). This avoids maintaining any global ordering and yields hit ratios within a few percentage points of exact algorithms under power-law workloads. Sampling with 10 candidates achieves hit ratios within 2% of exact LRU while reducing eviction CPU by 90%.
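A minimal Python sketch of sampled LRU eviction (the class name `SampledLRUCache` is illustrative, not from any specific cache). The only per-key metadata is a last-access timestamp; eviction picks the oldest of a small random sample rather than maintaining a global list:

```python
import random
import time

class SampledLRUCache:
    """Approximate LRU: evict the oldest of K random candidates; no global ordering."""
    def __init__(self, capacity, sample_size=10):
        self.capacity = capacity
        self.sample_size = sample_size
        self.data = {}          # key -> value
        self.last_access = {}   # key -> last-access timestamp (the only metadata)

    def get(self, key):
        if key in self.data:
            self.last_access[key] = time.monotonic()  # O(1) update, no reordering
            return self.data[key]
        return None

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Sample a few candidates and evict the least recently used among them.
            candidates = random.sample(list(self.data),
                                       min(self.sample_size, len(self.data)))
            victim = min(candidates, key=self.last_access.__getitem__)
            del self.data[victim]
            del self.last_access[victim]
        self.data[key] = value
        self.last_access[key] = time.monotonic()
```

Swapping the `min` key for an access-count dictionary turns the same skeleton into sampled LFU.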
Sharding for Parallelism
Partition the keyspace into independent shards (64-128 is typical), each with its own policy metadata and locks. Concurrent requests to different shards proceed without contention. Tune shard count so per-shard lock queues stay short: p99 latency (the 99th percentile, below which 99% of requests complete) under a few microseconds. Too few shards create contention; too many inflate metadata overhead. Sharding drops p99 lock wait from 50 us (single lock) to 2 us (128 shards).
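A minimal Python sketch of hash-based sharding (the class name `ShardedCache` is illustrative). Each shard owns its own dictionary and lock, so two requests contend only when they hash to the same shard:

```python
import hashlib
import threading

class ShardedCache:
    """Hash-partitioned cache: one dict and one lock per shard,
    so requests to different shards never contend."""
    def __init__(self, num_shards=128):
        self.num_shards = num_shards
        self.shards = [{} for _ in range(num_shards)]
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def _shard_of(self, key):
        # Stable hash so a key always maps to the same shard.
        h = hashlib.blake2b(key.encode(), digest_size=8).digest()
        return int.from_bytes(h, "big") % self.num_shards

    def get(self, key):
        i = self._shard_of(key)
        with self.locks[i]:            # only this shard's lock is held
            return self.shards[i].get(key)

    def put(self, key, value):
        i = self._shard_of(key)
        with self.locks[i]:
            self.shards[i][key] = value
```

In practice each shard would also carry its own eviction metadata (e.g. the sampled-eviction state above), keeping eviction decisions shard-local as well.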
Background Eviction
Decouple eviction from the request path using background crawler threads that scan and expire idle items asynchronously. This keeps eviction cost off the critical latency path. Evaluate approximate policies via trace replays: a 1% hit rate loss that reduces eviction CPU by 50% is often favorable. Monitor eviction CPU time, lock contention percentiles, and hit ratio by time window.
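A minimal Python sketch of a background crawler (the class name `CrawlerCache` and the parameters `idle_limit` and `scan_interval` are illustrative). A daemon thread periodically scans for idle items and expires them, so the `get`/`put` path never pays eviction cost:

```python
import threading
import time

class CrawlerCache:
    """Background eviction: a daemon thread scans for idle items and
    expires them asynchronously, off the request path."""
    def __init__(self, idle_limit=30.0, scan_interval=1.0):
        self.idle_limit = idle_limit
        self.data = {}
        self.last_access = {}
        self.lock = threading.Lock()
        t = threading.Thread(target=self._crawl, args=(scan_interval,), daemon=True)
        t.start()

    def _crawl(self, interval):
        while True:
            time.sleep(interval)
            now = time.monotonic()
            with self.lock:
                # Snapshot idle keys under the lock, then delete.
                idle = [k for k, ts in self.last_access.items()
                        if now - ts > self.idle_limit]
                for k in idle:
                    self.data.pop(k, None)
                    self.last_access.pop(k, None)

    def get(self, key):
        with self.lock:
            if key in self.data:
                self.last_access[key] = time.monotonic()
                return self.data[key]
            return None

    def put(self, key, value):
        with self.lock:
            self.data[key] = value
            self.last_access[key] = time.monotonic()
```

A production crawler would scan incrementally (a bounded slice of keys per pass) rather than snapshotting the whole table, so lock hold times stay short.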