
Approximate Eviction and Implementation at Scale

The Concurrency Problem

Exact eviction policies like true LRU require global ordering structures (doubly linked lists, frequency buckets) that become bottlenecks at high request rates. A single global LRU list requires locking on every access to reorder nodes. At millions of queries per second, lock contention dominates CPU cost and increases tail latency. Approximate algorithms trade small accuracy loss for massive performance gains.

Sampled Eviction

On eviction, sample a small random set of candidates (5-10 keys) and evict the worst by policy (least recent or least frequent). This avoids maintaining global ordering and yields hit ratios within a few percentage points of exact algorithms under power-law workloads. Sampling with 10 candidates achieves hit ratios within 2% of exact LRU while reducing eviction CPU by 90%.
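A minimal sketch of the idea (class and method names are illustrative, not from any particular cache library): each key carries only a last-access counter, so reads are O(1) updates with no list reordering, and eviction samples a few keys and removes the least recently used among them.

```python
import random

class SampledLRUCache:
    """Approximates LRU by sampling eviction candidates instead of
    maintaining a global recency list."""

    def __init__(self, capacity, sample_size=10):
        self.capacity = capacity
        self.sample_size = sample_size
        self.data = {}          # key -> value
        self.last_access = {}   # key -> logical access clock
        self.clock = 0

    def get(self, key):
        if key in self.data:
            self.clock += 1
            self.last_access[key] = self.clock  # O(1); no node reordering
            return self.data[key]
        return None

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            self._evict_one()
        self.clock += 1
        self.data[key] = value
        self.last_access[key] = self.clock

    def _evict_one(self):
        # Sample a handful of random keys and evict the least recently
        # used among them -- the "worst by policy" within the sample.
        candidates = random.sample(list(self.data),
                                   min(self.sample_size, len(self.data)))
        victim = min(candidates, key=self.last_access.__getitem__)
        del self.data[victim]
        del self.last_access[victim]
```

Swapping the `min` key from recency to a frequency counter turns the same skeleton into sampled LFU.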

Sharding for Parallelism

Partition the keyspace into independent shards (64-128 is typical), each with its own policy metadata and locks. Concurrent requests to different shards proceed without contention. Tune the shard count so per-shard lock queues stay short: p99 latency (the 99th percentile, where 99% of requests complete faster) under a few microseconds. Too few shards create contention; too many inflate metadata overhead. Sharding drops p99 lock wait from 50 us (single lock) to 2 us (128 shards).
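A minimal sketch of the sharding pattern (names and the shard count are illustrative): a stable hash maps each key to one shard, and only that shard's lock is taken, so requests to different shards never contend.

```python
import hashlib
import threading

class ShardedCache:
    """Keyspace partitioned into independent shards, each guarded by
    its own lock so unrelated requests proceed in parallel."""

    def __init__(self, num_shards=128):
        self.num_shards = num_shards
        self.shards = [{} for _ in range(num_shards)]
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def _shard_index(self, key):
        # Stable hash: the same key always lands on the same shard.
        digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.num_shards

    def get(self, key):
        i = self._shard_index(key)
        with self.locks[i]:          # only this shard is locked
            return self.shards[i].get(key)

    def put(self, key, value):
        i = self._shard_index(key)
        with self.locks[i]:
            self.shards[i][key] = value
```

In a real cache each shard would also hold its own eviction metadata (e.g. the sampled-LRU clock above), keeping that state lock-local as well.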

Background Eviction

Decouple eviction from the request path using background crawler threads that scan and expire idle items asynchronously. This keeps eviction cost off the critical latency path. Evaluate approximate policies via trace replays: a 1% hit rate loss that reduces eviction CPU by 50% is often favorable. Monitor eviction CPU time, lock contention percentiles, and hit ratio by time window.
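A minimal sketch of a background crawler (TTL values and names are illustrative): a daemon thread periodically scans for idle entries and drops them, so the request path never pays the scan cost.

```python
import threading
import time

class CrawlerCache:
    """Cache whose expiration work runs on a background crawler
    thread instead of the request path."""

    def __init__(self, ttl=1.0, scan_interval=0.1):
        self.ttl = ttl
        self.data = {}               # key -> (value, last_access_time)
        self.lock = threading.Lock()
        self._stop = threading.Event()
        self.crawler = threading.Thread(
            target=self._crawl, args=(scan_interval,), daemon=True)
        self.crawler.start()

    def get(self, key):
        with self.lock:
            entry = self.data.get(key)
            if entry is None:
                return None
            value, _ = entry
            self.data[key] = (value, time.monotonic())  # refresh recency
            return value

    def put(self, key, value):
        with self.lock:
            self.data[key] = (value, time.monotonic())

    def _crawl(self, interval):
        # Off the critical path: periodically drop items idle past the TTL.
        while not self._stop.wait(interval):
            now = time.monotonic()
            with self.lock:
                expired = [k for k, (_, t) in self.data.items()
                           if now - t > self.ttl]
                for k in expired:
                    del self.data[k]

    def close(self):
        self._stop.set()
        self.crawler.join()
```

A production crawler would scan incrementally (a slice of keys per pass) rather than hold the lock for a full scan; the single-lock version here just shows the decoupling.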

💡 Key Takeaways
Exact LRU requires global ordering structures that become bottlenecks. Locking on every access dominates CPU at millions of QPS.
Sampled eviction: sample 5-10 candidates, evict the worst. With 10 samples, hit ratios stay within 2% of exact while reducing eviction CPU by 90%.
Sharding partitions keyspace into 64-128 independent shards. p99 (99th percentile) lock wait drops from 50 us to 2 us with 128 shards.
Background crawler threads decouple eviction from request path, keeping eviction cost off critical latency.
A 1% hit rate loss that reduces eviction CPU by 50% is often favorable. Evaluate via trace replays of production traffic.
📌 Interview Tips
1. Explain the concurrency problem: a global LRU list requires locking on every access, becoming a bottleneck at millions of QPS.
2. For sampled eviction, give numbers: 10 sample candidates, within 2% of exact hit ratio, 90% CPU reduction.
3. Know the sharding math: 128 shards drops p99 lock wait from 50 us to 2 us, enabling 10x throughput on the same hardware.