Caching • Eviction Policies • Hard • ⏱️ ~3 min
Size- and Cost-Aware Eviction: Optimizing for Variable-Sized Objects and Expensive Misses
Most eviction policies treat all cached items as equal regardless of size or fetch cost, optimizing for request hit ratio (the fraction of requests served from cache). However, in systems with variable-sized objects or highly variable miss penalties, request hit ratio can be a misleading metric. A cache that retains one 10 MB object while evicting 100 small 10 KB objects may achieve a low request hit ratio but an excellent byte hit ratio. Conversely, caching many small, cheap-to-fetch objects at the expense of a few large, expensive ones optimizes request hits but increases backend cost and latency. Size- and cost-aware eviction policies optimize byte hit ratio or cost savings by incorporating object size and fetch cost into eviction decisions.
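The divergence between the two metrics can be made concrete with a toy workload built from the numbers above (the workload itself is hypothetical):

```python
# Toy workload mirroring the example above (hypothetical numbers):
# one cached 10 MB object is hit, while 100 small 10 KB objects miss.
requests = [("big", 10 * 1024 * 1024, True)]                     # (id, size_bytes, was_hit)
requests += [(f"small-{i}", 10 * 1024, False) for i in range(100)]

hits = [r for r in requests if r[2]]
request_hit_ratio = len(hits) / len(requests)                    # hits / total requests
byte_hit_ratio = sum(r[1] for r in hits) / sum(r[1] for r in requests)

print(f"request hit ratio: {request_hit_ratio:.3f}")             # 0.010
print(f"byte hit ratio:    {byte_hit_ratio:.3f}")                # 0.911
```

One hit out of 101 requests is a dismal request hit ratio, yet over 91% of all requested bytes were served from cache, so origin bandwidth is almost entirely avoided.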
GreedyDual-Size-Frequency (GDSF) is a representative size- and cost-aware policy commonly used in Content Delivery Networks (CDNs) and in caches of machine learning features or media objects. Each cached item is assigned a priority score calculated as a global aging value plus (access frequency times fetch cost) divided by size in bytes; the aging value is raised to the priority of each evicted item, so long-idle entries gradually become evictable. The item with the lowest score is evicted first. This scoring function maximizes the utility (hits times cost saved) per byte of cache space consumed. For example, a 1 MB object accessed 10 times with a 100 millisecond fetch cost has a different eviction priority than a 10 KB object accessed 100 times with a 10 millisecond fetch cost, and the policy balances these trade-offs explicitly. In practice, fetch cost can be estimated from backend latency percentiles, database query cost, or explicit cost annotations.
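A minimal sketch of this scoring scheme, assuming a min-heap with lazy deletion of stale records (the class name and structure are illustrative, not any production implementation):

```python
import heapq

class GDSFCache:
    """Minimal GreedyDual-Size-Frequency sketch (illustrative only).
    Priority H = L + freq * cost / size, where L is a global aging
    value raised on each eviction."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.clock = 0.0     # aging value L
        self.entries = {}    # key -> (priority, size, freq, cost)
        self.heap = []       # (priority, key) min-heap; may hold stale records

    def put(self, key, size, cost):
        if size > self.capacity:
            return           # never admit an object larger than the whole cache
        freq = 1
        if key in self.entries:
            old = self.entries.pop(key)
            self.used -= old[1]
            freq = old[2] + 1
        # Evict lowest-priority items until the new object fits.
        while self.used + size > self.capacity and self.heap:
            prio, victim = heapq.heappop(self.heap)
            entry = self.entries.get(victim)
            if entry is None or entry[0] != prio:
                continue     # stale heap record; skip it
            self.clock = prio            # age the cache up to the evicted priority
            self.used -= entry[1]
            del self.entries[victim]
        prio = self.clock + freq * cost / size
        self.entries[key] = (prio, size, freq, cost)
        self.used += size
        heapq.heappush(self.heap, (prio, key))
```

With cost in milliseconds and size in bytes, the two objects from the text score roughly 10 × 100 / 1,048,576 ≈ 0.001 (1 MB, 10 hits) versus 100 × 10 / 10,240 ≈ 0.098 (10 KB, 100 hits), so the small hot object is retained far longer per byte; the units are arbitrary as long as they are consistent across entries.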
Implementing size-aware policies requires careful normalization and fairness considerations. Without bounds, very large objects can be starved and never admitted, or conversely, one giant hot object can monopolize the cache. Practical systems often use hybrid designs: a small large-object partition with separate eviction, cost normalization (for example, capping cost ratios), or size-class-based policies. Another approach is to track both request hit ratio and byte hit ratio and alarm on divergence. Additionally, size-aware metadata must account for the true memory footprint including overhead: a 10-million-entry cache with average 200 byte values, 50 byte keys, and 32 bytes of metadata consumes approximately 2.82 GB, and eviction decisions should consider total bytes, not just entry count. Size-aware policies are essential for CDNs, object storage caches, ML feature stores, and any system where object sizes span multiple orders of magnitude.
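The footprint arithmetic above is simple multiplication, sketched here for the stated entry sizes:

```python
# Back-of-envelope footprint for the cache described above:
# 10 million entries, 200 B values, 50 B keys, 32 B of per-entry metadata.
entries = 10_000_000
bytes_per_entry = 200 + 50 + 32       # value + key + eviction metadata
total_bytes = entries * bytes_per_entry

print(f"{total_bytes / 1e9:.2f} GB")  # 2.82 GB
```

Note that real caches add further overhead (hash-table buckets, allocator slack, pointer alignment), so this figure is a lower bound.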
💡 Key Takeaways
•Size-aware policies optimize byte hit ratio or cost savings per byte rather than request hit ratio. A cache that keeps one 10 MB object while evicting 100 10 KB objects has a low request hit ratio but potentially an excellent byte hit ratio.
•GreedyDual-Size-Frequency (GDSF) assigns priority as an aging value plus (access frequency times fetch cost) divided by object size. Evicting the lowest-priority item maximizes utility (cost saved) per byte of cache space, balancing size, popularity, and expense.
•Fetch cost can be measured as backend latency (p50, p99), database query cost, or explicit annotations. For example, a 1 MB object with a 100 millisecond fetch cost and a 10 KB object with a 10 millisecond cost are weighted differently in eviction.
•Naive size-aware policies can starve large objects (never admitted) or allow one giant hot object to monopolize the cache. Mitigations include size-class partitions, cost normalization (capping ratios), or carving out a small large-object region with a separate policy.
•Metadata accounting must include the true memory footprint. With 10 million entries, 200 byte average values, 50 byte keys, and 32 bytes of per-entry metadata, total memory is approximately 2.82 GB. Eviction should target total bytes, not just entry count.
•Size-aware eviction is critical for CDNs (variable media object sizes), ML feature stores (features range from bytes to megabytes), and object storage caches. Systems like Varnish and Nginx use size-aware scoring for cache management.
📌 Examples
Varnish (a widely used HTTP cache and CDN component) uses a size-aware eviction algorithm that considers object size and access patterns. Varnish tracks byte hit ratio separately from request hit ratio and can be configured to optimize for byte efficiency, which is critical for reducing origin bandwidth costs in CDN deployments.
An ML feature-serving cache stores features ranging from 1 KB (scalar features) to 10 MB (embedding vectors). Using plain LRU causes large embeddings to evict many small features, increasing backend queries. Switching to GDSF with cost proportional to fetch latency reduces backend load by 40% by retaining expensive large features and evicting cheap small ones.