Caching • Distributed Caching • Easy • ⏱️ ~3 min
What is Distributed Caching and Why Use It?
Distributed caching places a shared in-memory data layer between application servers and backend storage systems such as databases. Instead of a single large cache server, the cache is spread across multiple nodes in a cluster, with each node holding a portion of the total cached data. Keys are partitioned across these nodes using techniques like consistent hashing with virtual nodes, allowing the total memory capacity and request throughput to scale horizontally by adding more cache servers. This architecture turns expensive database queries that might take 10 to 100 milliseconds into fast memory lookups that complete in under 1 millisecond at the median and in low single-digit milliseconds at the 99th percentile.
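To make the partitioning step concrete, here is a minimal sketch of consistent hashing with virtual nodes. It is a Python toy, not a production client library; the node names and the 100-virtual-node count are illustrative assumptions.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to cache nodes; virtual nodes smooth out load distribution."""

    def __init__(self, nodes, vnodes_per_node=100):
        self._ring = []              # sorted list of (hash, node) points on the ring
        self._vnodes = vnodes_per_node
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each physical node contributes many points, so keys rebalance evenly
        # when nodes are added or removed.
        for i in range(self._vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        # Walk clockwise to the first ring point at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect_left(self._ring, (h,))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.get_node("user:42:profile"))  # deterministic, e.g. "cache-2"
```

Adding a fourth node only moves the keys whose ring points now fall before the new node's virtual points, rather than reshuffling the entire keyspace.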
The performance gains are dramatic when hit ratios are high. Moving from a 90% hit rate to a 99% hit rate cuts database traffic by a factor of 10, because the miss rate drops from 10% to 1%. At scale, even small drops matter: Meta's social graph caching layer serves many millions of requests per second, and a 1 to 2 point drop in hit rate translates into hundreds of thousands of extra backend queries per second. Netflix EVCache handles multi-million requests per second with same availability zone (AZ) p50 latency well below 1 ms and p99 around 1 to 2 ms. AWS DynamoDB Accelerator (DAX) advertises microsecond read latencies and reads up to 10 times faster than direct database access, handling millions of requests per second per cluster.
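A quick back-of-the-envelope calculation shows why hit ratio dominates backend load. The 1,000,000 requests per second figure is an assumed example, not a number from any of the systems above.

```python
# Backend queries per second at different cache hit rates,
# assuming a hypothetical 1,000,000 requests/second against the cache tier.
requests_per_second = 1_000_000

for hit_rate in (0.90, 0.98, 0.99):
    misses = requests_per_second * (1 - hit_rate)
    print(f"hit rate {hit_rate:.0%}: {misses:,.0f} backend queries/sec")

# hit rate 90%: 100,000 backend queries/sec
# hit rate 98%: 20,000 backend queries/sec
# hit rate 99%: 10,000 backend queries/sec
```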
Distributed caches accept bounded staleness in exchange for throughput and availability. Data freshness is controlled through time-to-live (TTL) expiration, explicit invalidation when data changes, or write propagation strategies like cache-aside, read-through, write-through, and write-back. Most large production systems prefer eventual consistency models with short TTLs rather than strong consistency, because strict coherence would require synchronous invalidation across all cache nodes and backend systems, adding latency and reducing availability during network partitions or node failures.
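As an illustration of the most common of these patterns, here is a minimal cache-aside sketch with TTL-based expiration. It assumes the redis-py client, a Redis-compatible cache at a hypothetical host, and a hypothetical query_database helper standing in for the backend.

```python
import json
import redis  # assumes redis-py and a reachable Redis-compatible cache

cache = redis.Redis(host="cache.internal", port=6379)
PROFILE_TTL_SECONDS = 60  # bounded staleness window

def query_database(user_id: str) -> dict:
    # Hypothetical stand-in for the real 10-100 ms backend query.
    return {"user_id": user_id}

def get_user_profile(user_id: str) -> dict:
    key = f"user:{user_id}:profile"

    cached = cache.get(key)            # fast path: sub-millisecond memory lookup
    if cached is not None:
        return json.loads(cached)

    profile = query_database(user_id)  # slow path: go to the database on a miss
    cache.set(key, json.dumps(profile), ex=PROFILE_TTL_SECONDS)
    return profile

def update_user_profile(user_id: str, profile: dict) -> None:
    ...  # write the new profile to the database first
    # Invalidate so the next read repopulates the cache with fresh data.
    cache.delete(f"user:{user_id}:profile")
```

Even without explicit invalidation, the TTL caps how long a stale value can be served, which is exactly the bounded-staleness trade described above.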
💡 Key Takeaways
•In-memory cache gets complete in under 1 millisecond at p50 and in low single-digit milliseconds at p99 within a region, while database queries take 10 to 100 milliseconds, making cache hits roughly 10 to 100 times faster than backend lookups.
•Hit ratio is the critical metric: increasing from 90% to 99% reduces backend load by a factor of 10. At Meta scale, a 2-point drop in hit rate can add hundreds of thousands of queries per second to the database layer.
•Horizontal scaling works by partitioning keys across nodes using consistent hashing with virtual nodes, so adding cache servers increases both total memory capacity and aggregate throughput linearly.
•Replication within the cache layer uses per-partition primaries with 1 to 2 replicas to provide high availability and enable read fan-out, so reads can be served from any replica while writes go to the primary (see the sketch after this list).
•Most systems trade strong consistency for performance and availability, using TTL-based expiration and eventual consistency models rather than synchronous invalidation, accepting staleness windows of seconds to minutes.
•Real production examples show massive scale: Netflix EVCache handles multi-million requests per second with p99 latency of 1 to 2 ms, AWS DAX delivers microsecond reads at millions of requests per second, and CDN edge caches with 90 to 99% hit rates cut origin load by 10 to 100 times.
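The read fan-out idea from the replication takeaway can be illustrated with a toy partition object. The node names and the random replica choice are purely illustrative; real clients typically also weigh replica health and locality.

```python
import random

class Partition:
    """One cache partition: writes go to the primary, reads fan out to replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self.replicas = replicas

    def node_for_write(self) -> str:
        # All writes funnel through the primary so replicas stay consistent.
        return self.primary

    def node_for_read(self) -> str:
        # Any replica (or the primary) can serve a read, spreading load
        # and tolerating the loss of a single node.
        return random.choice([self.primary] + self.replicas)

partition = Partition(primary="cache-1a", replicas=["cache-1b", "cache-1c"])
print(partition.node_for_write())  # always cache-1a
print(partition.node_for_read())   # cache-1a, cache-1b, or cache-1c
```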
📌 Examples
Meta (Facebook) uses a massive memcache tier fronting sharded MySQL for the TAO social graph system. Regional cache pools serve millions of requests per second with sub-millisecond median latency. Techniques include client-side sharding to avoid proxy hops, region locality to avoid cross-datacenter network costs, and lease-based gets to prevent thundering herds when popular keys expire.
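The lease idea can be sketched as follows. This is a simplified single-process toy (the real memcached lease protocol hands out a token that is validated when the value is written back), but it shows how only one caller is allowed to refill an expired hot key while the rest back off instead of stampeding the database.

```python
import threading
import time

class LeasingCache:
    """Toy lease-based get: on a miss, one caller wins the lease and refills
    the key; concurrent callers are told to wait rather than query the DB."""

    def __init__(self):
        self._data = {}
        self._leases = {}   # key -> expiry time of the outstanding lease
        self._lock = threading.Lock()

    def get(self, key, lease_ttl=1.0):
        with self._lock:
            if key in self._data:
                return "hit", self._data[key]
            now = time.monotonic()
            if now >= self._leases.get(key, 0.0):
                # No live lease: this caller earns the right to refill the key.
                self._leases[key] = now + lease_ttl
                return "lease", None
            # Another caller holds the lease: back off and retry shortly.
            return "wait", None

    def set(self, key, value):
        with self._lock:
            self._data[key] = value
            self._leases.pop(key, None)

cache = LeasingCache()
print(cache.get("popular:key"))   # ("lease", None) - this caller refills it
print(cache.get("popular:key"))   # ("wait", None)  - concurrent caller backs off
cache.set("popular:key", "fresh value")
print(cache.get("popular:key"))   # ("hit", "fresh value")
```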
Netflix EVCache is a memcache-based system for personalization and metadata. Data is replicated across 3 availability zones for resilience, but clients prefer same-AZ reads to keep tail latency low and avoid cross-AZ transfer costs. Short TTLs of seconds to minutes favor freshness for rapidly changing recommendation data.
Amazon DynamoDB Accelerator (DAX) is a write-through and read-through cache that integrates tightly with DynamoDB, providing microsecond reads compared to typical 1 to 5 ms DynamoDB response times. It maintains DynamoDB's eventually consistent read semantics at the API level while delivering up to a 10 times performance improvement for read-heavy workloads.
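The read-through and write-through semantics described here can be sketched as a thin wrapper around a store. This shows the general pattern, not DAX's internals; the backing_store interface (get/put) is a hypothetical stand-in for the database.

```python
class ReadWriteThroughCache:
    """Sketch of read-through / write-through caching over a key-value store."""

    def __init__(self, backing_store):
        self._store = backing_store   # must expose get(key) and put(key, value)
        self._cache = {}

    def get(self, key):
        # Read-through: the cache itself loads from the store on a miss,
        # so callers never talk to the store directly.
        if key not in self._cache:
            self._cache[key] = self._store.get(key)
        return self._cache[key]

    def put(self, key, value):
        # Write-through: the write goes to the store and the cache in one
        # call path, so later reads see the new value without invalidation.
        self._store.put(key, value)
        self._cache[key] = value

class DictStore:
    """In-memory stand-in for the backing database, for demonstration only."""
    def __init__(self):
        self._d = {}
    def get(self, key):
        return self._d.get(key)
    def put(self, key, value):
        self._d[key] = value

dax_like = ReadWriteThroughCache(DictStore())
dax_like.put("order:1", {"status": "shipped"})
print(dax_like.get("order:1"))  # served from cache after the write-through
```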