What is Distributed Caching and Why Use It?
Why Distributed Caching Matters
Database queries typically take 10-100ms; cache lookups complete in under 1ms at p50 (median) and 1-2ms at p99 (99th percentile). That 10-100x speedup turns expensive database operations into fast memory lookups. Raising the hit rate from 90% to 99% cuts database traffic by 10x, because the miss rate falls from 10% to 1%. At scale, even small drops matter: a 2% hit-rate drop at millions of QPS translates to hundreds of thousands of extra backend queries per second.
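The hit-rate arithmetic above can be sketched directly; the request rate below is an assumed illustrative figure, not a number from the text:

```python
def backend_qps(total_qps: float, hit_rate: float) -> float:
    """Queries that miss the cache and fall through to the database."""
    return total_qps * (1.0 - hit_rate)

total = 1_000_000  # assumed total request rate: 1M QPS

at_90 = backend_qps(total, 0.90)  # ~100,000 misses/sec
at_99 = backend_qps(total, 0.99)  # ~10,000 misses/sec
print(at_90 / at_99)              # ~10x reduction in database load

# A 2% hit-rate drop (99% -> 97%) at this scale adds tens of
# thousands of extra backend queries per second:
print(backend_qps(total, 0.97) - backend_qps(total, 0.99))
```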
How Distributed Caching Scales
Keys are partitioned across nodes using consistent hashing with virtual nodes (typically 100-500 per physical node). Adding cache servers increases total memory and throughput linearly. Replication within the cache uses per-partition primaries with 1-2 replicas for high availability and read fan-out. Reads can be served from any replica; writes go to the primary.
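A minimal sketch of consistent hashing with virtual nodes, assuming MD5 as the ring hash; the class and node names are illustrative, not from any particular cache implementation:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent-hash ring with virtual nodes (a sketch, not production code)."""

    def __init__(self, vnodes: int = 150):
        self.vnodes = vnodes  # virtual nodes per physical node, in the 100-500 range
        self.ring = []        # sorted list of (hash_point, node) pairs

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each physical node owns many scattered points on the ring,
        # which evens out key distribution across nodes.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def lookup(self, key: str) -> str:
        # A key maps to the first ring point clockwise from its hash.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

The virtual nodes are why adding or removing a server only remaps roughly 1/N of the keys: only keys whose clockwise successor point belonged to that server move.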
Consistency Trade-offs
Distributed caches accept bounded staleness in exchange for throughput and availability. Most systems prefer eventual consistency with short TTLs (Time To Live) rather than strong consistency, because strict coherence requires synchronous invalidation across all nodes, adding latency and reducing availability during partitions. Freshness is controlled through TTL expiration, explicit invalidation, or write propagation strategies such as cache-aside, read-through, and write-through.
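The cache-aside pattern with TTL expiry and explicit invalidation can be sketched as follows; the `loader` callable stands in for the database read, and the in-process dict stands in for the distributed cache store (both are assumptions for illustration):

```python
import time

class CacheAside:
    """Cache-aside with TTL expiry (a sketch; 'loader' stands in for the DB read)."""

    def __init__(self, loader, ttl_seconds: float = 30.0):
        self.loader = loader   # called on a miss to fetch from the backing store
        self.ttl = ttl_seconds # short TTL bounds how stale an entry can get
        self.store = {}        # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None:
            value, expires_at = entry
            if time.monotonic() < expires_at:
                return value           # hit: entry is still fresh
            del self.store[key]        # expired: fall through as a miss
        value = self.loader(key)       # miss: read from the database
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value

    def invalidate(self, key):
        """Explicit invalidation, e.g. after writing through to the database."""
        self.store.pop(key, None)
```

Between a database write and the matching `invalidate`, readers may see the old value; that window is the bounded staleness the pattern accepts.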