Production Caching: Multi Tier Architecture and Pattern Composition
Multi Tier Cache Architecture
Large scale deployments rarely use a single cache. A common production architecture has three tiers. L1 is an in process cache (local to each application server) with microsecond-scale access (50-100μs) but limited to a single server and lost on restart. L2 is a distributed cache (shared across all servers) with 0.5-2ms access, surviving individual node restarts. L3 is the source of truth database with 5-50ms access. The read path checks L1 first, then L2, then L3. A request might hit L1 (50μs response), miss L1 but hit L2 (2ms response), or miss both and query L3 (30ms response plus cache population).
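The read path above can be sketched as follows. This is a minimal illustration, not a production client: `DictStore` is a hypothetical stand-in for both the distributed L2 cache client and the database, and there is no TTL, serialization, or error handling.

```python
class DictStore:
    """Hypothetical stand-in for an L2 cache client or a database."""
    def __init__(self, data=None):
        self.data = dict(data or {})
        self.reads = 0  # counts how often this tier is actually queried
    def get(self, key):
        self.reads += 1
        return self.data.get(key)
    def set(self, key, value):
        self.data[key] = value

class TieredCache:
    """L1 (in process) -> L2 (distributed) -> L3 (database) read path."""
    def __init__(self, l2, db):
        self.l1 = {}   # in process cache, local to this server, lost on restart
        self.l2 = l2   # shared distributed cache
        self.db = db   # source of truth
    def get(self, key):
        # 1. Check the in process L1 cache first (fastest, per server).
        if key in self.l1:
            return self.l1[key]
        # 2. Fall back to the shared L2 cache; populate L1 on a hit.
        value = self.l2.get(key)
        if value is not None:
            self.l1[key] = value
            return value
        # 3. Miss both tiers: query the database, then populate both caches.
        value = self.db.get(key)
        if value is not None:
            self.l2.set(key, value)
            self.l1[key] = value
        return value

db = DictStore({"user:1": {"name": "Ada"}})
cache = TieredCache(DictStore(), db)
cache.get("user:1")  # misses L1 and L2, reads the database once
cache.get("user:1")  # served from L1; no further database read
print(db.reads)      # → 1
```

The second lookup never leaves the process, which is exactly where the 50-100μs L1 numbers come from.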
Combined Hit Ratio Math
L1 hit ratios of 30-50% combined with L2 hit ratios of 80-90% mean only 5-15% of requests reach the database. Consider: 100 requests arrive; 40 hit L1 (40%), 60 miss L1; of those 60, 51 hit L2 (85% of misses), 9 miss L2; only 9 of 100 original requests query the database. For a service handling 100,000 requests per second, that is 9,000 database queries instead of 100,000, reducing database load by 91%.
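The worked example above reduces to a two-step multiplication. A small helper makes the per-tier semantics explicit: the L2 hit ratio applies only to the requests that already missed L1.

```python
def db_queries(requests, l1_hit_pct, l2_hit_pct):
    """Requests that fall through both cache tiers to the database.
    Hit percentages are per tier: l2_hit_pct applies only to L1 misses."""
    l1_misses = requests * (100 - l1_hit_pct) // 100
    l2_misses = l1_misses * (100 - l2_hit_pct) // 100
    return l2_misses

print(db_queries(100, 40, 85))      # → 9
print(db_queries(100_000, 40, 85))  # → 9000
```

Note the ratios do not simply add: a 40% L1 and 85% L2 combine to a 91% overall hit ratio because L2 only ever sees the 60% of traffic that L1 missed.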
Cross Zone Replication for Availability
Distributed caches at scale replicate across availability zones (separate physical data centers within a region) with a replication factor of 2-3. This ensures a single zone failure does not cause cache misses to flood the database. Within a zone, hit latency is sub-millisecond. Cross zone replication adds 1-2ms for acknowledgment but ensures durability. Client side consistent hashing distributes keys across nodes; automatic failover redirects traffic away from unhealthy nodes.
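Client side consistent hashing can be sketched with a hash ring. This is an illustrative toy, not a production client: the node names, virtual node count, and use of MD5 are all assumptions, and walking clockwise for extra distinct owners is one simple way to pick replicas for a replication factor of 2-3.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent hash ring with virtual nodes (values are illustrative)."""
    def __init__(self, nodes, vnodes=100):
        # Each physical node gets `vnodes` points on the ring to smooth
        # the key distribution when nodes join or leave.
        self.ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def nodes_for(self, key, replicas=2):
        """Return `replicas` distinct nodes clockwise from the key's hash,
        i.e. the owners for a replication factor of `replicas`."""
        idx = bisect.bisect(self.ring, (self._hash(key),))
        owners = []
        i = idx
        while len(owners) < replicas:
            _, node = self.ring[i % len(self.ring)]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners

# Hypothetical one-node-per-zone layout; replicas=2 spreads each key
# across two zones, so losing one zone still leaves a warm copy.
ring = ConsistentHashRing(["cache-az1", "cache-az2", "cache-az3"])
print(ring.nodes_for("user:1", replicas=2))
```

Because only the keys adjacent to a failed node's ring points move, failover redistributes a small slice of traffic instead of reshuffling every key.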
Pattern Composition by Data Type
Production systems mix patterns based on data characteristics. Hot user facing data (profiles, settings, sessions) uses cache aside with write through for read your writes consistency. High volume background data (analytics events, logs) uses write around to prevent cache pollution. Counters and aggregates (view counts, like counts) use write back for batching efficiency where eventual consistency is acceptable. The decision framework: How often is data written? How often is it read immediately after writing? How critical is immediate consistency?
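The decision framework above can be expressed as a small dispatch function. The three boolean inputs and the thresholds they imply are assumptions for illustration; real systems weigh these as continuous trade-offs, not flags.

```python
def choose_pattern(write_heavy, read_after_write, consistency_critical):
    """Hypothetical helper mapping data characteristics to a caching pattern,
    mirroring the decision framework in the text."""
    if consistency_critical and read_after_write:
        # Hot user facing data: profiles, settings, sessions.
        return "cache aside + write through"
    if write_heavy and not read_after_write:
        # High volume background data: analytics events, logs.
        return "write around"
    if write_heavy:
        # Counters and aggregates where eventual consistency is acceptable.
        return "write back"
    return "cache aside"

# User profile: read soon after writing, must see own writes.
print(choose_pattern(write_heavy=False, read_after_write=True,
                     consistency_critical=True))
# Analytics events: written constantly, almost never read back immediately.
print(choose_pattern(write_heavy=True, read_after_write=False,
                     consistency_critical=False))
# View counts: written constantly, read often, staleness tolerated.
print(choose_pattern(write_heavy=True, read_after_write=True,
                     consistency_critical=False))
```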
Observability Requirements
You cannot operate caches at scale without metrics. Track hit ratio globally and per key pattern; alert on drops below 80-90%. Monitor p50, p95, p99 cache latency to detect hotspots. Measure miss penalty (time to fetch from source) to quantify cache value. Track eviction rate to identify sizing problems. For write patterns, monitor write queue lag and invalidation success rate. Size memory for target hit ratio with 20-30% headroom to prevent eviction cascades under traffic spikes.
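A minimal in-memory sketch of the core metrics named above (hit ratio, latency percentiles, miss penalty, and the hit-ratio alert) might look like the following; real deployments export these to a metrics system rather than keeping them in process, and the alert threshold is an assumption from the 80-90% band in the text.

```python
class CacheMetrics:
    """Toy metrics recorder for cache observability (illustrative only)."""
    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.latencies_ms = []       # per-request cache lookup latency
        self.miss_penalties_ms = []  # time to fetch from source on a miss
        self.evictions = 0           # rising rate signals undersized cache

    def record(self, hit, latency_ms, miss_penalty_ms=0.0):
        if hit:
            self.hits += 1
        else:
            self.misses += 1
            self.miss_penalties_ms.append(miss_penalty_ms)
        self.latencies_ms.append(latency_ms)

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def latency_percentile(self, p):
        """Nearest-rank percentile (p50/p95/p99) over recorded latencies."""
        xs = sorted(self.latencies_ms)
        return xs[min(len(xs) - 1, int(p / 100 * len(xs)))]

    def should_alert(self, threshold=0.80):
        # Alert when the hit ratio drops below the 80-90% band.
        return self.hit_ratio() < threshold

m = CacheMetrics()
for i in range(100):
    # Simulate a 90% hit ratio with a 30ms penalty on each miss.
    m.record(hit=(i % 10 != 0), latency_ms=1.0, miss_penalty_ms=30.0)
print(m.hit_ratio())     # → 0.9
print(m.should_alert())  # → False
```

Multiplying the miss count by the average miss penalty quantifies the cache's value directly: it is the latency (and database load) the cache is absorbing.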