Production Caching: Multi Tier Architecture and Pattern Composition
Multi Tier Cache Architecture
Large scale deployments rarely use a single cache. A common production architecture has three tiers. L1 is an in process cache (local to each application server) with microsecond-scale access (50-100μs) but limited to a single server and lost on restart. L2 is a distributed cache (shared across all servers) with 0.5-2ms access, surviving individual node restarts. L3 is the source of truth database with 5-50ms access. The read path checks L1 first, then L2, then L3. A request might hit L1 (50μs response), miss L1 but hit L2 (2ms response), or miss both and query L3 (30ms response plus cache population).
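The read path above can be sketched as follows. This is a minimal illustration, not a production client: `DictStore` is a hypothetical stand-in for both the distributed L2 cache client and the database, and there is no TTL, serialization, or error handling.

```python
class DictStore:
    """Hypothetical stand-in for an L2 cache client or a database."""
    def __init__(self, data=None):
        self.data = dict(data or {})
        self.reads = 0  # counts how often this tier is actually queried
    def get(self, key):
        self.reads += 1
        return self.data.get(key)
    def set(self, key, value):
        self.data[key] = value

class TieredCache:
    """L1 (in process) -> L2 (distributed) -> L3 (database) read path."""
    def __init__(self, l2, db):
        self.l1 = {}   # in process cache, local to this server, lost on restart
        self.l2 = l2   # shared distributed cache
        self.db = db   # source of truth
    def get(self, key):
        # 1. Check the in process L1 cache first (fastest, per server).
        if key in self.l1:
            return self.l1[key]
        # 2. Fall back to the shared L2 cache; populate L1 on a hit.
        value = self.l2.get(key)
        if value is not None:
            self.l1[key] = value
            return value
        # 3. Miss both tiers: query the database, then populate both caches.
        value = self.db.get(key)
        if value is not None:
            self.l2.set(key, value)
            self.l1[key] = value
        return value

db = DictStore({"user:1": {"name": "Ada"}})
cache = TieredCache(DictStore(), db)
cache.get("user:1")  # misses L1 and L2, reads the database once
cache.get("user:1")  # served from L1; no further database read
print(db.reads)      # → 1
```

The second lookup never leaves the process, which is exactly where the 50-100μs L1 numbers come from.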
Combined Hit Ratio Math
L1 hit ratios of 30-50% combined with L2 hit ratios of 80-90% mean only 5-15% of requests reach the database. Consider: 100 requests arrive; 40 hit L1 (40%), 60 miss L1; of those 60, 51 hit L2 (85% of misses), 9 miss L2; only 9 of 100 original requests query the database. For a service handling 100,000 requests per second, that is 9,000 database queries instead of 100,000, reducing database load by 91%.
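The worked example above reduces to a two-step multiplication. A small helper makes the per-tier semantics explicit: the L2 hit ratio applies only to the requests that already missed L1.

```python
def db_queries(requests, l1_hit_pct, l2_hit_pct):
    """Requests that fall through both cache tiers to the database.
    Hit percentages are per tier: l2_hit_pct applies only to L1 misses."""
    l1_misses = requests * (100 - l1_hit_pct) // 100
    l2_misses = l1_misses * (100 - l2_hit_pct) // 100
    return l2_misses

print(db_queries(100, 40, 85))      # → 9
print(db_queries(100_000, 40, 85))  # → 9000
```

Note the ratios do not simply add: a 40% L1 and 85% L2 combine to a 91% overall hit ratio because L2 only ever sees the 60% of traffic that L1 missed.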
Cross Zone Replication for Availability
Distributed caches at scale replicate across availability zones (separate physical data centers within a region) with a replication factor of 2-3. This ensures a single zone failure does not cause cache misses to flood the database. Within a zone, hit latency is sub-millisecond. Cross zone replication adds 1-2ms for acknowledgment but ensures durability. Client side consistent hashing distributes keys across nodes; automatic failover redirects traffic away from unhealthy nodes.
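Client side consistent hashing can be sketched with a hash ring. This is an illustrative toy, not a production client: the node names, virtual node count, and use of MD5 are all assumptions, and walking clockwise for extra distinct owners is one simple way to pick replicas for a replication factor of 2-3.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent hash ring with virtual nodes (values are illustrative)."""
    def __init__(self, nodes, vnodes=100):
        # Each physical node gets `vnodes` points on the ring to smooth
        # the key distribution when nodes join or leave.
        self.ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def nodes_for(self, key, replicas=2):
        """Return `replicas` distinct nodes clockwise from the key's hash,
        i.e. the owners for a replication factor of `replicas`."""
        idx = bisect.bisect(self.ring, (self._hash(key),))
        owners = []
        i = idx
        while len(owners) < replicas:
            _, node = self.ring[i % len(self.ring)]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners

# Hypothetical one-node-per-zone layout; replicas=2 spreads each key
# across two zones, so losing one zone still leaves a warm copy.
ring = ConsistentHashRing(["cache-az1", "cache-az2", "cache-az3"])
print(ring.nodes_for("user:1", replicas=2))
```

Because only the keys adjacent to a failed node's ring points move, failover redistributes a small slice of traffic instead of reshuffling every key.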
Pattern Composition by Data Type
Production systems mix patterns based on data characteristics. Hot user facing data (profiles, settings, sessions) uses cache aside with write through for read your writes consistency. High volume background data (analytics events, logs) uses write around to prevent cache pollution. Counters and aggregates (view counts, like counts) use write back for batching efficiency where eventual consistency is acceptable. The decision framework: How often is data written? How often is it read immediately after writing? How critical is immediate consistency?
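The decision framework above can be expressed as a small dispatch function. The three boolean inputs and the thresholds they imply are assumptions for illustration; real systems weigh these as continuous trade-offs, not flags.

```python
def choose_pattern(write_heavy, read_after_write, consistency_critical):
    """Hypothetical helper mapping data characteristics to a caching pattern,
    mirroring the decision framework in the text."""
    if consistency_critical and read_after_write:
        # Hot user facing data: profiles, settings, sessions.
        return "cache aside + write through"
    if write_heavy and not read_after_write:
        # High volume background data: analytics events, logs.
        return "write around"
    if write_heavy:
        # Counters and aggregates where eventual consistency is acceptable.
        return "write back"
    return "cache aside"

# User profile: read soon after writing, must see own writes.
print(choose_pattern(write_heavy=False, read_after_write=True,
                     consistency_critical=True))
# Analytics events: written constantly, almost never read back immediately.
print(choose_pattern(write_heavy=True, read_after_write=False,
                     consistency_critical=False))
# View counts: written constantly, read often, staleness tolerated.
print(choose_pattern(write_heavy=True, read_after_write=True,
                     consistency_critical=False))
```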
Observability Requirements
You cannot operate caches at scale without metrics. Track hit ratio globally and per key pattern; alert on drops below 80-90%. Monitor p50, p95, p99 cache latency to detect hotspots. Measure miss penalty (time to fetch from source) to quantify cache value. Track eviction rate to identify sizing problems. For write patterns, monitor write queue lag and invalidation success rate. Size memory for target hit ratio with 20-30% headroom to prevent eviction cascades under traffic spikes.
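A minimal in-memory sketch of the core metrics named above (hit ratio, latency percentiles, miss penalty, and the hit-ratio alert) might look like the following; real deployments export these to a metrics system rather than keeping them in process, and the alert threshold is an assumption from the 80-90% band in the text.

```python
class CacheMetrics:
    """Toy metrics recorder for cache observability (illustrative only)."""
    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.latencies_ms = []       # per-request cache lookup latency
        self.miss_penalties_ms = []  # time to fetch from source on a miss
        self.evictions = 0           # rising rate signals undersized cache

    def record(self, hit, latency_ms, miss_penalty_ms=0.0):
        if hit:
            self.hits += 1
        else:
            self.misses += 1
            self.miss_penalties_ms.append(miss_penalty_ms)
        self.latencies_ms.append(latency_ms)

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def latency_percentile(self, p):
        """Nearest-rank percentile (p50/p95/p99) over recorded latencies."""
        xs = sorted(self.latencies_ms)
        return xs[min(len(xs) - 1, int(p / 100 * len(xs)))]

    def should_alert(self, threshold=0.80):
        # Alert when the hit ratio drops below the 80-90% band.
        return self.hit_ratio() < threshold

m = CacheMetrics()
for i in range(100):
    # Simulate a 90% hit ratio with a 30ms penalty on each miss.
    m.record(hit=(i % 10 != 0), latency_ms=1.0, miss_penalty_ms=30.0)
print(m.hit_ratio())     # → 0.9
print(m.should_alert())  # → False
```

Multiplying the miss count by the average miss penalty quantifies the cache's value directly: it is the latency (and database load) the cache is absorbing.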