Read Heavy Optimization: Precomputation and Locality
Read Optimization Principles
Read optimization centers on two principles: move data closer to users and compute expensive operations ahead of time. Both minimize work on the critical read path, trading extra storage and some staleness for dramatically lower latency and cost per read. A well-optimized read-heavy system can serve 95%+ of requests from cache with < 10 ms latency.
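The payoff of a high hit rate is simple expected-value arithmetic. A minimal sketch, where the 5 ms cache latency and 50 ms origin latency are illustrative assumptions, not figures from the text:

```python
def effective_read_latency(hit_rate: float, cache_ms: float, origin_ms: float) -> float:
    """Expected read latency: hits served by the cache, misses by the origin."""
    return hit_rate * cache_ms + (1.0 - hit_rate) * origin_ms

# 95% of reads hit a 5 ms cache; the remaining 5% pay the 50 ms origin cost.
print(effective_read_latency(0.95, 5.0, 50.0))  # 7.25 ms on average
```

Note how the origin term dominates: dropping the hit rate from 95% to 90% nearly doubles the origin's contribution to average latency.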
Multi-Layer Caching
Locality optimization employs multiple cache layers. A typical architecture flows: browser cache → CDN (Content Delivery Network, servers at the network edge) → regional cache → origin database. Each layer absorbs traffic before it reaches the next. A single in-memory cache node can handle 1-2 million operations/sec. CDNs achieve < 10 ms p50 latency for static content by serving it from memory at the edge. Origin fallbacks push p99 to tens of ms, but cache hit rates above 95% keep this rare.
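The layered flow above can be sketched as a read-through lookup: check each layer in order, and on a hit backfill the faster layers that missed so subsequent reads stop earlier. This is a minimal in-process model; the layer names and the dict-backed stores are illustrative stand-ins for real browser/CDN/regional tiers:

```python
from typing import Callable, Dict, List, Optional

class CacheLayer:
    """One tier in the hierarchy (e.g. CDN or regional cache)."""
    def __init__(self, name: str):
        self.name = name
        self.store: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.store.get(key)

    def put(self, key: str, value: str) -> None:
        self.store[key] = value

def read_through(key: str, layers: List[CacheLayer],
                 origin: Callable[[str], str]) -> str:
    """Check layers nearest-first; on a hit, backfill the layers that missed."""
    for i, layer in enumerate(layers):
        value = layer.get(key)
        if value is not None:
            for upper in layers[:i]:   # populate faster tiers
                upper.put(key, value)
            return value
    value = origin(key)                # full miss: only now does the origin see the read
    for layer in layers:
        layer.put(key, value)
    return value
```

After one full miss, every tier holds the value, so the next read for the same key is absorbed by the first layer and never reaches the origin.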
Precomputation and Materialized Views
Precomputation avoids expensive joins and aggregations on the hot path. Instead of joining 5 tables to render a page, precompute and store the combined result. A timeline feature can use fan-out-on-write: when a user posts, immediately write to each follower's inbox rather than computing the timeline on read. This converts a complex multi-table query into a simple range scan. Update materialized views synchronously for small data sets, and asynchronously via change-stream consumers for large fan-out.
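A minimal sketch of fan-out-on-write, with in-memory dicts standing in for the follower graph and per-user inbox tables; the inbox cap is an illustrative assumption (real systems bound inbox size to limit storage):

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set

MAX_INBOX = 800  # illustrative cap on precomputed timeline length

followers: Dict[str, Set[str]] = defaultdict(set)   # author -> follower ids
inboxes: Dict[str, Deque[str]] = defaultdict(lambda: deque(maxlen=MAX_INBOX))

def post(author: str, post_id: str) -> None:
    """Fan-out-on-write: push the new post into every follower's inbox now,
    so no join is needed at read time."""
    for follower in followers[author]:
        inboxes[follower].appendleft(post_id)  # newest first

def timeline(user: str, limit: int = 20) -> List[str]:
    """Reading a timeline is now a simple range scan over one inbox."""
    return list(inboxes[user])[:limit]
```

The write amplification is explicit here: one post costs one inbox write per follower, which is exactly the trade the section describes.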
Trade-offs of Read Optimization
The costs: cache invalidation complexity, replication lag causing staleness (typically seconds to minutes), and write amplification. Each additional index or materialized view adds 1-3x write overhead. Monitor replication lag using LSN (Log Sequence Number, the position in the write-ahead log). Implement read-after-write consistency when needed by routing the writer's subsequent reads to the primary for a brief period, or by requiring replicas to catch up before serving.
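The pin-to-primary approach can be sketched as a per-user timer: a user who just wrote reads from the primary until replicas have plausibly caught up. The 5-second window is an illustrative assumption; in practice it would be tuned to observed replication lag:

```python
import time
from typing import Dict

PIN_SECONDS = 5.0  # illustrative: should exceed typical replication lag
last_write_at: Dict[str, float] = {}  # user id -> monotonic time of last write

def record_write(user: str) -> None:
    """Call after every write so this user's reads get pinned to the primary."""
    last_write_at[user] = time.monotonic()

def choose_endpoint(user: str) -> str:
    """Read-after-write consistency: recent writers read from the primary;
    everyone else can be served by a (possibly lagging) replica."""
    if time.monotonic() - last_write_at.get(user, float("-inf")) < PIN_SECONDS:
        return "primary"
    return "replica"
```

The alternative the text mentions, waiting for replicas to reach the writer's LSN, trades read latency for primary load instead, and is the better fit when the primary is already saturated.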