
Production Fan-out Strategies: Write vs Read Time Materialization

Duplicated Fields Pattern

The simplest denormalization: copy frequently-accessed fields from related tables into the row that needs them. An orders table might duplicate customer_name and customer_email from the customers table. Now displaying an order requires no join. The trade-off: when a customer updates their name, you must update it in every order row. For 1,000 orders, that is 1,000 writes. Use this when the duplicated data rarely changes and reads vastly outnumber writes.
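A minimal in-memory sketch of this trade-off (the table names, field names, and 1,000-order figure follow the example above; the dict-based "tables" are illustrative, not a real database):

```python
# Duplicated-fields sketch: customer_name/customer_email are copied onto
# each order row, so the read path needs no join, but a rename fans out
# to every duplicated copy.

customers = {1: {"name": "Ada", "email": "ada@example.com"}}
orders = [
    {"id": i, "customer_id": 1,
     "customer_name": "Ada", "customer_email": "ada@example.com"}
    for i in range(1000)
]

def display_order(order):
    # Read path: everything needed is on the order row itself.
    return f"Order {order['id']} for {order['customer_name']}"

def rename_customer(customer_id, new_name):
    # Write path: one source write triggers one write per order row.
    customers[customer_id]["name"] = new_name
    touched = 0
    for order in orders:
        if order["customer_id"] == customer_id:
            order["customer_name"] = new_name
            touched += 1
    return touched  # 1,000 orders -> 1,000 duplicated writes
```

The read is a single row fetch; the cost shows up only on the rare rename, which is exactly the read-heavy, write-rare profile the pattern targets.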

Precomputed Aggregates Pattern

Instead of computing aggregates at query time, precompute and store them. A product catalog might store avg_rating and review_count directly on the product row rather than joining to reviews and computing AVG() and COUNT(). Displaying 10,000 products in a listing would otherwise require scanning millions of review rows. Update the aggregate incrementally: on new review, new_avg = (old_avg * old_count + new_rating) / (old_count + 1). Edge case: deleting reviews requires inverse computation or periodic full recomputation.
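The incremental formula above, plus the inverse computation for the deletion edge case, can be written directly (function names are illustrative):

```python
def add_review(old_avg, old_count, new_rating):
    """Fold a new rating into the stored aggregate without rescanning reviews."""
    new_count = old_count + 1
    new_avg = (old_avg * old_count + new_rating) / new_count
    return new_avg, new_count

def remove_review(old_avg, old_count, removed_rating):
    """Inverse computation for review deletion (the edge case noted above)."""
    new_count = old_count - 1
    if new_count == 0:
        return 0.0, 0  # last review removed; reset the aggregate
    new_avg = (old_avg * old_count - removed_rating) / new_count
    return new_avg, new_count
```

Both operations are O(1) per review, versus O(reviews) for recomputing AVG() and COUNT() at query time; a periodic full recomputation can still run in the background to correct any drift.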

Materialized Views Pattern

A materialized view is a query result stored as a table. An analytics dashboard showing daily sales by region might query: SELECT region, date, SUM(amount) FROM orders GROUP BY region, date. Running this on 100 million order rows takes minutes. Materialize it: store the result in a daily_sales_by_region table. Dashboard queries hit the small materialized table (365 days × 50 regions = 18,250 rows) in < 10 ms. Refresh the view on schedule (hourly, daily) or incrementally via triggers.
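A full refresh of that view is just a GROUP BY computed once and stored; this sketch models it with a dict keyed by (region, date), standing in for the daily_sales_by_region table (the tiny orders list is sample data, not the 100-million-row table):

```python
from collections import defaultdict

def refresh_daily_sales(orders):
    """Full refresh: SELECT region, date, SUM(amount) GROUP BY region, date."""
    view = defaultdict(float)
    for o in orders:
        view[(o["region"], o["date"])] += o["amount"]
    return dict(view)

orders = [
    {"region": "EU", "date": "2024-01-01", "amount": 10.0},
    {"region": "EU", "date": "2024-01-01", "amount": 5.0},
    {"region": "US", "date": "2024-01-01", "amount": 7.5},
]

daily_sales_by_region = refresh_daily_sales(orders)
# Dashboard reads hit this small precomputed table instead of scanning
# every order row at query time.
```

The expensive scan runs once per refresh interval; every dashboard query afterwards is a lookup into at most regions × days rows.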

Embedded Documents Pattern

Document databases enable embedding related data as nested objects. Instead of separate users and addresses tables requiring a join, embed addresses as an array inside the user document. One read fetches everything. Works well when embedded data is bounded (a user has at most 5-10 addresses) and always accessed together. Avoid embedding unbounded data (all user orders) as documents grow without limit, hurting read and write performance.
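A sketch of the embedded shape, with the boundedness constraint enforced at write time (the document structure and MAX_ADDRESSES limit are illustrative assumptions, not a specific database's API):

```python
MAX_ADDRESSES = 10  # assumed bound; keeps the document a fixed small size

user = {
    "_id": "u1",
    "name": "Ada",
    # Embedded array: bounded, and always read together with the user,
    # so one document fetch replaces a users-to-addresses join.
    "addresses": [
        {"label": "home", "city": "London"},
    ],
}

def add_address(user, address):
    # Enforce the bound so the document cannot grow without limit --
    # unbounded data (e.g. all of a user's orders) belongs in its own
    # collection, referenced by id, not embedded here.
    if len(user["addresses"]) >= MAX_ADDRESSES:
        raise ValueError("address list is bounded; store overflow separately")
    user["addresses"].append(address)
```

The guard is the design point: embedding works precisely because the array has a known small ceiling.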

💡 Key Takeaways
- Fan-out-on-write for average users materializes feed rows immediately: at 2,000 posts per second with 300 average followers and 3 replicas, the system sustains 1.8 million write operations per second but keeps reads at single-digit milliseconds
- Celebrity accounts create write storms: if 1% of those posts come from accounts with 1 million followers, that is 20 posts per second fanning out to 1 million followers each across 3 replicas, spiking to roughly 60 million writes per second and forcing fan-out-on-read or partial materialization above thresholds like 10,000 to 100,000 followers
- The hybrid approach trades read latency for write stability: merging celebrity content adds 30 to 50 milliseconds at read time, but that cost hits only the affected queries instead of a write storm hitting all storage
- Write amplification ratio (derived writes divided by source writes) is the key metric: target staying under 1000x at p99; breaches indicate the fan-out strategy needs adjustment
- Partial fan-out optimizes further: materialize only for recently active followers (logged in within 7 days), cutting write volume by 40 to 60 percent while keeping the most engaged users on the fast path
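The write-amplification arithmetic behind the first takeaway can be checked directly (a back-of-envelope helper, not a production metric pipeline):

```python
def write_amplification(posts_per_sec, avg_followers, replicas):
    """Derived writes/sec and the amplification ratio (derived / source)."""
    derived = posts_per_sec * avg_followers * replicas
    return derived, derived / posts_per_sec

# Baseline from the takeaways: 2,000 posts/s, 300 followers, 3 replicas.
derived, ratio = write_amplification(2000, 300, 3)
# derived -> 1,800,000 writes/sec; ratio -> 900x, under the 1000x target
```

Plugging in a 1-million-follower account instead of 300 shows why the ratio blows through the target and a different fan-out strategy is needed above the follower thresholds.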
📌 Interview Tips
1. Meta/Instagram: users with under 10,000 followers use full fan-out-on-write (instant feed updates, 5 ms reads); accounts above 100,000 followers switch to fan-out-on-read, where merge queries add 30 to 80 milliseconds but avoid billions of write operations per celebrity post
2. Twitter timeline architecture: a hybrid model where average users get materialized timelines, while verified accounts with millions of followers store only recent tweets in a hot cache and merge at read time, keeping write throughput under 500,000 operations per second during peaks