Design Fundamentals • Back-of-the-envelope CalculationsMedium⏱️ ~3 min
Fan Out Calculations and Write Amplification Trade Offs
Fan out represents write amplification when a single user action triggers multiple downstream writes or reads. In social media systems, this manifests in two architectural patterns: fan out on write (precompute and push to follower feeds) and fan out on read (compute timeline on demand by pulling from followees). The calculations dramatically differ. For fan out on write, if each post reaches 500 followers on average and your system handles 1 million posts per day, you generate 500 million feed insertions daily. This translates to approximately 5,800 insertions per second on average, but regional spikes can create 5x peaks reaching 29,000 insertions per second. If each feed entry consumes 150 bytes for IDs, timestamps, and pointers, raw storage grows by 75 GB per day, becoming 225 GB per day with 3x replication or 82 TB per year.
Fan out on read eliminates this write amplification but moves the cost to read time. When a user requests their feed, the system must query posts from all followees (say 200 people), rank them, and return top results. If average query latency per followee is 5ms and queries run in parallel across 10 shards, you still need approximately 10ms for data retrieval plus 20 to 40ms for ranking and filtering. This seems reasonable until a celebrity with 50 million followers posts: under fan out on write, you face 50 million feed insertions; under fan out on read, every one of those 50 million followers incurs a query that must fetch and rank the celebrity's recent posts. The read spike can overwhelm databases and caches.
Production systems use hybrid approaches. Twitter and Facebook historically fan out on write for normal users but fan out on read for celebrities and high follower accounts, with thresholds around 1 to 10 million followers. This keeps write amplification bounded while avoiding read time spikes. The calculation to choose: estimate (posts per day × average followers) for write cost versus (timeline reads per day × average followees × query cost) for read cost. If write amplification is 1000x but timeline reads are 10x more frequent, fan out on read may be cheaper. However, if p99 read latency SLOs are tight (under 200ms), the variability of fan out on read during celebrity events may violate SLOs, forcing precompute despite higher write and storage costs.
💡 Key Takeaways
•Fan out on write amplification: 1 million posts per day with 500 followers each generates 500 million feed insertions daily (5,800 per second average, 29,000 per second peak), consuming 225 GB per day with 3x replication
•Storage growth for precomputed feeds: 150 bytes per feed entry times 500 million insertions per day equals 75 GB raw, becoming 82 TB per year with replication before considering feed pruning and retention policies
•Fan out on read cost: user timeline request queries 200 followees with 5ms per query, requiring 10ms for parallel shard queries plus 20 to 40ms for ranking, but celebrity posts create read spikes across millions of followers
•Hybrid strategy thresholds: Twitter and Facebook fan out on write for normal users, switching to fan out on read for accounts exceeding 1 to 10 million followers to bound write amplification
•Decision formula: compare (posts per day times average followers) for write cost against (timeline reads per day times average followees times query cost) for read cost, factoring in latency SLO sensitivity
•Celebrity event failure mode: single celebrity with 50 million followers posting creates either 50 million feed insertions (fan out on write) or 50 million cache and database queries as followers refresh (fan out on read)
📌 Examples
Instagram feed calculations: 100 million posts per day with average 300 followers equals 30 billion feed insertions per day (347,000 per second average). At 200 bytes per entry, raw storage is 6 TB per day. With 3x replication and 30 percent metadata overhead, net growth is approximately 23 TB per day or 8.4 PB per year.
Reddit comment thread fan out: Popular post receives 10,000 comments. With nested threading, average comment has 3 parent references requiring index updates. Total index writes are 10,000 comments times 3 parent updates equals 30,000 index operations. If post receives 500,000 views and each view triggers comment tree fetch across 5 database shards, read amplification is 500,000 times 5 equals 2.5 million queries.
LinkedIn connection suggestions: When user connects with someone new, system recomputes mutual connection counts for both users' networks. If each user has 500 connections, one new connection triggers updates for up to 1,000 user records. At 1 million new connections per day, that is 1 billion record updates daily or 11,600 updates per second average.