
When NOT to Use Read Replicas: Alternatives and Tradeoffs

When Caching Beats Replicas

Read replicas are not universally optimal. When most traffic can be served from caches, adding replicas yields minimal benefit. If 95% of reads hit a CDN (Content Delivery Network—geographically distributed cache servers) or in-memory cache, your database handles only 5% of read load.

Scaling replicas from 1 to 5 cuts each replica's share of overall read traffic from 5% to roughly 1%, a marginal gain that does not justify the operational complexity. Better to invest in smarter caching: longer TTLs (Time-To-Live—how long cached data remains valid), cache warming strategies, and stale-while-revalidate patterns.
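The arithmetic above can be sketched in a few lines (the function name and numbers are illustrative, matching the 95% cache-hit example in this section):

```python
# Sketch: marginal benefit of adding read replicas behind a cache.
def db_read_share(total_rps: float, cache_hit_rate: float, replicas: int) -> float:
    """Read load per database instance, as a fraction of total read traffic."""
    db_rps = total_rps * (1 - cache_hit_rate)  # reads that miss the cache
    return db_rps / replicas / total_rps

before = db_read_share(100_000, 0.95, replicas=1)  # ~5% of reads per instance
after = db_read_share(100_000, 0.95, replicas=5)   # ~1% of reads per instance
print(f"per-instance share: {before:.0%} -> {after:.0%}")
```

However the cache hit rate is varied, the per-instance change stays small whenever most reads never reach the database at all.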

Write-Heavy Workload Considerations

Systems with high write-to-read ratios see diminishing returns from read replicas. If writes dominate, the primary remains the bottleneck. Replicas help read scaling but do nothing for write throughput. Worse, replication load adds overhead to the already-stressed primary.

Write-heavy workloads often benefit more from sharding (partitioning data across multiple primaries) or async write patterns (queuing writes for batch processing) than from read replicas. Analyze your actual read/write ratio before assuming replicas are the solution.
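As an illustration of the sharding alternative, a minimal hash-based router might look like the following. The shard names and the `shard_for` helper are hypothetical; a real deployment would map keys to connection pools and handle resharding:

```python
import hashlib

# Illustrative shard list: each entry stands in for a primary's connection handle.
SHARDS = ["primary-0", "primary-1", "primary-2", "primary-3"]

def shard_for(user_id: str) -> str:
    """Stable hash of the shard key picks the owning primary for a write."""
    digest = hashlib.sha256(user_id.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(SHARDS)
    return SHARDS[index]
```

Because the hash is deterministic, every write for a given user lands on the same primary, spreading write throughput across all shards.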

Complexity Cost Assessment

Every replica adds operational burden: monitoring lag, handling failover, debugging consistency issues, capacity planning. A single-primary setup with good caching may serve millions of users without replica complexity. Add replicas when you have clear evidence of read bottlenecks that caching cannot solve.

Signs you need replicas: primary CPU saturated by read queries, read latency increasing despite optimization, geographic latency requirements for global users, need for read scaling independent of write capacity. Without these signals, complexity may outweigh benefits.

Alternative Architectures

CQRS (Command Query Responsibility Segregation) separates read and write models entirely. Writes go to a normalized primary optimized for consistency. Async processes transform data into denormalized read-optimized stores—separate databases, search indexes, or materialized views tailored to specific query patterns.

This decouples read and write scaling completely. Read stores can use entirely different technology: search engines for full-text queries, graph databases for relationship traversal, columnar stores for analytics. The tradeoff is increased system complexity and eventual consistency between write and read stores.
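A minimal sketch of the CQRS shape described above, with illustrative class and event names (in production the projection step would run asynchronously via a queue or change-data-capture, not an inline loop):

```python
from dataclasses import dataclass

@dataclass
class OrderPlaced:  # event emitted by the write side
    order_id: str
    customer: str
    amount: float

class WriteModel:
    """Normalized, consistency-focused; appends an event per command."""
    def __init__(self):
        self.events: list[OrderPlaced] = []
    def place_order(self, order_id: str, customer: str, amount: float):
        self.events.append(OrderPlaced(order_id, customer, amount))

class ReadModel:
    """Denormalized per-customer totals, rebuilt from the event stream."""
    def __init__(self):
        self.totals: dict[str, float] = {}
    def project(self, event: OrderPlaced):
        self.totals[event.customer] = self.totals.get(event.customer, 0.0) + event.amount

write, read = WriteModel(), ReadModel()
write.place_order("o1", "ada", 30.0)
write.place_order("o2", "ada", 12.5)
for e in write.events:  # async in a real system, hence eventual consistency
    read.project(e)
print(read.totals)
```

The read model here is a plain dictionary, but the same projection loop could feed a search index or columnar store instead.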

💡 Key Takeaways
- When 90 to 95 percent of reads hit CDN or application caches, adding read replicas reduces overall load by only 5 to 10 percent, which does not justify the operational complexity and cost of replication management
- Strict read-after-write or linearizable consistency requirements force all reads to the primary, eliminating replica scaling benefits. Consider synchronous replication, consensus stores (Raft, Paxos), or quorum reads instead
- Write-saturated primaries (CPU over 80 percent from writes) do not benefit from read replicas and may worsen due to 10 to 30 percent replication overhead. Solutions include sharding, write optimization, or specialized stores
- Each replica costs as much as the primary in compute and storage, so five replicas equal 6 times the single-instance cost. Three cross-region replicas each receiving 100 GB per day incur roughly $810 per month in data egress alone at $0.09 per GB
- Alternative scaling strategies include cache optimization (longer TTLs, smarter invalidation), denormalization (duplicating data to avoid joins), sharding (partitioning across primaries), or purpose-built databases (time series, log stores)
📌 Interview Tips
1. Cache-heavy workload: an e-commerce site with 100,000 reads per second and a 95% cache hit rate leaves the database seeing 5,000 reads per second. Adding 4 replicas splits that to 1,250 RPS per replica, but overall system load drops only 4%, minimal impact for the added complexity.
2. Write bottleneck: a social media service writes 60,000 posts per second to a primary at 85% CPU. Read replicas help read latency, but writes stay bottlenecked. Solution: shard posts by user ID across 10 primaries, each handling 6,000 writes per second at a comfortable 40% CPU.
3. Cost analysis: a database writing 200 GB/day replicated to 3 cross-region replicas costs 200 GB × 3 regions × $0.09 = $54/day = $1,620/month in transfer, plus $3,000/month in replica compute. Alternative: invest $1,000/month in a larger cache cluster, cut database load 80%, and eliminate the replicas.
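The egress arithmetic in these examples can be checked with a short script (the $0.09/GB price and the helper name are illustrative, taken from the figures above, not real billing data):

```python
EGRESS_PER_GB = 0.09  # illustrative cross-region transfer price, $/GB

def monthly_egress(gb_per_day: float, replicas: int, days: int = 30) -> float:
    """Monthly data-transfer cost for replicating gb_per_day to each replica."""
    return gb_per_day * replicas * EGRESS_PER_GB * days

print(round(monthly_egress(200, 3)))  # the $1,620/month cost-analysis example
print(round(monthly_egress(100, 3)))  # the ~$800/month key-takeaway figure
```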