What Are Read Replicas and Why Do They Matter?
Read replicas are read-only copies of your primary database that stay synchronized through replication, typically asynchronous. When a write commits on the primary, the change is streamed to the replicas with some delay. This delay, called replication lag, is fundamental to understanding how replicas behave. Within a single region, lag is typically 10 to 100 milliseconds in steady state. Cross-region replicas see 100 to 1000 milliseconds of lag due to network round-trip time (RTT) and batching.
The core value proposition is horizontal read scaling. Social networks, e-commerce sites, and content platforms commonly see read-to-write ratios of 5:1 to 20:1. Consider a single database handling 10,000 queries per second (QPS) of reads plus 2,000 QPS of writes, 12,000 QPS in total. Offloading the reads across 5 replicas leaves the primary with only the 2,000 QPS of writes, while each replica serves about 2,000 QPS of reads and also applies the replicated write stream. No instance then handles more than roughly 4,000 operations per second, down from 12,000. This reduces tail latency under load and prevents the primary from becoming a bottleneck.
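The arithmetic above can be checked with a small sketch. It assumes reads are spread evenly across replicas and that every replica applies the full write stream. Both are simplifications, and the function name `per_instance_load` is hypothetical.

```python
def per_instance_load(read_qps: float, write_qps: float, replicas: int):
    """Estimate steady-state load after moving all reads to replicas.

    Assumptions (illustrative only): reads are balanced evenly, and each
    replica replays every write that the primary commits.
    Returns (primary_qps, reads_per_replica, total_ops_per_replica).
    """
    if replicas < 1:
        raise ValueError("need at least one replica")
    primary_qps = write_qps                      # primary keeps only the writes
    reads_per_replica = read_qps / replicas      # even share of the read traffic
    total_ops_per_replica = reads_per_replica + write_qps  # reads + replayed writes
    return primary_qps, reads_per_replica, total_ops_per_replica


if __name__ == "__main__":
    primary, reads, total = per_instance_load(10_000, 2_000, 5)
    print(f"primary: {primary:.0f} QPS of writes")
    print(f"each replica: {reads:.0f} read QPS, {total:.0f} ops/s including replay")
```

In practice a replayed write is often cheaper than the original statement, so the per-replica total is better read as an upper bound than as a measurement.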
Amazon Aurora exemplifies production deployment patterns. Aurora supports up to 15 read replicas within a region, all sharing a distributed storage layer. This architecture keeps intra-region lag to single- or double-digit milliseconds. Aurora exposes a writer endpoint for the primary and a reader endpoint that automatically load balances connections across the available replicas. For cross-region deployments, Aurora Global Database typically maintains under 1 second of lag in steady state.
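With separate writer and reader endpoints, the application or its data access layer decides which one each statement uses. The following is a minimal sketch of that decision, not Aurora's API: the `Endpoints` type, the `choose_endpoint` function, and the hostnames are all hypothetical, and the statement classification is deliberately naive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoints:
    writer: str  # resolves to the primary
    reader: str  # load balances across replicas


# Statement prefixes treated as safe for replicas (an assumption of this sketch).
READ_PREFIXES = ("select", "show")


def choose_endpoint(sql: str, endpoints: Endpoints, in_transaction: bool = False) -> str:
    """Return the hostname a statement should be sent to."""
    statement = sql.lstrip().lower()
    is_read = statement.startswith(READ_PREFIXES)
    takes_locks = "for update" in statement  # locking reads belong on the primary
    if is_read and not in_transaction and not takes_locks:
        return endpoints.reader
    # Anything unrecognized goes to the writer, the safe default.
    return endpoints.writer


if __name__ == "__main__":
    eps = Endpoints(writer="writer.db.example.internal",
                    reader="reader.db.example.internal")
    print(choose_endpoint("SELECT * FROM posts WHERE id = 1", eps))
    print(choose_endpoint("UPDATE posts SET title = 'x' WHERE id = 1", eps))
```

Defaulting to the writer means a misclassified statement costs some primary capacity rather than failing on a read-only replica.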
The tradeoff is consistency versus scalability. Asynchronous replication means replicas serve slightly stale data. A user who posts a comment and immediately refreshes may not see their own content if the read hits a lagging replica. This read-after-write anomaly requires explicit routing strategies, which we will cover in depth on subsequent cards.
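As a preview, one common mitigation is to send a user's reads to the primary for a short window after that user writes. The sketch below illustrates the idea under stated assumptions and is not necessarily the strategy later cards recommend. The class name `ReadAfterWriteRouter` is hypothetical, and the 1 second pin window is chosen here only because it matches the upper lag bound quoted above.

```python
import time


class ReadAfterWriteRouter:
    """Pin a user's reads to the primary briefly after each of their writes."""

    def __init__(self, pin_seconds: float = 1.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # assumed to exceed typical replication lag
        self._clock = clock             # injectable so the logic can be tested
        self._last_write = {}           # user_id -> time of most recent write

    def record_write(self, user_id: str) -> None:
        self._last_write[user_id] = self._clock()

    def target_for_read(self, user_id: str) -> str:
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self.pin_seconds:
            return "primary"  # a replica may not have this user's write yet
        return "replica"


if __name__ == "__main__":
    router = ReadAfterWriteRouter()
    router.record_write("alice")
    print(router.target_for_read("alice"))  # recently wrote
    print(router.target_for_read("bob"))    # never wrote
```

This in-memory map only works within a single application process. A multi-server deployment would need to carry the last-write time somewhere shared, such as the user's session, which is outside the scope of this sketch.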
💡 Key Takeaways
•Replication lag in same-region deployments is typically 10 to 100 milliseconds, while cross-region lag ranges from 100 to 1000 milliseconds due to network distance and batching
•Production systems commonly deploy 3 to 10 replicas per shard to handle read-to-write ratios of 5:1 to 20:1 in social, content, and commerce workloads
•Amazon Aurora supports up to 15 read replicas with single- to double-digit millisecond lag within a region by sharing a distributed storage layer across all instances
•Each replica adds 10 to 30 percent overhead on the primary database due to replication fanout, log generation, and network bandwidth consumption
•Cross region replicas incur data egress costs on every write (typically $0.09 per GB transferred) plus full instance compute costs, often totaling 1 to 3 times the primary cost
📌 Examples
A social media feed service handles 50,000 reads per second and 5,000 writes per second. With 5 read replicas, each replica serves approximately 10,000 read QPS while the primary handles 5,000 writes plus some reads that require strict consistency.
Amazon RDS supports up to 15 read replicas per primary instance for engines such as MySQL, MariaDB, and PostgreSQL (limits vary by engine), distributable across Availability Zones (AZs) or regions. Multi-AZ instance deployments use synchronous replication to a standby for high availability, not read scaling.
Aurora Global Database maintains typical cross region replication lag under 1 second in steady state, enabling geo local reads with bounded staleness for international user bases.