What Are Read Replicas and Why Do They Matter?
What Are Read Replicas
Read replicas are read-only copies of your primary database that stay synchronized through replication. When a write commits on the primary, the change streams to replicas with some delay called replication lag. This creates multiple copies of your data that can serve read queries, distributing load across many servers instead of concentrating it on one.
The fundamental tradeoff: replicas increase read capacity but introduce complexity around consistency. With asynchronous replication, data on a replica can be slightly behind the primary at any moment. Applications must understand and handle this staleness, or users encounter confusing behavior where data they just wrote appears on one request and is missing on the next.
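One common way to handle this staleness is to send a user's reads to the primary for a short window after that user writes. The sketch below is illustrative only: the `ReadRouter` class, its method names, and the 2-second window are assumptions, not something prescribed by a particular database.

```python
import time


class ReadRouter:
    """Send a user's reads to the primary shortly after that user's own write."""

    def __init__(self, pin_seconds=2.0, clock=time.monotonic):
        # pin_seconds is an assumed upper bound on replication lag (hypothetical value).
        self.pin_seconds = pin_seconds
        self._clock = clock
        self._last_write = {}  # user_id -> time of that user's most recent write

    def record_write(self, user_id):
        """Remember when this user last wrote."""
        self._last_write[user_id] = self._clock()

    def target_for_read(self, user_id):
        """Return "primary" inside the pin window, otherwise "replica"."""
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self.pin_seconds:
            return "primary"
        return "replica"
```

Calling `record_write("alice")` after each of Alice's writes keeps her subsequent reads on the primary until the window expires, while other users continue to read from replicas.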
Replication Mechanisms
Asynchronous replication streams committed transactions from the primary to replicas without waiting for acknowledgment. The primary commits immediately, and replicas apply changes as fast as they can. This minimizes write latency but means replicas lag behind by milliseconds to seconds, depending on write volume and network conditions.
Synchronous replication requires at least one replica to acknowledge a transaction before the primary reports the commit as successful. This guarantees the replica has the data but adds a network round trip to every write. Semi-synchronous modes wait for acknowledgment but proceed without it once a timeout expires, balancing durability with availability.
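The latency cost of each mode can be sketched with simple arithmetic. This is a simplified model, not a description of any specific database: the function name, the mode strings, and the parameters are all assumptions made for illustration.

```python
def commit_latency_ms(mode, local_commit_ms, replica_ack_ms, timeout_ms=None):
    """Approximate client-visible commit latency under a simplified model.

    local_commit_ms: time for the primary to commit locally.
    replica_ack_ms:  round trip until a replica acknowledges the change.
    timeout_ms:      semi-sync only; how long the primary waits before proceeding.
    """
    if mode == "async":
        # The primary does not wait for any replica.
        return local_commit_ms
    if mode == "sync":
        # The primary always waits for the acknowledgment.
        return local_commit_ms + replica_ack_ms
    if mode == "semi-sync":
        if timeout_ms is None:
            raise ValueError("semi-sync requires timeout_ms")
        # The wait is capped by the timeout.
        return local_commit_ms + min(replica_ack_ms, timeout_ms)
    raise ValueError(f"unknown mode: {mode}")
```

With a 2 ms local commit and a 40 ms acknowledgment round trip, the model gives 2 ms for asynchronous commits and 42 ms for synchronous ones; a semi-synchronous timeout of 10 ms caps the wait at 12 ms.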
Replication Lag Characteristics
Replication lag is the time between a transaction committing on the primary and the replica applying it. Under normal conditions lag typically ranges from 10 to 100 ms, but it can spike to seconds during write bursts, long transactions, or network issues.
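As a rough illustration of that definition, the sketch below computes lag from a pair of timestamps and picks out replicas within a tolerance. The function names and replica names are hypothetical, and it assumes both timestamps come from comparable clocks; real systems usually expose lag through their own monitoring views.

```python
from datetime import datetime, timedelta, timezone


def replication_lag(committed_at, applied_at):
    """Lag = replica apply time minus primary commit time, floored at zero.

    The floor guards against small negative values caused by clock skew.
    """
    return max(applied_at - committed_at, timedelta(0))


def replicas_within(lags, max_lag):
    """Return the names of replicas whose lag does not exceed max_lag, sorted."""
    return sorted(name for name, lag in lags.items() if lag <= max_lag)


commit = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
lags = {
    "replica-a": replication_lag(commit, commit + timedelta(milliseconds=80)),
    "replica-b": replication_lag(commit, commit + timedelta(seconds=5)),
}
fresh = replicas_within(lags, max_lag=timedelta(seconds=1))  # only "replica-a"
```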
Lag accumulates when replicas cannot apply changes as fast as the primary produces them. Large transactions, schema changes, and bulk operations are common causes. Monitoring lag is critical: a replica 5 seconds behind serves data that may confuse users expecting recent updates.
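How a backlog builds and drains can be approximated with constant rates. This is a back-of-the-envelope model under assumed steady write and apply rates, with hypothetical function names; real workloads are burstier.

```python
def backlog_after(write_rate, apply_rate, seconds, initial_backlog=0):
    """Unapplied changes after `seconds`, with rates in changes per second."""
    growth = (write_rate - apply_rate) * seconds
    # A backlog cannot go below zero: a caught-up replica simply stays caught up.
    return max(initial_backlog + growth, 0)


def seconds_to_catch_up(backlog, write_rate, apply_rate):
    """Time for a replica to drain `backlog` while writes keep arriving."""
    if backlog == 0:
        return 0.0
    if apply_rate <= write_rate:
        # The replica never catches up while it applies no faster than writes arrive.
        return float("inf")
    return backlog / (apply_rate - write_rate)
```

For example, a one-minute burst of 5,000 writes per second against a replica that applies 4,000 per second leaves `backlog_after(5000, 4000, 60)` = 60,000 unapplied changes. Once writes drop back to 1,000 per second, draining that backlog takes `seconds_to_catch_up(60000, 1000, 4000)` = 20 seconds.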
Read Scaling Benefits
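Adding replicas increases read throughput roughly linearly, provided each replica has capacity comparable to the primary and reads are spread evenly. If your primary handles 10,000 reads per second and you add 3 replicas of equal capacity, total capacity becomes about 40,000 reads per second. This scales horizontally without significant changes to application code; only the routing of queries changes.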
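The arithmetic behind that estimate is shown below as a small helper. It assumes every node has the same read capacity and ignores the capacity replicas spend applying replicated writes; the function and parameter names are illustrative.

```python
def total_read_capacity(reads_per_node, replica_count, primary_serves_reads=True):
    """Idealized aggregate read capacity, assuming identical nodes."""
    nodes = replica_count + (1 if primary_serves_reads else 0)
    return reads_per_node * nodes


with_primary = total_read_capacity(10_000, 3)  # 40000
replicas_only = total_read_capacity(10_000, 3, primary_serves_reads=False)  # 30000
```

If the primary is reserved for writes, the same three replicas provide 30,000 reads per second rather than 40,000.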
Geographic distribution places replicas closer to users in different regions, reducing read latency. A user in Europe reads from a European replica instead of crossing the Atlantic to a US primary. Writes must still reach the primary, so write latency is unchanged, but reads, which are typically the majority of traffic, become local.
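A minimal sketch of region-aware routing follows. The hostnames, region codes, and the `choose_host` function are hypothetical placeholders; falling back to the primary for regions without a replica is one possible policy among several.

```python
# Hypothetical hostnames, for illustration only.
PRIMARY = "primary-us.db.example.internal"
REPLICAS = {
    "us": "replica-us.db.example.internal",
    "eu": "replica-eu.db.example.internal",
}


def choose_host(operation, user_region):
    """Send writes to the primary and reads to the replica in the user's region."""
    if operation == "write":
        return PRIMARY
    if operation == "read":
        # Regions without a local replica fall back to the primary.
        return REPLICAS.get(user_region, PRIMARY)
    raise ValueError(f"unknown operation: {operation}")
```

Here `choose_host("read", "eu")` resolves to the European replica, while `choose_host("write", "eu")` still goes to the US primary.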