
Replication Lag, Read Consistency Patterns, and Follower Read Trade-offs

Replication lag is the delay between the leader committing a write and a follower applying it. On a healthy Local Area Network (LAN) it is typically measured in milliseconds, but it can stretch to seconds during high load, network congestion, or slow apply rates on the follower. This lag creates consistency anomalies for clients that read from followers. The most common anomaly is a client writing to the leader and then immediately reading from a follower that has not yet applied the write, violating the read-your-writes guarantee users expect. Another is non-monotonic reads: a client reads a value from one follower, then a subsequent read from a different follower returns an older version, making data appear to move backwards in time. These violations are particularly problematic for user-facing applications, where stale or inconsistent data breaks the user experience.

To maintain read-your-writes consistency, systems can use session stickiness to pin a user session to the leader for both reads and writes, which eliminates staleness but forfeits the read-scaling benefit of followers. Alternatively, the client can include the log sequence number (LSN) of its last write in subsequent read requests, and followers gate reads until they have applied up to that LSN. This requires the follower to either block the read (adding latency), reject it with a retry signal, or forward it to the leader. Google Spanner provides external consistency by using TrueTime to assign globally unique timestamps and waiting out uncertainty intervals, ensuring reads always observe causally prior writes. MongoDB provides causal consistency within a session by tracking operation time and ensuring subsequent reads from any replica wait until that time.

Serving reads from followers is essential for horizontal read scaling, especially in geographically distributed systems where placing followers near users cuts latency from hundreds of milliseconds to tens of milliseconds. Elasticsearch clusters commonly serve read traffic at 10 to 100 times the rate of write traffic by distributing reads across all replicas. This scaling, however, comes at the cost of added complexity in managing consistency guarantees. The trade-off depends on workload: read-heavy workloads with relaxed consistency needs (analytics dashboards, caches, content delivery) benefit enormously from follower reads, while write-heavy workloads or those requiring strong consistency (financial ledgers, inventory management) may need leader-only reads, sacrificing read scalability. Exposing replication lag as a metric lets clients make informed decisions about whether to read from a follower or wait for stronger consistency.
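To make the LSN-gating idea concrete, here is a minimal sketch in Python. The `replica` handle, its `applied_lsn()` and `get()` methods, and `StaleReplicaError` are hypothetical names invented for illustration; the point is only the wait-or-reject decision a follower makes before serving a gated read.

```python
import time


class StaleReplicaError(Exception):
    """Raised when a follower cannot catch up to the requested LSN in time."""


class FollowerReadGate:
    """Gate reads on a follower behind the client's last-write LSN.

    `replica` is a hypothetical handle exposing applied_lsn() and get();
    the real interface depends on the storage engine.
    """

    def __init__(self, replica, max_wait_ms=50):
        self.replica = replica
        self.max_wait_ms = max_wait_ms

    def read(self, key, min_lsn):
        deadline = time.monotonic() + self.max_wait_ms / 1000.0
        # Block briefly while replication catches up to the client's LSN.
        while self.replica.applied_lsn() < min_lsn:
            if time.monotonic() >= deadline:
                # Too far behind: signal the caller to retry elsewhere or
                # fall back to the leader instead of serving a stale read.
                raise StaleReplicaError(
                    f"replica at {self.replica.applied_lsn()}, need {min_lsn}"
                )
            time.sleep(0.001)
        return self.replica.get(key)
```

In this pattern the client records the LSN returned in the leader's write acknowledgement and passes it as `min_lsn` on subsequent reads; rejecting after a short deadline keeps tail latency bounded at the cost of a retry or a redirect to the leader.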
💡 Key Takeaways
Replication lag typically ranges from sub-millisecond to a few milliseconds on a healthy LAN, but degrades to seconds during high write load, GC pauses, or network issues. Elasticsearch replica lag is usually single-digit milliseconds but increases under heavy indexing.
Read-your-writes anomaly: the client writes to the leader, then reads from a follower that has not yet applied the write and fails to see its own update. The solution is session pinning to the leader, or LSN gating where the follower waits until it has applied up to the client's last-write LSN.
Monotonic reads anomaly: the client reads version N from follower A, then reads version N-1 from follower B, seeing data appear to go backwards. The solution is session affinity to a single follower, or tracking the client's observed LSN and rejecting stale reads.
Follower reads enable the horizontal read scaling critical for read-heavy workloads. Elasticsearch serves 10 to 100 times more read than write traffic by distributing across replicas. Geographic followers reduce latency by 50 to 100 ms by serving local reads.
Timeline consistency or bounded staleness allows applications to specify the maximum acceptable lag. Client requests can include a staleness budget (for example, read data no more than 5 seconds old), and the system routes to a follower meeting that constraint; see the routing sketch after this list.
The trade-off is consistency versus read scalability and latency. Strong consistency requires leader reads or quorum reads (R + W > N), sacrificing scale. Systems like Spanner use quorum reads with TrueTime for strong consistency at scale, accepting higher read latency (5 to 10 ms baseline).
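The bounded-staleness takeaway above translates roughly into the routing sketch below. The replica objects and their `lag_seconds()` / `is_leader` attributes are assumed monitoring hooks, not a real client API; the sketch only shows the decision of picking a fresh-enough follower or falling back to the leader.

```python
import random


def route_read(replicas, staleness_budget_s):
    """Pick a replica whose measured replication lag fits the caller's
    staleness budget; fall back to the leader if none qualifies.

    `replicas` is assumed to be a list of objects exposing lag_seconds()
    and an is_leader flag (hypothetical monitoring interface).
    """
    candidates = [
        r for r in replicas
        if not r.is_leader and r.lag_seconds() <= staleness_budget_s
    ]
    if candidates:
        # Spread load across all followers that satisfy the budget.
        return random.choice(candidates)
    # No follower is fresh enough: serve from the leader and accept the
    # loss of read scaling rather than violating the staleness bound.
    return next(r for r in replicas if r.is_leader)
```

A caller would invoke something like `route_read(cluster_replicas, staleness_budget_s=5.0)`, matching the "no more than 5 seconds old" example above.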
📌 Examples
MongoDB causal consistency: the client session tracks the operationTime of its last operation. Subsequent reads within that session include an afterClusterTime parameter set to that operationTime. The replica waits until its oplog reaches that time before serving the read, ensuring the client never sees data older than its own writes. This adds a median 1 to 5 ms of latency if the replica is slightly behind.
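A sketch of this session pattern with the PyMongo driver, assuming a local replica set; the connection string, database, and collection names are placeholders. Inside a causally consistent session the driver attaches afterClusterTime to reads automatically.

```python
from pymongo import MongoClient, ReadPreference
from pymongo.read_concern import ReadConcern

# Placeholder connection string for a local replica set.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

# Causally consistent session: the driver tracks the session's
# operationTime/clusterTime and sends afterClusterTime with reads.
with client.start_session(causal_consistency=True) as session:
    coll = client.get_database(
        "app",  # placeholder database name
        read_concern=ReadConcern("majority"),
        read_preference=ReadPreference.SECONDARY_PREFERRED,
    ).orders  # placeholder collection name

    # The write goes to the primary; the session records its operationTime.
    coll.insert_one({"_id": 42, "status": "paid"}, session=session)

    # This read may hit a secondary, but the server waits until its oplog
    # has advanced past the recorded operationTime before answering, so
    # the session always observes its own write.
    doc = coll.find_one({"_id": 42}, session=session)
```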
PostgreSQL hot standby lag monitoring: SELECT pg_last_wal_receive_lsn() - pg_last_wal_replay_lsn() AS byte_lag on the standby shows how many bytes it is behind the primary. Convert to time lag via the replay rate. The application can query this before serving a read from the standby and redirect to the primary if lag exceeds a threshold (for example, 1 second). Typical production standbys maintain under 100 ms of lag under normal load.
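A sketch of that check in Python with psycopg2, run against the standby; the DSN and the 1-second threshold are placeholder values. It uses pg_wal_lsn_diff for byte lag and pg_last_xact_replay_timestamp as a rough time-lag estimate.

```python
import psycopg2

# Run on the standby: byte lag between received and replayed WAL, plus an
# approximate time lag based on the last replayed transaction timestamp.
LAG_QUERY = """
    SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                           pg_last_wal_replay_lsn())                        AS byte_lag,
           EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))    AS seconds_lag
"""


def standby_is_fresh(dsn, max_lag_seconds=1.0):
    """Return True if the standby's replay lag is within the threshold."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(LAG_QUERY)
        byte_lag, seconds_lag = cur.fetchone()
        # seconds_lag is NULL right after startup or before any transaction
        # has replayed; treat that conservatively as "not fresh".
        return seconds_lag is not None and seconds_lag <= max_lag_seconds


# Application-side routing: read from the standby only when it is fresh,
# otherwise fall back to the primary (connection handling omitted).
# target = STANDBY_DSN if standby_is_fresh(STANDBY_DSN) else PRIMARY_DSN
```

One caveat: the timestamp-based estimate can overstate lag on an idle primary, since no new transactions arrive to replay; the byte lag is the more reliable signal under load.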
Kafka consumer offset tracking: the consumer tracks the high watermark (latest committed offset) and the log end offset per partition. Lag equals log end offset minus consumer offset. Applications reading from Kafka for near-real-time processing monitor consumer lag and alert if it exceeds the Service Level Agreement (SLA), for example 5 seconds behind. Follower brokers similarly track the leader's log end offset to monitor replication lag.
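A sketch of the lag calculation with the kafka-python client; the broker address, topic, and group id are placeholders. Note the result is lag in messages per partition; translating it into "seconds behind" requires record timestamps or an estimate of producer throughput.

```python
from kafka import KafkaConsumer, TopicPartition

# Placeholder broker and consumer group; the consumer is used only to
# query offsets, so auto-commit is disabled.
consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    group_id="orders-processor",
    enable_auto_commit=False,
)


def consumer_lag(topic):
    """Return per-partition lag: log end offset minus committed offset."""
    partitions = [
        TopicPartition(topic, p)
        for p in (consumer.partitions_for_topic(topic) or [])
    ]
    end_offsets = consumer.end_offsets(partitions)  # latest offsets on the brokers
    lag = {}
    for tp in partitions:
        committed = consumer.committed(tp) or 0     # None if the group never committed
        lag[tp.partition] = end_offsets[tp] - committed
    return lag


# Feed this into a metrics system and alert when the backlog implies the
# consumer is further behind than the SLA allows (e.g. ~5 seconds).
print(consumer_lag("orders"))
```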