Replication & Consistency · Replication Lag & Solutions · Easy · ⏱️ ~2 min

What is Replication Lag and How is it Measured?

Replication lag is the elapsed time or log distance between a write being committed on a leader and that same change becoming visible on a follower replica. In leader-based systems, the leader ships an append-only change log (such as a Write-Ahead Log, or WAL) to followers over the network. Followers must receive the bytes, persist them to disk, and then apply or replay them to their local data structures. Lag can accumulate at any point in this pipeline: network transit time, enqueue and dequeue delays, disk flush operations, and apply throughput limitations.

There are two universal measurement approaches. Time-based lag measures how long it has been since the follower applied the leader's latest commit timestamp. This metric is intuitive for humans but can be skewed by clock drift between servers. Position-based lag measures the difference in log sequence numbers or byte offsets between what the leader has committed and what the follower has applied. For example, if the leader is at log position 1,000,000 and the follower has only applied up to position 995,000, the lag is 5,000 positions. Position-based metrics are robust against clock skew and are the preferred choice for gating read requests or making failover decisions in production systems.

In practice, you should track both metrics together. Time-based lag helps operators understand user impact (a 30-second lag is more meaningful than "5,000 log entries behind"), while position-based lag provides the precision needed for automated systems to make consistency guarantees. Systems like those at Meta and Netflix instrument every stage of the replication pipeline with throughput metrics (operations per second, megabytes per second) and latency distributions to pinpoint where lag accumulates.
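As a rough sketch, both measurements reduce to simple differences (the function names and the LSN/timestamp inputs here are illustrative, not any particular database's API):

```python
def position_lag(leader_lsn: int, follower_applied_lsn: int) -> int:
    """Position-based lag: log positions the follower has yet to apply.
    Immune to clock skew, so suitable for automated gating decisions."""
    return leader_lsn - follower_applied_lsn

def time_lag(leader_commit_ts: float, follower_applied_ts: float) -> float:
    """Time-based lag in seconds since the last applied commit.
    Intuitive for operators, but only trustworthy if leader and
    follower clocks are synchronized."""
    return leader_commit_ts - follower_applied_ts

# The example from the text: leader at 1,000,000, follower at 995,000.
print(position_lag(1_000_000, 995_000))  # → 5000
```

Tracking both together, as the text suggests, gives operators a human-readable number and automation a skew-proof one.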
💡 Key Takeaways
Position-based lag (log sequence number differences) is preferred for automated gating and failover decisions because it is immune to clock drift between servers
Time-based lag measures seconds since the last applied commit and is intuitive for operators, but can be misleading if leader and follower clocks are not synchronized
The replication pipeline has four measurable stages: enqueue at the leader, network transit, persist to disk on the follower, and apply to data structures on the follower
Production systems instrument each pipeline stage separately to identify bottlenecks: a 10-second lag might be 0.5s enqueue, 0.1s network, 2s persist, and 7.4s apply time
Typical cross-Availability-Zone (AZ) replication lag in healthy systems is under 1 to 2 seconds but can jump to tens of seconds or minutes during write bursts or network issues
A 1 TB backlog over 5 Gbps effective throughput takes approximately 27 minutes to drain; at 1 Gbps it takes about 2 hours and 13 minutes to catch up
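The drain-time arithmetic in the last takeaway is worth making explicit, since mixing up bytes and bits is an easy 8x error. A minimal calculation (the helper and unit constants are mine, not from the source):

```python
def drain_seconds(backlog_bytes: float, throughput_bits_per_sec: float) -> float:
    """Seconds to drain a replication backlog at a given effective throughput.
    Backlogs are sized in bytes; network throughput is quoted in bits/sec,
    hence the factor of 8."""
    return backlog_bytes * 8 / throughput_bits_per_sec

TB = 10**12    # terabyte in bytes
GBPS = 10**9   # gigabit per second in bits/sec

print(drain_seconds(1 * TB, 5 * GBPS) / 60)    # ≈ 26.7 minutes
print(drain_seconds(1 * TB, 1 * GBPS) / 3600)  # ≈ 2.2 hours (2h13m)
```

Note that "effective" throughput matters: the usable rate is what is left after ongoing write traffic, so a saturated leader can make a backlog drain far more slowly than the raw link speed suggests.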
📌 Examples
Meta's TAO social graph store uses position-based replication tracking for session-level read-your-writes guarantees, routing user reads to sources that have applied the user's latest writes
Box captures the leader's log offset after significant writes and attaches it to subsequent read requests; the read router picks the first replica whose applied position is greater than or equal to the required position
Netflix's multi-region stores use local quorums within a region (low lag, fast commits) while streaming asynchronously to remote regions (higher lag, but availability is maintained during region isolation)
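The Box-style read gating described above can be sketched as follows (the `Replica` type and `route_read` router are illustrative, not Box's actual implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Replica:
    name: str
    applied_lsn: int  # highest log position this replica has applied

def route_read(replicas: list[Replica], required_lsn: int) -> Optional[Replica]:
    """Pick the first replica whose applied position has caught up to the
    position the client captured at write time. Returns None if no replica
    is fresh enough; a real router would then fall back to the leader."""
    for replica in replicas:
        if replica.applied_lsn >= required_lsn:
            return replica
    return None

# A write was committed at position 998,000; only r2 has applied past it.
replicas = [Replica("r1", 995_000), Replica("r2", 1_000_000)]
print(route_read(replicas, 998_000).name)  # → r2
```

Because the comparison is on log positions rather than timestamps, this gating works even when replica clocks disagree, which is exactly why position-based lag is preferred for it.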