Solving Read After Write Consistency with Routing Policies

Read after write consistency guarantees that if a user writes data and immediately reads it back, they will see their own write. This is surprisingly difficult with asynchronous read replicas. Consider a user posting a comment. The write commits to primary at timestamp T0. At T0 plus 50 milliseconds, the user refreshes. With 10 to 100 millisecond replication lag, there is a significant probability that the replica has not yet applied the change. The user sees no comment and files a bug report.

Session pinning is the simplest mitigation. After any write, pin that user session or connection to the primary database for a time window, typically 200 to 500 milliseconds (chosen to cover p99 replica lag). All reads from that session go to primary during the window, so they are guaranteed to see the write. After the window expires, reads return to replicas on the assumption that replication has caught up; if lag ever exceeds the window, a stale read is still possible. The tradeoff is that you push read load back to the primary, reducing the scaling benefit of replicas. In workloads where users frequently write and then immediately read (think collaborative editing, messaging, or social posting), 30 to 50 percent of reads may hit primary despite having replicas.

Freshness tokens offer a more sophisticated approach. When a write commits, the primary returns a token representing its replication position: a Log Sequence Number (LSN), Global Transaction Identifier (GTID), or similar monotonic marker. The application passes this token with subsequent read requests. The router compares the token against each replica's applied position. If a replica has reached or passed the token, it is safe to route there. Otherwise, the router tries another replica or falls back to primary. This minimizes primary load by routing to replicas as soon as they are fresh enough, but it adds complexity. You must propagate tokens through your entire request path, handle token expiration, and track the applied position of every replica.

Production systems often combine strategies. Amazon's public APIs show the same principle of explicit consistency: DynamoDB exposes a per-request consistent read option, and S3 documents its read after write consistency guarantees. The key is making consistency explicit: expose knobs so critical paths (user facing reads that follow a write) can pay the latency and load cost of strong consistency, while background jobs and analytics can use eventually consistent replica reads to maximize throughput and cost efficiency.
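
To make the two routing policies concrete, here is a minimal sketch of a router that combines them. It is an illustration under stated assumptions, not a production design: `ReadRouter`, `Replica`, `record_write`, and `route` are hypothetical names, replica positions are plain integers standing in for an LSN or GTID, and the 300 millisecond default window is only a placeholder inside the 200 to 500 millisecond range discussed above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Replica:
    name: str
    applied_lsn: int  # last replication position this replica has applied


@dataclass
class ReadRouter:
    """Hypothetical router combining session pinning and freshness tokens."""
    replicas: List[Replica]
    pin_window_s: float = 0.3  # assumed to cover p99 replica lag
    clock: Callable[[], float] = time.monotonic
    _pinned_until: Dict[str, float] = field(default_factory=dict)

    def record_write(self, session_id: str) -> None:
        # Session pinning: reads from this session go to primary for the window.
        self._pinned_until[session_id] = self.clock() + self.pin_window_s

    def route(self, session_id: str, min_lsn: Optional[int] = None) -> str:
        if min_lsn is not None:
            # Freshness token: any replica at or past the token is safe to read.
            for replica in self.replicas:
                if replica.applied_lsn >= min_lsn:
                    return replica.name
            return "primary"  # no replica has caught up yet
        # No token supplied: fall back to the time-based pin.
        if self.clock() < self._pinned_until.get(session_id, float("-inf")):
            return "primary"
        return self.replicas[0].name if self.replicas else "primary"
```

In this sketch a request that carries a token is routed purely by replica position, while a request without one falls back to the coarser time window. A real router would also need to refresh `applied_lsn` from the replicas and expire old pin entries, which is omitted here.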
💡 Key Takeaways
Session pinning to primary for 200 to 500 milliseconds after writes provides read after write consistency as long as replica lag stays within the window, but can push 30 to 50 percent of reads back to primary in workflows where users write and then immediately read
Freshness tokens (LSN or GTID markers) let routers verify replica freshness per request, minimizing primary load by routing to replicas as soon as they catch up to required positions
Amazon services expose consistency as an explicit choice: DynamoDB offers a consistent read API option (a strongly consistent read consumes twice the read capacity units of an eventually consistent one), and S3 documents strong read after write consistency for object PUTs and DELETEs
Explicit consistency knobs let critical user facing paths pay for strong consistency (higher latency, primary load) while background jobs use eventually consistent replicas (lower latency, better throughput)
Token propagation requires passing replication positions through entire request chains including caches, queues, and service boundaries, significantly increasing implementation complexity
📌 Examples
Session pinning: After a user submits a form, your web server sets a sticky cookie containing the write timestamp. For the next 300ms, all read queries carrying that cookie route to primary. After 300ms, normal replica routing resumes.
Freshness token: User uploads a profile photo. Primary returns LSN 12500 with the success response. Client includes LSN 12500 in the next GET request. Router checks replicas: Replica1 at LSN 12480 (skip), Replica2 at LSN 12550 (use this one).
Tiered consistency: Your timeline API accepts a consistency_level parameter. 'strong' routes to primary (5ms p99 latency), 'bounded_staleness' requires replicas within 100ms lag (3ms p99), 'eventual' uses any healthy replica (2ms p99).
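
The tiered consistency example can be sketched as a small target-selection function. This is a minimal illustration, not a definitive implementation: `ReplicaStatus`, `choose_target`, and the `max_lag_ms` parameter are hypothetical names, and the sketch assumes some monitoring component already supplies each replica's health and measured lag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaStatus:
    name: str
    healthy: bool
    lag_ms: float  # measured replication lag behind primary (assumed available)


def choose_target(level: str, replicas: List[ReplicaStatus],
                  max_lag_ms: float = 100.0) -> str:
    """Pick a read target for the requested consistency_level."""
    healthy = [r for r in replicas if r.healthy]
    if level == "strong":
        return "primary"
    if level == "bounded_staleness":
        # Only replicas within the staleness bound qualify.
        candidates = [r for r in healthy if r.lag_ms <= max_lag_ms]
    elif level == "eventual":
        candidates = healthy
    else:
        raise ValueError(f"unknown consistency_level: {level!r}")
    if not candidates:
        return "primary"  # nothing qualifies, so fall back to primary
    # Prefer the least-lagged qualifying replica.
    return min(candidates, key=lambda r: r.lag_ms).name
```

Falling back to primary when no replica qualifies keeps the staleness bound honest at the cost of extra primary load; a system that values primary headroom more might reject or queue the read instead.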