Database Design • Read Replicas & Query Routing (Medium, ~2 min)
Query Routing Strategies: Application Level vs Proxy Based
Query routing is the logic that decides where each database operation should execute: the primary for writes and safe reads, or replicas for operations that tolerate staleness. Two architectural patterns dominate production systems, each with distinct tradeoffs.
Application level splitting puts routing logic directly in your service code. The application maintains separate connection pools labeled "write" and "read", choosing between them based on query type and consistency requirements. This approach offers maximum flexibility. You can implement sophisticated policies like session pinning (sending all reads to primary for 200 milliseconds after a write), per request consistency hints, or custom lag aware selection. The downside is complexity proliferation. Every service team must implement and maintain routing logic, handle failover scenarios, and track replica topology changes. At Meta and Amazon scale, this often means shared libraries or frameworks, but smaller teams face substantial engineering overhead.
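To make the session pinning policy concrete, here is a minimal sketch of time-based pinning after a write, using the 200 millisecond window mentioned above. The class name and pool labels are hypothetical, not from any specific library; a real implementation would hold actual connection pool objects rather than strings.

```python
import time

class TimePinnedRouter:
    """Sketch of time-based session pinning (assumed helper, not a real
    library): after a write, route all reads to the primary for a short
    window so the session can read its own writes."""

    PIN_WINDOW_S = 0.2  # 200 ms, as in the policy described above

    def __init__(self):
        self.last_write_at = float("-inf")  # no write seen yet

    def record_write(self):
        # Called whenever this session executes a write on the primary.
        self.last_write_at = time.monotonic()

    def pool_for_read(self):
        # Within the pin window: go to primary to guarantee freshness.
        if time.monotonic() - self.last_write_at < self.PIN_WINDOW_S:
            return "primary"
        # Window elapsed: replica staleness is acceptable again.
        return "replica"
```

The same structure extends to the other policies mentioned: a per request consistency hint would override `pool_for_read`, and lag aware selection would replace the `"replica"` return with a choice among healthy replicas.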
Middleware or proxy based routing centralizes the logic in a stateful layer between applications and databases. The proxy parses incoming Structured Query Language (SQL) statements, classifies them as reads or writes, tracks transaction state, monitors replica health and lag, and load balances accordingly. Shopify publicly describes using this pattern to handle storefront traffic and flash sales, with policies including transaction pinning to primary and lag aware routing. The proxy simplifies application code dramatically: services connect to a single endpoint and receive transparent routing. However, this introduces an extra network hop (adding 0.5 to 3 milliseconds of latency depending on placement) and a potential single point of failure. High availability requires running multiple proxy instances behind load balancers, and the proxy itself becomes a component to monitor, scale, and secure.
In practice, managed services like Amazon RDS and Aurora offer a hybrid. They expose writer and reader endpoints (proxy like abstraction) but application code still must choose which endpoint to use for each operation (application level decision). This balances simplicity with flexibility, letting teams adopt read replicas without building full proxy infrastructure while retaining control over consistency semantics.
💡 Key Takeaways
• Application level routing offers maximum flexibility for consistency policies but requires every service team to implement and maintain query classification, failover handling, and topology tracking logic
• Proxy based routing adds 0.5 to 3 milliseconds of latency per query due to the extra network hop, and the proxy becomes a single point of failure requiring its own high availability architecture
• Production proxies must parse SQL to classify queries, track transaction boundaries, monitor per replica lag and saturation metrics, and implement circuit breakers to handle replica failures without cascading overload
• Amazon Aurora abstracts routing with separate writer and reader endpoints, where the reader endpoint automatically load balances across up to 15 replicas within a region while applications choose endpoint per query
• Shopify uses database aware proxies to handle flash sale traffic spikes, implementing transaction pinning to primary, lag aware replica selection, and automatic fallback when routing uncertainty is detected
📌 Examples
Application level: Your Python service maintains two connection pools. After a user updates their profile, you set a session flag and route the next 3 read queries to the write pool (primary) to ensure read after write consistency, then revert to the read pool.
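The count-based pinning described above can be sketched as follows. The `RoutingSession` class and the pool objects are illustrative assumptions, not a real framework; in production the pools would be database driver connection pools.

```python
class RoutingSession:
    """Sketch of application level routing with read-after-write pinning:
    after any write, the next few reads go to the write pool (primary)."""

    PIN_READS_AFTER_WRITE = 3  # number of post-write reads sent to primary

    def __init__(self, write_pool, read_pool):
        self.write_pool = write_pool   # connections to the primary
        self.read_pool = read_pool     # connections to replicas
        self._pinned_reads_left = 0

    def execute_write(self, sql, params=()):
        # Writing sets the session flag: pin the next N reads to primary.
        self._pinned_reads_left = self.PIN_READS_AFTER_WRITE
        return self.write_pool.execute(sql, params)

    def execute_read(self, sql, params=()):
        if self._pinned_reads_left > 0:
            self._pinned_reads_left -= 1
            return self.write_pool.execute(sql, params)  # fresh read
        return self.read_pool.execute(sql, params)       # replica, may lag
```

Counting reads is simpler than tracking replication position, but it is a heuristic: if replication lag exceeds the time those three reads take, later reads can still observe stale data.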
Proxy based: ProxySQL or Vitess sits between your app and MySQL. It parses every query, routes SELECT statements to healthy replicas with lag under 100ms, and automatically pins entire transactions to primary if it sees BEGIN or any non SELECT statement.
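A toy version of that routing logic, reduced to its essentials, might look like the following. This is a sketch of the general technique, not ProxySQL's or Vitess's actual implementation; real proxies use full SQL parsers, handle many concurrent client connections, and refresh lag metrics continuously.

```python
class ProxyRouter:
    """Toy proxy-style router for a single client connection: classifies
    statements by their first keyword, pins transactions to the primary,
    and prefers replicas under a lag threshold."""

    MAX_LAG_MS = 100  # lag threshold from the example above

    def __init__(self, replicas):
        self.replicas = replicas      # list of (name, lag_ms) pairs
        self.in_transaction = False   # once True, pinned to primary

    def route(self, sql):
        keyword = sql.lstrip().split(None, 1)[0].upper()
        if keyword == "BEGIN":
            self.in_transaction = True
        if keyword in ("COMMIT", "ROLLBACK"):
            self.in_transaction = False
            return "primary"
        # Transactions and anything that is not a plain SELECT go to primary.
        if self.in_transaction or keyword != "SELECT":
            return "primary"
        healthy = [name for name, lag in self.replicas if lag < self.MAX_LAG_MS]
        # Fall back to primary if every replica is lagging too far behind.
        return healthy[0] if healthy else "primary"
```

Note the deliberate conservatism: when in doubt (inside a transaction, unrecognized statement, all replicas lagging), the router falls back to the primary, trading load distribution for correctness.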
Hybrid approach: Your service connects to Aurora writer endpoint for all writes and post write reads, and connects to Aurora reader endpoint for timeline and catalog queries that tolerate 50ms staleness. Aurora handles load balancing across replicas internally.
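In code, the hybrid pattern reduces to a per-operation endpoint choice; Aurora handles replica selection behind the reader endpoint. The hostnames below are placeholders in Aurora's general endpoint format, and the helper function is an illustrative assumption, not an AWS API.

```python
# Placeholder cluster endpoints (Aurora exposes one writer endpoint and one
# reader endpoint per cluster; these hostnames are made up for illustration).
WRITER_ENDPOINT = "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

def endpoint_for(operation, needs_fresh_read=False):
    """Hypothetical helper: the application decides per operation which
    endpoint to dial; Aurora load balances reads behind the reader endpoint."""
    if operation == "write" or needs_fresh_read:
        return WRITER_ENDPOINT   # writes and read-after-write go to writer
    return READER_ENDPOINT       # staleness-tolerant reads go to replicas
```

The consistency decision (which reads must be fresh) stays in the application, while the operational burden of tracking replica topology and health moves to the managed service.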