Lag Aware Load Balancing and Health Based Routing
Why Naive Load Balancing Fails
Naive round-robin load balancing across replicas fails in production because replicas are not uniformly healthy. Replication lag varies due to network jitter, write burst absorption, long transaction apply delays, and hardware differences. A replica lagging 5 seconds behind serves stale data and violates user expectations.
A replica at 95% CPU saturation adds hundreds of milliseconds of queuing delay. Round-robin sends equal traffic to the struggling replica, making latency worse for all requests routed there. Intelligent load balancing must consider both freshness and capacity.
Lag-Aware Load Balancing
Lag-aware load balancing continuously measures replica freshness and saturation, routing reads to healthy replicas only. Each replica exposes its current lag (seconds behind primary) and load metrics. The router maintains this state, updated every few seconds or via streaming health checks.
Routing decisions then incorporate lag thresholds. If your application tolerates 1 second staleness, exclude replicas with lag greater than 1 second from the pool. Among eligible replicas, use weighted round-robin based on current load—send more traffic to less loaded replicas.
Adaptive Routing Algorithms
Least-connections routing sends new requests to the replica with fewest active queries. This naturally adapts to varying query costs: a replica executing slow queries accumulates connections and receives less traffic. Fast replicas drain connections quickly and receive more traffic.
Latency-based routing measures actual response times and routes to fastest replicas. This accounts for all factors: replication lag, query processing, and network latency. P99 latency (the response time at the 99th percentile—meaning 99% of requests are faster than this) is often more useful than average for detecting struggling replicas.
Health Check Design
Effective health checks go beyond simple connectivity. A replica can accept connections while being 30 seconds behind on replication. Health checks should query current lag, connection count, CPU utilization, and run a simple test query to verify query execution works.
Health check frequency trades accuracy against overhead. Checking every 100ms gives fresh data but adds load. Checking every 5 seconds misses fast-developing problems. Many systems use tiered checks: lightweight ping every second, detailed metrics every 5 seconds.