
State Management and Data Gravity in Global Load Balancing

Global Load Balancing only delivers value when your application architecture can tolerate users moving between regions. Stateless services and globally replicated caches make region switching trivial: every region can serve any request independently. The challenge emerges with stateful writes, which require careful placement strategies to maintain consistency while minimizing cross-region latency penalties.

Three patterns dominate production systems. First, single-writer per shard (or tenant) routes all writes for a given partition to a designated home region, with asynchronous replication to other regions for read scaling. This avoids distributed-consensus overhead but creates affinity: users must reach their home region for writes, potentially adding cross-continental RTT. Second, multi-writer with conflict resolution, using Conflict-free Replicated Data Types (CRDTs) or version vectors, allows writes anywhere but pushes complexity into application logic and limits the data models that work cleanly. Third, read-local, write-home workflows keep fast read paths while routing writes through a longer path to the authoritative region.

Consider Netflix's architecture. They operate active-active across three AWS regions with aggressive content caching at Content Delivery Network (CDN) edges. The critical insight is that video streaming is read-dominant: serving video segments from cache requires no cross-region coordination. User-metadata writes (playback position, preferences) are less frequent and tolerate slightly higher latency, so they are routed to home regions with asynchronous replication. During Chaos Kong regional evacuation drills, Netflix maintains availability by keeping 30% spare capacity per region to absorb failover load and by using client logic that retries against alternate regions.

The economics matter at scale. Synchronous cross-region calls on critical paths multiply latency: a service that makes 5 cross-region Remote Procedure Calls (RPCs) at 80 milliseconds each adds 400 milliseconds to user latency, destroying any GLB benefit. Egress costs compound this: at $0.02 to $0.09 per GB for inter-region transfer, a service sustaining 10 Gbps cross-region moves roughly 3.2 PB per month, costing about $65,000 to $290,000 in bandwidth alone. Design principle: pin chatty microservices within regions and use asynchronous replication for global state, accepting eventual consistency where business logic permits.
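To make the single-writer pattern concrete, here is a minimal sketch of read-local, write-home routing. All names are hypothetical (the HomeRegionRouter, the shard-to-region map, the region IDs): it illustrates the pattern under stated assumptions, not any vendor's API.

```python
# Hypothetical sketch of read-local, write-home routing.
# All names and region assignments are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    region: str
    cross_region: bool  # True when the request must leave the caller's region

class HomeRegionRouter:
    """Routes reads to the local replica and writes to the shard's home region."""

    def __init__(self, home_regions: dict[str, str]):
        # shard_id -> designated single-writer region, assigned at shard creation
        self.home_regions = home_regions

    def route(self, shard_id: str, local_region: str, is_write: bool) -> Route:
        home = self.home_regions[shard_id]
        if is_write:
            # Writes always go to the home region; async replication fans out later.
            return Route(region=home, cross_region=(home != local_region))
        # Reads are served locally and tolerate replication lag (eventual consistency).
        return Route(region=local_region, cross_region=False)

router = HomeRegionRouter({"user:1042": "eu-west-1"})
print(router.route("user:1042", local_region="us-east-1", is_write=False))
# Route(region='us-east-1', cross_region=False)  -- fast local read
print(router.route("user:1042", local_region="us-east-1", is_write=True))
# Route(region='eu-west-1', cross_region=True)   -- pays cross-continental RTT
```

The point of the sketch: tolerating stale local reads is what keeps the read path free of cross-region hops, so only the infrequent writes pay the home-region RTT penalty.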
💡 Key Takeaways
Stateless services and globally replicated caches enable trivial region switching; stateful writes require home-region routing or conflict-resolution strategies
Single-writer per shard avoids consensus but creates data gravity: users must reach their home region for writes, potentially adding 60 to 300 ms of cross-continental RTT
Multi-writer with CRDTs or version vectors allows writes anywhere but limits applicable data models and pushes conflict-resolution complexity into application code
Netflix maintains 30% spare capacity per region to absorb failover during Chaos Kong drills, proving they can evacuate a region within minutes while staying within error budgets
Cross-region egress at $0.02 to $0.09 per GB means 10 Gbps of sustained cross-region traffic costs roughly $65,000 to $290,000 monthly, making locality critical for cost efficiency
Five synchronous cross-region RPCs at 80 ms each add 400 ms to request latency, completely negating any GLB latency benefit from smart routing (both calculations are sketched in code after this list)
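As a sanity check on the latency and egress figures above, a short back-of-the-envelope sketch; the RPC count, RTT, and per-GB prices are the illustrative assumptions from this section, not measurements.

```python
# Back-of-the-envelope check of the latency and egress figures above.
# Prices and call counts are the illustrative assumptions from this section.

# Latency: synchronous cross-region RPCs stack linearly on the critical path.
rpc_count = 5
rtt_ms = 80
print(f"Added latency: {rpc_count * rtt_ms} ms")       # 400 ms

# Egress: 10 Gbps sustained for a 30-day month.
gbps = 10
gb_per_second = gbps / 8                               # 1.25 GB/s (decimal units)
seconds_per_month = 30 * 24 * 3600                     # 2,592,000 s
gb_per_month = gb_per_second * seconds_per_month       # 3,240,000 GB (~3.2 PB)
for price in (0.02, 0.09):
    print(f"At ${price:.2f}/GB: ${gb_per_month * price:,.0f}/month")
# At $0.02/GB: $64,800/month
# At $0.09/GB: $291,600/month
```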
📌 Examples
A social media feed service keeps write operations pinned to users' home regions (determined by signup location) and replicates read caches globally with 100 to 500 ms of replication lag
Google Spanner provides global strong consistency but at the cost of cross-continental RTT for writes; read-only transactions can use stale reads from local replicas with bounded staleness (a generic sketch of bounded-staleness read routing follows this list)
An ecommerce checkout flow routes shopping-cart reads to the local region but forces payment writes to the user's home region for regulatory compliance and fraud detection
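The feed-service and Spanner examples hinge on the same mechanism: serve a read locally only if the local replica is fresh enough for the caller. A generic sketch under assumed names (BoundedStalenessReader, a lag figure supplied by replication monitoring); this is not Spanner's actual API.

```python
# Generic bounded-staleness read routing -- a sketch, not any vendor's API.
# `replica_lag_ms` would come from replication monitoring in a real system.

class BoundedStalenessReader:
    def __init__(self, local_region: str, home_region: str):
        self.local_region = local_region
        self.home_region = home_region

    def choose_read_region(self, replica_lag_ms: float, max_staleness_ms: float) -> str:
        """Serve locally when replication lag fits the caller's staleness budget."""
        if replica_lag_ms <= max_staleness_ms:
            return self.local_region       # fast path: stale-but-bounded local read
        return self.home_region            # fall back to the authoritative region

reader = BoundedStalenessReader(local_region="ap-south-1", home_region="us-east-1")
# Feed read tolerates 500 ms of staleness; local replica is 120 ms behind.
print(reader.choose_read_region(replica_lag_ms=120, max_staleness_ms=500))  # ap-south-1
# Checkout read demands freshness within 50 ms; must go to the home region.
print(reader.choose_read_region(replica_lag_ms=120, max_staleness_ms=50))   # us-east-1
```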