
What is Global Load Balancing?

Global Load Balancing (GLB) distributes user traffic across multiple geographic regions or data centers to minimize latency, maximize availability, and use capacity efficiently. Unlike local load balancers, which distribute requests within a single data center, GLB operates at planetary scale and makes routing decisions based on where users are located and which regions are healthy.

The architecture typically involves three layers. At the top sits a global traffic router using either Domain Name System (DNS)-based Global Server Load Balancing (GSLB) or Anycast Layer 4/Layer 7 edge proxies. Below that, regional load balancers distribute traffic within each geographic region. At the bottom, application backends serve the actual requests. The routing engine considers multiple signals: user geography, measured network latency, real-time regional health and capacity, data residency requirements, and operational costs.

Consider how Google Search works when you type google.com. The DNS resolver or Anycast routing sends your request to the nearest Google edge Point of Presence (PoP), which then forwards it to the optimal backend region. Each routing decision takes milliseconds, and the system as a whole handles millions of queries per second globally. The alternative would be routing everyone to a single data center in California, adding 200 to 300 milliseconds of latency for users in Asia and creating a single point of failure.

The hard constraint GLB must respect is physics. Cross-continental Round-Trip Times (RTTs) create a floor for latency: US East to US West takes 60 to 80 milliseconds, US East to Europe takes 70 to 100 milliseconds, and US to India takes 200 to 300 milliseconds. No amount of clever routing can overcome these limits for synchronous operations.
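To make the routing signals concrete, here is a minimal Python sketch of how a global router might pick a region. It is an illustration under stated assumptions, not how any particular provider implements it: the `Region` fields, the `pick_region` function, and the 0.85 utilization cutoff are all hypothetical. The sketch first filters out regions that are unhealthy, near capacity, or outside a required data residency jurisdiction, then prefers the lowest measured RTT and uses cost as a tie-breaker.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Region:
    """Hypothetical snapshot of one region as seen by the global router."""
    name: str
    healthy: bool            # result of the latest health check
    utilization: float       # fraction of capacity in use, 0.0 to 1.0
    rtt_ms: float            # measured RTT from the user's location
    jurisdiction: str        # e.g. "IN", "US", "EU" (for data residency)
    cost_per_request: float  # relative operational cost


def pick_region(
    regions: Sequence[Region],
    required_jurisdiction: Optional[str] = None,
    max_utilization: float = 0.85,  # assumed cutoff, purely illustrative
) -> Optional[Region]:
    """Return the best eligible region, or None if nothing qualifies."""
    candidates = [
        r for r in regions
        if r.healthy
        and r.utilization < max_utilization
        and (required_jurisdiction is None
             or r.jurisdiction == required_jurisdiction)
    ]
    if not candidates:
        return None
    # Lowest latency wins; cost only breaks ties between equal RTTs.
    return min(candidates, key=lambda r: (r.rtt_ms, r.cost_per_request))


regions = [
    Region("india", True, 0.40, 15.0, "IN", 1.2),
    Region("us-west", True, 0.30, 250.0, "US", 1.0),
]
print(pick_region(regions).name)
```

With both regions healthy, the user is sent to the nearby region. If that region fails its health check or fills up, the same function falls back to the next-best candidate, which is the behavior that gives GLB its availability benefit.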
💡 Key Takeaways
Global Load Balancing routes traffic across multiple geographic regions, operating above local load balancers that work within a single data center
Three-layer architecture: global traffic router at the top (DNS or Anycast), regional load balancers in the middle, application backends at the bottom
Routing decisions consider user geography, network latency measurements, regional health and capacity metrics, compliance requirements, and cost optimization
Cross-continental RTTs impose hard physical limits: US East to US West is 60 to 80 ms, US East to Europe is 70 to 100 ms, US to India is 200 to 300 ms
Major providers like Google handle millions of queries per second globally through hundreds of edge Points of Presence (PoPs) with sub-second failover capabilities
📌 Examples
Google Search uses Anycast to route users to the nearest edge PoP from hundreds globally, then proxies to the optimal backend region while handling millions of QPS
When a user in Mumbai accesses a service, GLB routes to the India region (15 ms latency) rather than US West (250+ ms latency), cutting network latency by over 90%
Netflix operates active-active across three AWS regions and can drain an entire region within minutes during Chaos Kong drills while maintaining availability
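The "over 90%" figure in the Mumbai example can be checked with a few lines of Python. This is only a worked calculation using the latencies quoted above, with 250 ms taken as the lower bound of "250+"; the function name `latency_reduction` is a hypothetical helper for illustration.

```python
def latency_reduction(far_ms: float, near_ms: float) -> float:
    """Fraction of latency saved by routing to the nearer region."""
    return (far_ms - near_ms) / far_ms


# Figures from the Mumbai example: US West vs. the India region.
reduction = latency_reduction(far_ms=250, near_ms=15)
print(f"{reduction:.0%}")  # prints 94%
```

Because 250 ms is a lower bound for the distant region, the real saving is at least this large.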