Layer 4 vs Layer 7 Load Balancing Algorithm Trade-offs
Load balancing algorithms behave fundamentally differently at Layer 4 (transport layer) and Layer 7 (application layer) because of how much of the traffic they can see. Layer 4 balancers operate on TCP or UDP flows, seeing only IP addresses, ports, and the protocol. They typically hash the 5-tuple (source IP, destination IP, source port, destination port, protocol) to maintain per-flow stickiness, routing all packets of a TCP connection to the same backend. This supports stateful protocols and keeps overhead minimal, adding only single-digit microseconds of latency. AWS Network Load Balancer (NLB) operates at Layer 4, handles millions of requests per second, and preserves client source IP addresses.
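A minimal sketch of the core idea, with a hypothetical backend list and helper name (not any specific product's implementation): hash the 5-tuple so every packet of a flow maps to the same backend.

```python
import hashlib

# Hypothetical backend pool; real balancers also weight and health-check these.
BACKENDS = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]

def pick_backend(src_ip, src_port, dst_ip, dst_port, protocol):
    """Hash the 5-tuple so every packet of a flow lands on the same backend."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{protocol}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return BACKENDS[digest % len(BACKENDS)]

# Every packet of this TCP connection hashes to the same backend.
print(pick_backend("203.0.113.7", 51514, "198.51.100.1", 443, "tcp"))
```

Note that naive modulo hashing reshuffles most flows when the backend pool changes; production Layer 4 balancers such as Maglev use consistent hashing so that adding or removing a backend moves only a small fraction of flows.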
Layer 7 balancers terminate connections and inspect HTTP requests, enabling per-request routing decisions. They see URLs, headers, cookies, and request methods, which allows content-based routing (API calls to one pool, static assets to another) and more sophisticated algorithms. Critically, Layer 7 balancers can track concurrent HTTP/2 streams rather than raw TCP connections. A single HTTP/2 connection might multiplex 100 concurrent requests: least connections at Layer 4 would see one connection and consider that backend lightly loaded, while a Layer 7 least-requests algorithm correctly sees 100 in-flight requests.
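To make that blind spot concrete, here is a hedged sketch using invented counters (not a real balancer's data model): the Layer 4 view counts TCP connections per backend, while the Layer 7 view counts in-flight requests across multiplexed streams.

```python
# Hypothetical snapshot: each backend has one TCP connection,
# but the connections carry very different numbers of HTTP/2 streams.
backends = {
    "backend-a": {"tcp_connections": 1, "inflight_requests": 100},
    "backend-b": {"tcp_connections": 1, "inflight_requests": 3},
}

def least_connections(pool):
    # Layer 4 view: only TCP connections are visible, so both backends look equal.
    return min(pool, key=lambda name: pool[name]["tcp_connections"])

def least_requests(pool):
    # Layer 7 view: multiplexed streams are visible, so the skew is detected.
    return min(pool, key=lambda name: pool[name]["inflight_requests"])

print(least_connections(backends))  # may pick backend-a despite its 100 in-flight requests
print(least_requests(backends))     # picks backend-b, the genuinely idle backend
```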
The performance trade-off is substantial. Layer 4 balancers forward packets with minimal processing, achieving 10 to 40 Gbps per node with sub-millisecond added latency; Google's Maglev demonstrates software Layer 4 load balancing at millions of packets per second per instance. Layer 7 balancing requires full HTTP parsing, Transport Layer Security (TLS) termination, and connection pooling to backends, adding 5 to 20 milliseconds of latency and limiting throughput to tens of thousands of requests per second per node. However, Layer 7 enables critical features: session affinity via cookies (which survives NAT), request-level metrics for better least-requests decisions, and retry/timeout logic.
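As one illustration of the feature gap, a rough sketch of cookie-based session affinity at Layer 7 (the cookie name and routing helper are assumptions, not any particular proxy's mechanism): the balancer pins a session to a backend by setting a cookie on the first response and honoring it on later requests.

```python
# Hypothetical affinity-cookie handling inside a Layer 7 proxy.
AFFINITY_COOKIE = "lb_backend"          # assumed cookie name
BACKENDS = ["app-1", "app-2", "app-3"]

def route(request_cookies, pick_new_backend):
    """Return (backend, cookie_to_set). Honors an existing affinity cookie."""
    pinned = request_cookies.get(AFFINITY_COOKIE)
    if pinned in BACKENDS:
        return pinned, None                       # stick to the previous backend
    backend = pick_new_backend()                  # otherwise choose a new one
    return backend, (AFFINITY_COOKIE, backend)    # and tell the client to remember it

# First request: no cookie, so a backend is chosen and a Set-Cookie is issued.
backend, cookie = route({}, lambda: "app-2")
# Later request: the cookie routes back to the same backend, even if the
# client's source IP changes behind NAT.
backend, cookie = route({"lb_backend": "app-2"}, lambda: "app-1")
```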
Production systems often use both in a hierarchy. Azure Front Door uses anycast to steer clients to the nearest Point of Presence (POP), then performs Layer 7 health- and latency-based backend selection within a region, while the final hop inside a datacenter uses Layer 4 flow hashing for minimal overhead. The failure mode to watch at Layer 4 is connection imbalance with long-lived connections: WebSocket or HTTP/2 connections lasting hours can cause skew, because flow hashing pins each one to a single backend and organic churn doesn't rebalance them. Mitigation requires connection draining on scale events or periodic client-side reconnection.
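A hedged sketch of the client-side mitigation (the connection factory, message handler, and 30-minute budget are all assumptions): periodically tearing down a long-lived connection gives flow hashing a chance to re-place it, so backends added after a scale-out eventually pick up their share.

```python
import asyncio
import random

MAX_CONNECTION_AGE = 30 * 60  # assumed budget: reconnect roughly every 30 minutes

async def run_with_periodic_reconnect(open_connection, handle_messages):
    """Periodically close and reopen a long-lived connection so Layer 4 flow
    hashing can re-place it onto the current backend pool."""
    while True:
        conn = await open_connection()            # hypothetical factory (WebSocket, gRPC, ...)
        # Jitter the lifetime so a fleet of clients doesn't reconnect in lockstep.
        lifetime = MAX_CONNECTION_AGE * random.uniform(0.8, 1.2)
        try:
            await asyncio.wait_for(handle_messages(conn), timeout=lifetime)
        except asyncio.TimeoutError:
            pass                                  # lifetime reached: fall through and reconnect
        finally:
            await conn.close()                    # assumed close() on the hypothetical connection
```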
💡 Key Takeaways
• Layer 4 balancers hash the 5-tuple for per-flow stickiness with sub-millisecond latency and 10 to 40 Gbps throughput per node. AWS NLB handles millions of RPS while preserving client source IPs
• Layer 7 balancers see full HTTP, enabling per-request routing, cookie affinity, and tracking of concurrent HTTP/2 streams (one TCP connection can multiplex 100 requests that Layer 4 sees as a single connection)
• Performance gap: Layer 7 requires TLS termination and HTTP parsing, adding 5 to 20 ms of latency and limiting throughput to tens of thousands of RPS per node versus millions of packets per second at Layer 4
• Production hierarchies combine both: Azure Front Door uses anycast to the nearest POP with Layer 7 latency-based backend selection, then a Layer 4 flow hash for the final datacenter hop to minimize overhead
• Layer 4 failure mode: long-lived WebSocket or HTTP/2 connections lasting hours cause imbalance because flow hashing pins each to a single backend and organic churn doesn't rebalance without forced connection draining
📌 Examples
AWS architecture: NLB (Layer 4) in front of ALB (Layer 7). NLB handles 5 million packets per second with a flow hash and forwards to the ALB pool. ALB does least-outstanding-requests routing to EC2 instances, enabling per-request decisions while NLB absorbs packet floods
Google Maglev: Layer 4 software load balancer using ECMP and consistent hashing, achieving 10 to 40 Gbps per node. Anycast brings traffic to the nearest datacenter; Maglev hashes flows to backends with sub-second failover on backend failure
HTTP/2 multiplexing issue: 10 clients open 1 connection each to a Layer 4 balancer, each connection multiplexing anywhere from a few to hundreds of streams. Least connections sees 10 connections (1 per backend) and treats the backends as equally loaded, but one backend might have 200 actual concurrent requests while another has 20. Layer 7 least requests correctly tracks 200 vs 20