Layer 4 vs Layer 7 Load Balancing Algorithm Trade-offs
Layer 4 Algorithm Behavior
Load balancing algorithms behave fundamentally differently at Layer 4 (L4, transport layer) and Layer 7 (L7, application layer) because the two layers see different amounts of the traffic. L4 balancers operate on TCP or UDP flows and see only IP addresses, ports, and the protocol. They typically hash the 5-tuple (source IP, destination IP, source port, destination port, protocol) to keep each flow sticky, so every packet of a TCP connection reaches the same backend. This keeps stateful protocols working and keeps overhead minimal: an L4 balancer adds only single-digit microseconds of latency and can sustain 10-40 Gbps per node at millions of packets per second.
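The flow hashing described above can be sketched as follows. This is a minimal illustration, not how any particular balancer implements it: the FiveTuple type, the pick_backend function, and the sample addresses are all hypothetical, and SHA-256 with a plain modulo stands in for the faster hashes and consistent-hashing schemes that production L4 balancers tend to use.

```python
import hashlib
from typing import NamedTuple, Sequence


class FiveTuple(NamedTuple):
    """The only fields an L4 balancer can see for a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def pick_backend(flow: FiveTuple, backends: Sequence[str]) -> str:
    """Map a flow to a backend by hashing its 5-tuple (illustrative only)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce modulo
    # the pool size. The same 5-tuple always yields the same index.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
flow = FiveTuple("203.0.113.7", "198.51.100.10", 51544, 443, "tcp")

# Every packet of this flow carries the same 5-tuple, so it is routed to
# the same backend for as long as the pool stays unchanged.
first_packet = pick_backend(flow, backends)
later_packet = pick_backend(flow, backends)
print(first_packet == later_packet)
```

Note that with a plain modulo, changing the pool size remaps most flows. That is one reason real deployments usually pair hashing with connection tracking or consistent hashing.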
Layer 7 Algorithm Capabilities
L7 balancers terminate connections and inspect HTTP requests, enabling per-request routing decisions. They see URLs, headers, cookies, and request methods, allowing content-based routing (route API calls to one pool, static assets to another) and sophisticated algorithms. Critically, L7 balancers can track concurrent HTTP/2 streams rather than bare TCP connections. A single HTTP/2 connection might multiplex 100 concurrent requests. Least connections at L4 sees one connection and considers that backend lightly loaded, while L7 least requests correctly sees 100 in-flight requests.
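To make the HTTP/2 multiplexing point concrete, here is a small sketch that compares the two selection rules on the same pool. The Backend class, its field names, and the sample counts are assumptions chosen for illustration. A real balancer would update these counters as connections and streams open and close.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Backend:
    name: str
    connections: int = 0        # open TCP connections (visible at L4)
    inflight_requests: int = 0  # active HTTP/2 streams (visible only at L7)


def least_connections(pool: Sequence[Backend]) -> Backend:
    """L4-style choice: fewest open connections."""
    return min(pool, key=lambda b: b.connections)


def least_requests(pool: Sequence[Backend]) -> Backend:
    """L7-style choice: fewest in-flight requests."""
    return min(pool, key=lambda b: b.inflight_requests)


pool = [
    # One HTTP/2 connection multiplexing 100 concurrent requests.
    Backend("a", connections=1, inflight_requests=100),
    # Three connections carrying one request each.
    Backend("b", connections=3, inflight_requests=3),
]

print(least_connections(pool).name)  # a (looks idle, but is the busiest)
print(least_requests(pool).name)     # b
```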
Performance Trade-offs
The performance trade-off is substantial. L4 balancers forward packets with minimal processing, which is what allows 10-40 Gbps of throughput per node with sub-millisecond added latency. L7 balancers must parse HTTP in full, terminate TLS (decrypting incoming traffic and, where backends require it, re-encrypting toward them), and manage connection pools, which adds 5-20 ms of latency and limits throughput to tens of thousands of requests per second (RPS) per node. In exchange, L7 enables critical features: session affinity via cookies (which stays robust when many clients share one address behind NAT), request-level metrics for better routing decisions, and per-request retry and timeout logic.
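Cookie-based session affinity can be sketched as below. The cookie name lb_backend, the route function, and its return shape are hypothetical and do not correspond to any specific proxy's API. The sketch only shows why the scheme does not depend on the client's source IP: the pin travels with the request itself.

```python
import random
from typing import Dict, Optional, Sequence, Tuple

AFFINITY_COOKIE = "lb_backend"  # hypothetical cookie name


def route(cookies: Dict[str, str],
          healthy: Sequence[str]) -> Tuple[str, Optional[str]]:
    """Return (backend, Set-Cookie value or None) for one request.

    If the request carries a cookie naming a healthy backend, honor it.
    Otherwise pick a backend and ask the client to remember it.
    """
    pinned = cookies.get(AFFINITY_COOKIE)
    if pinned in healthy:
        return pinned, None
    chosen = random.choice(list(healthy))
    return chosen, f"{AFFINITY_COOKIE}={chosen}; Path=/; HttpOnly"


healthy = ["app-1", "app-2", "app-3"]

# First request: no cookie, so the balancer picks a backend and sets one.
backend, set_cookie = route({}, healthy)

# Follow-up request: the cookie pins the client to the same backend,
# regardless of which source IP or NAT gateway the request arrives from.
same_backend, no_cookie = route({AFFINITY_COOKIE: backend}, healthy)
print(backend == same_backend)
```

If the pinned backend drops out of the healthy set, the function falls back to a fresh choice and reissues the cookie, which is one simple way to handle failover.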
Hierarchical Production Patterns
Production systems often use both in a hierarchy. A typical pattern: anycast routing at the network layer steers each client to the nearest PoP (Point of Presence), L7 balancers then perform health- and latency-based backend selection within the region, and the final hop within a datacenter uses L4 flow hashing for minimal overhead. The L7 layer handles cross-region intelligence and content-based routing; the L4 layer handles high-throughput intra-datacenter distribution. This combines the intelligence of L7 with the efficiency of L4.
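The two inner tiers of such a hierarchy might be sketched as follows. Everything here is an assumption made for illustration: the Datacenter type, the l7_select and l4_select names, the latency figures, and the use of CRC32 as the flow hash.

```python
import zlib
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Datacenter:
    name: str
    healthy: bool
    latency_ms: float  # measured latency from this PoP (assumed input)
    servers: List[str]


def l7_select(datacenters: Sequence[Datacenter]) -> Datacenter:
    """L7 tier: choose the healthy datacenter with the lowest latency."""
    candidates = [dc for dc in datacenters if dc.healthy]
    if not candidates:
        raise RuntimeError("no healthy datacenter available")
    return min(candidates, key=lambda dc: dc.latency_ms)


def l4_select(dc: Datacenter, flow_key: bytes) -> str:
    """L4 tier: cheap flow hash onto a server inside the datacenter."""
    return dc.servers[zlib.crc32(flow_key) % len(dc.servers)]


datacenters = [
    Datacenter("east", healthy=False, latency_ms=5.0, servers=["e1", "e2"]),
    Datacenter("west", healthy=True, latency_ms=12.0, servers=["w1", "w2"]),
    Datacenter("central", healthy=True, latency_ms=9.0, servers=["c1", "c2", "c3"]),
]

dc = l7_select(datacenters)  # "east" is fastest but unhealthy, so it is skipped
server = l4_select(dc, b"203.0.113.7:51544->198.51.100.10:443/tcp")
print(dc.name, server)
```

The split mirrors the cost profile described above: the expensive, informed decision happens once per request at the L7 tier, while the per-packet decision at the L4 tier is a single hash.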
L4 Failure Mode: Long-Lived Connections
The failure mode to watch at L4 is connection imbalance with long-lived connections. WebSockets or HTTP/2 connections lasting hours cause skew because flow hashing pins each to a single backend. Organic churn does not rebalance existing connections. If you scale from 10 to 20 backends, existing connections stay on the original 10 while only new connections distribute across 20. Mitigation requires connection draining on scale events (gracefully closing existing connections over 60-300 seconds) or periodic client-side reconnection (every 30-60 minutes).
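The scale-out skew can be worked through with a small simulation. The function name and the connection counts are assumptions chosen to give round numbers, and an even round-robin spread stands in for an idealized, perfectly uniform flow hash.

```python
from typing import List


def simulate_scale_out(original: int, added: int,
                       existing_conns: int, new_conns: int) -> List[int]:
    """Return per-backend connection counts after a scale-out event."""
    total = original + added
    load = [0] * total
    # Long-lived connections opened before the scale event stay pinned
    # to the original backends.
    for i in range(existing_conns):
        load[i % original] += 1
    # Only connections opened afterwards spread across the full pool.
    for i in range(new_conns):
        load[i % total] += 1
    return load


# 10 backends holding 10,000 long-lived connections, scaled to 20 backends,
# followed by 2,000 new connections.
load = simulate_scale_out(original=10, added=10,
                          existing_conns=10_000, new_conns=2_000)
print(max(load), min(load))  # 1100 100
```

Under these assumptions each original backend still carries 1,100 connections while each new one carries 100, an 11x skew that persists until the old connections are drained or the clients reconnect.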