Layer 4 vs Layer 7 Load Balancing Algorithm Trade-offs

Layer 4 Algorithm Behavior
Load balancing algorithms behave fundamentally differently at Layer 4 (L4, transport layer) versus Layer 7 (L7, application layer) because the two layers see different amounts of the traffic. L4 balancers operate on TCP or UDP flows and see only IP addresses, ports, and protocol. They typically hash the 5-tuple (source IP, destination IP, source port, destination port, protocol) to maintain per-flow stickiness, so every packet of a TCP connection reaches the same backend. This keeps stateful protocols working and keeps overhead minimal: an L4 balancer typically adds only single-digit microseconds of latency and sustains 10-40 Gbps per node at millions of packets per second.
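
The flow hashing described above can be sketched in a few lines. This is a minimal illustration, not how any particular balancer implements it: the `FiveTuple` type, the `pick_backend` function, and the addresses are hypothetical, and plain modulo hashing is assumed for simplicity (production balancers often use consistent hashing so that changing the pool remaps fewer flows).

```python
import hashlib
from typing import NamedTuple, Sequence

class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # "tcp" or "udp"

def pick_backend(flow: FiveTuple, backends: Sequence[str]) -> str:
    """Map a flow to a backend by hashing its 5-tuple (hypothetical helper)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Assumption: simple modulo over the pool size, not consistent hashing.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
flow = FiveTuple("203.0.113.7", "198.51.100.10", 51514, 443, "tcp")

# Every packet of the same flow hashes to the same backend.
assert pick_backend(flow, backends) == pick_backend(flow, backends)
```

Because the hash depends only on the 5-tuple, the balancer needs no per-request parsing, which is where the low overhead comes from.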
Layer 7 Algorithm Capabilities
L7 balancers terminate connections and inspect HTTP requests, enabling per-request routing decisions. They see URLs, headers, cookies, and request methods, allowing content-based routing (route API calls to one pool, static assets to another) and sophisticated algorithms. Critically, L7 balancers can track concurrent HTTP/2 streams rather than bare TCP connections. A single HTTP/2 connection might multiplex 100 concurrent requests. Least connections at L4 sees one connection and considers that backend lightly loaded, while L7 least requests correctly sees 100 in-flight requests.
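
To make the HTTP/2 counting problem concrete, here is a small sketch comparing the two selection rules on the same pool. The `Backend` class and its field names are assumptions made for illustration; real balancers track these counters internally.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Backend:
    name: str
    connections: int        # open TCP connections (all that L4 can count)
    inflight_requests: int  # active HTTP/2 streams (visible only at L7)

def least_connections(backends: List[Backend]) -> Backend:
    """L4-style choice: fewest open TCP connections."""
    return min(backends, key=lambda b: b.connections)

def least_requests(backends: List[Backend]) -> Backend:
    """L7-style choice: fewest in-flight requests."""
    return min(backends, key=lambda b: b.inflight_requests)

pool = [
    Backend("a", connections=1, inflight_requests=100),  # one busy HTTP/2 connection
    Backend("b", connections=3, inflight_requests=3),
]

print(least_connections(pool).name)  # a  (looks idle at L4, but is the busiest)
print(least_requests(pool).name)     # b
```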
Performance Trade-offs
The performance trade-off is substantial. L4 balancers forward packets with minimal processing: 10-40 Gbps throughput per node with sub-millisecond added latency. L7 requires full HTTP parsing, TLS termination (decrypting incoming traffic and, where backends require it, re-encrypting), and connection pooling, which can add 5-20 ms of latency and limits throughput to tens of thousands of requests per second (RPS) per node. However, L7 enables critical features: session affinity via cookies (robust across NAT, where many clients share one source IP), request-level metrics for better routing decisions, and retry/timeout logic per request.
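
Cookie-based session affinity can be sketched as follows. This is a simplified illustration under stated assumptions: the cookie name `lb_backend`, the backend names, and the `route` function are hypothetical, and round-robin is assumed for clients that arrive without a valid cookie. Only the standard library's `http.cookies.SimpleCookie` is used for parsing.

```python
import itertools
from http.cookies import SimpleCookie
from typing import Dict, Optional, Tuple

BACKENDS = ["app-1", "app-2", "app-3"]   # hypothetical pool
AFFINITY_COOKIE = "lb_backend"           # hypothetical cookie name
_round_robin = itertools.cycle(BACKENDS)

def route(headers: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (backend, Set-Cookie value or None) for one HTTP request."""
    cookie = SimpleCookie(headers.get("Cookie", ""))
    morsel = cookie.get(AFFINITY_COOKIE)
    if morsel is not None and morsel.value in BACKENDS:
        # Returning client: honor the pinned backend regardless of source IP.
        return morsel.value, None
    # New client (or stale cookie): pick a backend and pin it via a cookie.
    backend = next(_round_robin)
    return backend, f"{AFFINITY_COOKIE}={backend}; Path=/; HttpOnly"

backend, set_cookie = route({})                                  # first visit
sticky, _ = route({"Cookie": f"{AFFINITY_COOKIE}={backend}"})    # later request
assert sticky == backend
```

Because the pin travels in the request itself, it survives a client changing IP address or sharing one behind NAT, which source-IP hashing at L4 cannot handle.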
Hierarchical Production Patterns
Production systems often use both in a hierarchy. A typical pattern: anycast routing (an IP-layer mechanism, not an L7 one) steers clients to the nearest PoP (Point of Presence), L7 health- and latency-based backend selection then runs within the region, and the final hop within a datacenter uses L4 flow hashing for minimal overhead. The L7 layer handles cross-region intelligence and content-based routing; the L4 layer handles high-throughput intra-datacenter distribution. This combines the intelligence of L7 with the efficiency of L4.
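
A toy version of the two tiers might look like the sketch below. All names here (`POOLS`, `l7_select_pool`, `l4_select_backend`, the path prefix, the addresses) are hypothetical, and CRC32 with modulo stands in for whatever flow hash a real L4 tier would use.

```python
import zlib
from typing import List, Tuple

POOLS = {
    "api": ["api-1", "api-2"],
    "static": ["static-1", "static-2"],
}

def l7_select_pool(path: str) -> str:
    """L7 tier: choose a pool from request content (here, the URL path)."""
    return "api" if path.startswith("/api/") else "static"

def l4_select_backend(flow: Tuple, pool: List[str]) -> str:
    """L4 tier: pin the flow to one backend in the pool by hashing its 5-tuple."""
    key = repr(flow).encode()
    return pool[zlib.crc32(key) % len(pool)]

# (source IP, destination IP, source port, destination port, protocol)
flow = ("203.0.113.7", "10.1.0.5", 40000, 8443, "tcp")

pool_name = l7_select_pool("/api/orders")            # content-aware decision
backend = l4_select_backend(flow, POOLS[pool_name])  # cheap per-flow decision
print(pool_name, backend)
```

The expensive, content-aware decision happens once at the L7 tier; the L4 tier only needs the flow's addressing information.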
L4 Failure Mode: Long-Lived Connections
The failure mode to watch at L4 is connection imbalance with long-lived connections. WebSockets or HTTP/2 connections lasting hours cause skew because flow hashing pins each to a single backend. Organic churn does not rebalance existing connections. If you scale from 10 to 20 backends, existing connections stay on the original 10 while only new connections distribute across 20. Mitigation requires connection draining on scale events (gracefully closing existing connections over 60-300 seconds) or periodic client-side reconnection (every 30-60 minutes).
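
The arithmetic of the scale-out example can be worked through with a short simulation. The connection counts (1000 existing, 200 new) are made-up numbers for illustration, and modulo on a connection ID stands in for a flow hash so the result is easy to verify by hand.

```python
from collections import Counter
from typing import Dict, Iterable, List

def assign(conn_ids: Iterable[int], backends: List[str]) -> Dict[int, str]:
    """Pin each connection to a backend; modulo stands in for a flow hash."""
    return {c: backends[c % len(backends)] for c in conn_ids}

old_pool = [f"b{i}" for i in range(10)]
new_pool = [f"b{i}" for i in range(20)]

# 1000 long-lived connections are established while only 10 backends exist.
pinned = assign(range(1000), old_pool)
# After scaling to 20 backends, 200 new connections spread across all 20.
pinned.update(assign(range(1000, 1200), new_pool))

load = Counter(pinned.values())
print(load["b0"], load["b10"])  # 110 10

# Draining forces every connection to reconnect and be hashed over 20 backends.
rebalanced = Counter(assign(range(1200), new_pool).values())
print(rebalanced["b0"], rebalanced["b10"])  # 60 60
```

Without draining, each original backend carries 11 times the load of each new one; after all connections reconnect, the load evens out at 60 per backend.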
Key Trade-off: L4 offers 10-40 Gbps throughput with microsecond latency but sees only flows, not requests. L7 adds 5-20ms latency but enables per-request routing, cookie affinity, and correct HTTP/2 stream counting. Use L4 for raw throughput, L7 for intelligent routing, often in combination.

💡 Key Takeaways

✓ L4 hashes the 5-tuple for per-flow stickiness, with microsecond latency, 10-40 Gbps throughput, and millions of packets per second per node

✓ L7 sees full HTTP, enabling per-request routing and cookie affinity; it correctly counts HTTP/2 streams (1 TCP connection may carry 100 concurrent requests)

✓ Performance gap: L7 requires TLS termination and HTTP parsing, adding 5-20 ms of latency and limiting throughput to tens of thousands of RPS per node

✓ Long-lived connection imbalance: WebSocket and HTTP/2 connections are pinned by the flow hash; scaling adds backends, but existing connections stay where they are

📌 Interview Tips

1. Explain the HTTP/2 counting problem: L4 sees 1 TCP connection (light load); L7 sees 100 concurrent streams (heavy load)

2. Describe the hierarchical pattern: L7 for cross-region routing, L4 for intra-datacenter high-throughput distribution

3. Walk through connection imbalance: scale from 10 to 20 backends, and existing WebSocket connections stay on the original 10
