What is Layer 4 (L4) Load Balancing?

Definition
Layer 4 (L4) load balancing operates at the transport layer, handling TCP and UDP traffic. It makes forwarding decisions from packet header fields alone: the 5-tuple of source IP, destination IP, source port, destination port, and protocol (TCP or UDP). Because it never inspects application data such as HTTP headers or request bodies, an L4 load balancer typically adds only tens to hundreds of microseconds of latency per packet.
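To make the 5-tuple decision concrete, here is a minimal Python sketch of hash-based backend selection. The names `FiveTuple` and `pick_backend`, the example addresses, and the use of SHA-256 with a simple modulo are illustrative assumptions, not a description of any particular load balancer.

```python
import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """The only fields an L4 load balancer looks at (hypothetical type)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # "tcp" or "udp"


def pick_backend(flow: FiveTuple, backends: list[str]) -> str:
    """Map a flow to a backend by hashing its 5-tuple.

    Every packet of the same flow hashes to the same value, so the
    whole connection lands on one backend without reading any payload.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


flow = FiveTuple("198.51.100.7", "203.0.113.10", 51514, 443, "tcp")
print(pick_backend(flow, ["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```

A plain modulo remaps most flows whenever the backend list changes; production systems often use some form of consistent hashing to limit that disruption.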
Why Operating Modes Matter
When an L4 load balancer receives a packet, it must answer a fundamental question: how should it get this packet to the backend server? The answer defines the load balancer's operating mode. This choice cascades through every aspect of the system: memory consumption, maximum throughput, failure handling, and network topology requirements. Understanding these modes is essential because the wrong choice can leave gigabits of capacity unused or create bottlenecks that collapse under load.
Full Proxy Mode (NAT)
In full proxy mode, the load balancer terminates the client connection entirely and opens a separate connection to the backend. It maintains a NAT (Network Address Translation) table that tracks every active flow, consuming 200-500 bytes of state per TCP connection. At that cost, a load balancer with 2 GB of memory for flow state can track roughly 4 to 10 million simultaneous flows, but once the table fills, new connections are dropped. The critical architectural constraint is that all traffic, both inbound requests and outbound responses, flows through the load balancer. As a result, the load balancer's egress bandwidth caps total system throughput.
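The memory arithmetic and the "table full, connection dropped" behaviour can be sketched in a few lines of Python. `FlowTable` and `max_flows` are hypothetical names, the sketch assumes all of the stated memory is available for flow state, and it treats 2 GB as 2 GiB.

```python
from typing import Optional


def max_flows(memory_bytes: int, bytes_per_flow: int) -> int:
    """Upper bound on tracked flows if all memory went to flow state."""
    return memory_bytes // bytes_per_flow


class FlowTable:
    """Toy per-flow state table with a hard capacity (illustrative only)."""

    def __init__(self, memory_bytes: int, bytes_per_flow: int) -> None:
        self.capacity = max_flows(memory_bytes, bytes_per_flow)
        self.flows: dict[tuple, str] = {}

    def lookup_or_insert(self, flow: tuple, backend: str) -> Optional[str]:
        """Return the backend for a flow, or None if the table is full."""
        existing = self.flows.get(flow)
        if existing is not None:
            return existing  # established flow keeps its backend
        if len(self.flows) >= self.capacity:
            return None  # table full: the new connection is dropped
        self.flows[flow] = backend
        return backend

    def remove(self, flow: tuple) -> None:
        """Free the entry when the connection closes or times out."""
        self.flows.pop(flow, None)


two_gib = 2 * 1024**3
print(max_flows(two_gib, 500))  # 4294967  (about 4.3 million flows)
print(max_flows(two_gib, 200))  # 10737418 (about 10.7 million flows)
```

A real table would also expire idle entries on a timer; that is left out here to keep the capacity behaviour visible.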
Direct Server Return Mode
DSR (Direct Server Return) solves the egress bottleneck by changing the return path: the load balancer handles only inbound packets, while backend servers reply directly to clients and bypass the load balancer entirely. This architectural shift dramatically increases throughput. Modern high-performance implementations built on DPDK (which bypasses the kernel and processes packets in user space) or XDP (which runs packet-processing programs at the network driver level, before the kernel's full network stack) achieve 10-40 Gbps per load balancer, in part because they never process response traffic. The trade-off is complexity: servers must be configured with the VIP (Virtual IP, the public address clients connect to) on a special non-routed interface, typically loopback, so that they accept packets destined for that IP but do not advertise it to the network.
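The packet handling behind DSR can be illustrated with a small Python model. This sketch assumes the layer-2 variant of DSR, where the load balancer rewrites only the destination MAC address (other deployments tunnel the packet instead); `Frame`, `dsr_forward`, `backend_reply`, and all addresses are hypothetical.

```python
from dataclasses import dataclass, replace

VIP = "203.0.113.10"  # example virtual IP (documentation address range)


@dataclass(frozen=True)
class Frame:
    """Just enough of a packet to show what DSR changes (toy model)."""
    dst_mac: str
    src_ip: str
    dst_ip: str


def dsr_forward(frame: Frame, backend_mac: str) -> Frame:
    """Load balancer step: retarget the frame, leave the IP header alone.

    The destination IP stays equal to the VIP, which is why each backend
    must have the VIP configured locally to accept the packet.
    """
    return replace(frame, dst_mac=backend_mac)


def backend_reply(request: Frame, gateway_mac: str) -> Frame:
    """Backend step: answer from the VIP straight to the client."""
    return Frame(dst_mac=gateway_mac,
                 src_ip=request.dst_ip,   # the VIP, not the backend's own IP
                 dst_ip=request.src_ip)   # the original client


inbound = Frame(dst_mac="aa:aa:aa:aa:aa:01", src_ip="198.51.100.7", dst_ip=VIP)
to_backend = dsr_forward(inbound, backend_mac="bb:bb:bb:bb:bb:02")
reply = backend_reply(to_backend, gateway_mac="cc:cc:cc:cc:cc:03")
print(to_backend)
print(reply)
```

Because the reply carries the VIP as its source address, the client sees a normal response even though the load balancer never touched it.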
Performance Characteristics
Production L4 load balancers handle millions of new connections per second with sub-millisecond added latency. They excel for non-HTTP protocols (gaming servers, real-time video, DNS), for scenarios requiring absolute minimum latency, and for extreme packets-per-second workloads where application-layer inspection would waste CPU cycles. The limitation is that L4 cannot route based on URL paths, HTTP headers, or cookies, because it never reads application data. Health checks are limited to TCP handshake success or TLS negotiation, which confirms only that the port is accepting connections, not that the application is functioning correctly.
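A TCP-level health check of the kind described above can be sketched with Python's standard `socket` module. The function name `tcp_health_check`, the example address, and the one-second default timeout are illustrative choices.

```python
import socket


def tcp_health_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP handshake with host:port completes in time.

    This proves only that something is accepting connections on the
    port; it says nothing about whether the application behind it is
    returning correct responses.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print(tcp_health_check("127.0.0.1", 8080))
```

A backend whose process is deadlocked but still accepting connections would pass this check, which is the gap that application-level (L7) health checks close.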
Key Trade-off: Full proxy mode gives you complete visibility and control over all traffic but caps throughput at your load balancer's bandwidth. DSR mode removes the throughput ceiling but sacrifices response visibility and requires special server configuration. Choose based on whether you need traffic inspection or raw performance.
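A rough back-of-the-envelope model shows how large the throughput gap can be. This sketch assumes a single full-duplex link on the load balancer, ignores packet header overhead and CPU or packets-per-second limits, and uses made-up request and response sizes; `max_requests_per_second` is a hypothetical helper.

```python
def max_requests_per_second(link_bps: int, request_bytes: int,
                            response_bytes: int, mode: str) -> int:
    """Estimate the request rate at which the load balancer's link saturates.

    full_proxy: each direction of the link carries request + response bytes
                (request in from client and out to backend, response in
                from backend and out to client).
    dsr:        the link carries only the request; responses skip the
                load balancer.
    """
    if mode == "full_proxy":
        bytes_per_request = request_bytes + response_bytes
    elif mode == "dsr":
        bytes_per_request = request_bytes
    else:
        raise ValueError(f"unknown mode: {mode}")
    return (link_bps // 8) // bytes_per_request


link = 10_000_000_000  # 10 Gbps link (assumed)
print(max_requests_per_second(link, 500, 9_500, "full_proxy"))  # 125000
print(max_requests_per_second(link, 500, 9_500, "dsr"))         # 2500000
```

With responses 19 times larger than requests, the same link supports 20 times the request rate in DSR mode under these assumptions; the advantage shrinks as requests and responses approach the same size.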

💡 Key Takeaways

✓ L4 operates on the 5-tuple (src/dst IP, src/dst port, protocol); achieves microsecond latency by skipping application data inspection entirely

✓ Operating mode is the fundamental choice: determines memory usage, throughput limits, failure modes, and network topology requirements

✓ Full proxy (NAT) mode: tracks all flows (200-500 bytes each), processes all traffic in both directions, throughput limited by load balancer egress

✓ DSR mode: load balancer handles inbound only, servers reply directly to clients, achieving 10-40 Gbps but requiring special VIP configuration on servers

📌 Interview Tips

1. When designing for non-HTTP protocols (gaming, video streaming, DNS), mention that L4 is the natural choice since L7 application parsing adds latency with no benefit

2. If asked about scaling L4 load balancers, explain that DSR removes egress as a bottleneck but requires VIP configuration on all backends

3. Common follow-up: what if you need HTTP routing AND low latency? Answer: L4 fronting L7. L4 handles connection distribution at wire speed, and L7 pools handle application routing
