
What is Layer 4 (L4) Load Balancing?

Layer 4 load balancing operates at the transport layer of the Open Systems Interconnection (OSI) model. It makes forwarding decisions based solely on network information: the 5-tuple of source Internet Protocol (IP) address, destination IP address, source port, destination port, and protocol. Because it does not inspect application data, it can forward packets with minimal processing overhead, typically adding only tens to hundreds of microseconds of latency on modern hardware with kernel bypass.

There are two primary modes. Full proxy Transmission Control Protocol/User Datagram Protocol (TCP/UDP) mode terminates the client connection and opens a separate backend connection, keeping a Network Address Translation (NAT) table that maps each client flow to its backend flow. Direct Server Return (DSR) mode handles only inbound packets; servers reply directly to clients, which removes egress load from the load balancer entirely. A complementary deployment pattern uses anycast Virtual IP (VIP) addresses with consistent hashing: multiple L4 load balancers advertise the same IP globally, and routers deliver traffic to the nearest one.

Google Maglev exemplifies production L4 load balancing at scale. It sits behind anycast VIPs and sustains multi-10 Gigabit per second (Gbps) throughput per server on commodity hardware with sub-millisecond datapath overhead, handling millions of Queries Per Second (QPS) per VIP globally. Meta's Katran, built on Express Data Path/extended Berkeley Packet Filter (XDP/eBPF), achieves 10 to 40 Gbps per server with significantly lower Central Processing Unit (CPU) utilization than traditional iptables-based NAT. Amazon Web Services (AWS) Network Load Balancer (NLB) scales to millions of requests per second with data plane latency on the order of tens to hundreds of microseconds.

The key tradeoff is performance versus intelligence. L4 provides maximum throughput and minimal latency but lacks content-aware routing and application layer protections. It excels for non-HyperText Transfer Protocol (HTTP) traffic such as gaming, real-time media, and Domain Name System (DNS), and for scenarios that require ultra-low latency or extreme packets-per-second throughput.
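
To make the 5-tuple forwarding decision concrete, here is a minimal Python sketch of how a flow could be mapped to a backend without looking at any application payload. The names FiveTuple, flow_hash, and pick_backend are illustrative assumptions, not the API of Maglev, Katran, or NLB, and the use of SHA-256 with a plain modulo is a simplification chosen for readability rather than what a production datapath would use.

```python
import hashlib
import struct
from ipaddress import ip_address
from typing import NamedTuple, Sequence


class FiveTuple(NamedTuple):
    """The only fields an L4 balancer looks at (hypothetical helper type)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IANA protocol number: 6 = TCP, 17 = UDP


def flow_hash(flow: FiveTuple) -> int:
    """Hash the 5-tuple into a 64-bit integer; no payload bytes are involved."""
    packed = (
        ip_address(flow.src_ip).packed
        + ip_address(flow.dst_ip).packed
        + struct.pack("!HHB", flow.src_port, flow.dst_port, flow.protocol)
    )
    return int.from_bytes(hashlib.sha256(packed).digest()[:8], "big")


def pick_backend(flow: FiveTuple, backends: Sequence[str]) -> str:
    """Every packet of the same flow hashes to the same backend."""
    return backends[flow_hash(flow) % len(backends)]


if __name__ == "__main__":
    backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # assumed backend addresses
    flow = FiveTuple("203.0.113.7", "198.51.100.10", 51234, 443, 6)
    print(pick_backend(flow, backends))
```

Because the choice depends only on the 5-tuple, every packet of a connection reaches the same backend with no per-request parsing. The plain modulo used here has a known weakness: changing the number of backends remaps most flows, which is why production systems favor consistent hashing.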
💡 Key Takeaways
Operates at the transport layer using the 5-tuple (source/destination IP, source/destination port, protocol) without inspecting the application payload
Full proxy mode adds tens to hundreds of microseconds of latency but provides precise control; DSR mode removes egress load from the load balancer entirely
Production throughput: Google Maglev sustains multi-10 Gbps per server; Meta Katran achieves 10 to 40 Gbps per server with low CPU utilization
Consistent hashing (Maglev-style lookup tables) preserves flow stickiness during scale events, limiting remapped flows (and the connection resets they cause) to under 5 to 10 percent
Best for non-HTTP protocols (gaming, real-time media, DNS) or when ultra-low latency and maximum packets-per-second throughput are critical
Limited observability: it can track only flow counters, synchronize (SYN)/acknowledge (ACK) rates, and connection health, and it lacks request-level visibility
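
The consistent hashing takeaway can be illustrated with a small Python sketch of a Maglev-style lookup table. It is an approximation for study purposes, assuming SHA-256 as the hash, a table size of 251 (any prime works), and made-up backend names; build_table, lookup, and remap_fraction are hypothetical helpers, not code from Google or Meta.

```python
import hashlib
from typing import List, Sequence


def _h(name: str, salt: str) -> int:
    """Deterministic 64-bit hash; SHA-256 is an assumption made for clarity."""
    return int.from_bytes(hashlib.sha256((salt + name).encode()).digest()[:8], "big")


def build_table(backends: Sequence[str], size: int = 251) -> List[str]:
    """Fill a lookup table of prime `size` by letting backends take turns
    claiming their next preferred free slot."""
    n = len(backends)
    offsets = [_h(b, "offset") % size for b in backends]
    skips = [_h(b, "skip") % (size - 1) + 1 for b in backends]
    next_idx = [0] * n           # position in each backend's preference list
    table: List[str] = [""] * size
    filled = 0
    while True:
        for i in range(n):
            slot = (offsets[i] + next_idx[i] * skips[i]) % size
            while table[slot]:   # skip slots already claimed by someone else
                next_idx[i] += 1
                slot = (offsets[i] + next_idx[i] * skips[i]) % size
            table[slot] = backends[i]
            next_idx[i] += 1
            filled += 1
            if filled == size:
                return table


def lookup(table: Sequence[str], flow_key: bytes) -> str:
    """Map a flow key (for example, packed 5-tuple bytes) to a backend."""
    idx = int.from_bytes(hashlib.sha256(flow_key).digest()[:8], "big") % len(table)
    return table[idx]


def remap_fraction(before: Sequence[str], after: Sequence[str]) -> float:
    """Share of table slots whose backend changed between two tables."""
    return sum(a != b for a, b in zip(before, after)) / len(before)


if __name__ == "__main__":
    pool = ["b0", "b1", "b2", "b3", "b4"]       # assumed backend names
    full = build_table(pool)
    reduced = build_table(pool[:-1])            # simulate losing backend b4
    print(f"slots remapped after removing b4: {remap_fraction(full, reduced):.1%}")
```

Each backend ends up with a nearly equal share of slots, and removing one backend necessarily remaps its own slots plus some number of others. The exact remap percentage depends on table size and pool size, so the script prints the measured value instead of assuming one.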
📌 Examples
AWS Network Load Balancer scales to millions of requests per second with data plane latency of tens to hundreds of microseconds, while preserving the client source IP
Meta Katran (XDP/eBPF) powers Facebook and Instagram frontends, running in front of L7 proxies at 10 to 40 Gbps per server
Google Maglev uses anycast VIPs with consistent hashing to handle millions of QPS per VIP globally, with failover convergence in seconds