Production Implementation Patterns for L4/L7 Load Balancers
L4 Implementation: Choosing the Forwarding Mode
Effective L4 implementation starts with choosing between full proxy NAT and DSR (Direct Server Return). Full proxy NAT terminates client connections and opens new ones to backends, maintaining stateful per-flow tracking in a NAT table that budgets 200-500 bytes per active TCP flow. This provides precise control and metrics visibility, but all response traffic flows through the load balancer, capping throughput. DSR removes this egress bottleneck by having servers reply directly to clients, achieving 10-40 Gbps per server, but it requires special server configuration (typically binding the VIP to a loopback interface with ARP suppressed) and gives up visibility into responses. For global ingress, deploy Anycast VIPs (Virtual IP addresses shared by multiple servers globally, with routing delivering traffic to the nearest one) with consistent hashing (a distribution algorithm where adding or removing a server remaps only 5-10% of traffic instead of reshuffling everything).
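The minimal-remap property of consistent hashing can be sketched with a hash ring. This is an illustrative toy, not a production implementation; the class name, vnode count, and MD5-based hash are all assumptions chosen for brevity.

```python
import hashlib
from bisect import bisect_right

def _hash(key: str) -> int:
    # Illustrative hash: first 8 bytes of MD5 as an integer.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent-hash ring with virtual nodes. Removing a server only
    remaps the flows that hashed to that server's ring positions."""
    def __init__(self, servers, vnodes=100):
        self.ring = sorted(
            (_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    def lookup(self, flow_key: str) -> str:
        # First ring point clockwise of the key's hash (wrapping to 0).
        idx = bisect_right(self.points, _hash(flow_key)) % len(self.ring)
        return self.ring[idx][1]
```

Dropping one of ten servers from such a ring moves roughly 10% of flow keys to new backends, while the rest keep their existing assignment, which is what preserves connection affinity during scale events.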
L7 Implementation: Connection Management
L7 requires full proxy architecture with separate client-side and backend connection pools. Terminate TLS at the edge to inspect traffic, achieving 1-5 Gbps per CPU core with modern ciphers. For zero-trust networks, re-encrypt to backends using mTLS (mutual TLS, where both sides authenticate). Apply header normalization (enforcing consistent header formatting to prevent security bypasses): set size limits of 8-16 KB, reject malformed requests, canonicalize header casing. Route traffic based on host, path, header, or cookie with weighted splitting for canary releases (gradually rolling out new versions by sending 1-10% of traffic to the new version). Use power of two choices (randomly pick two backends and route to the one with lower load) or EWMA latency balancing (Exponentially Weighted Moving Average, weighting recent latency measurements more heavily) to reduce tail latency by 10-30% compared to simple round robin.
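The two balancing algorithms above compose naturally: sample two backends at random, then break the tie with the EWMA latency estimate. A minimal sketch, assuming per-backend latency is reported back after each request; the class and parameter names are illustrative.

```python
import random

class EwmaBalancer:
    """Power-of-two-choices over EWMA latency: sample two random
    backends and route to the one with the lower smoothed latency."""
    def __init__(self, backends, alpha=0.3):
        self.alpha = alpha                      # weight on the newest sample
        self.ewma = {b: 0.0 for b in backends}  # smoothed latency, ms

    def pick(self) -> str:
        a, b = random.sample(list(self.ewma), 2)
        return a if self.ewma[a] <= self.ewma[b] else b

    def observe(self, backend: str, latency_ms: float) -> None:
        # EWMA update: new = alpha * sample + (1 - alpha) * old
        prev = self.ewma[backend]
        self.ewma[backend] = self.alpha * latency_ms + (1 - self.alpha) * prev
```

Sampling two candidates instead of scanning all backends avoids the herd behavior of "always pick the least loaded," where every proxy instance simultaneously dogpiles the same momentarily idle backend.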
Health Checks by Layer
L4 health checks validate TCP connectivity or TLS handshake success; mark backends down after SYN timeouts (typically a 3-5 second threshold). This only confirms the port is open, not that the application is healthy. L7 health checks send actual HTTP requests and validate status codes and, optionally, response body patterns ("200 OK with valid JSON"). Implement outlier detection (automatically removing backends showing elevated error rates): eject hosts after 5 consecutive errors or a 10% error rate over 30 seconds, with quarantine timers of 30-60 seconds before retrying. Configure per-route timeouts: connect timeout 250ms-1s, header receive timeout 5-30s, idle timeout 30-300s.
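The consecutive-error ejection rule with a quarantine timer can be sketched as follows, using the 5-error and 30-second figures from the text. The class is a simplified illustration (real implementations also track the rolling error-rate condition); the injectable clock exists only to make the logic testable.

```python
import time

class OutlierDetector:
    """Ejects a backend after N consecutive errors; restores it once
    the quarantine timer expires so it can be retried."""
    def __init__(self, consecutive_errors=5, quarantine_s=30,
                 clock=time.monotonic):
        self.limit = consecutive_errors
        self.quarantine_s = quarantine_s
        self.clock = clock
        self.errors = {}   # backend -> consecutive error count
        self.ejected = {}  # backend -> ejection timestamp

    def record(self, backend: str, ok: bool) -> None:
        if ok:
            self.errors[backend] = 0  # any success resets the streak
            return
        self.errors[backend] = self.errors.get(backend, 0) + 1
        if self.errors[backend] >= self.limit:
            self.ejected[backend] = self.clock()

    def healthy(self, backend: str) -> bool:
        t = self.ejected.get(backend)
        if t is None:
            return True
        if self.clock() - t >= self.quarantine_s:
            # Quarantine over: readmit the backend and reset its streak.
            del self.ejected[backend]
            self.errors[backend] = 0
            return True
        return False
```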
Connection Pool Sizing
Prefer keepalives to amortize TCP and TLS handshake costs; TLS 1.3 cuts the handshake to a single round trip. Size upstream pools to cap concurrent connections per backend at 50-200. For HTTP/2 (the protocol supporting multiplexing), tune maximum concurrent streams to 100-250 per connection, balancing multiplexing efficiency against head-of-line blocking risk (where one slow request delays others). Segregate long-lived connections like WebSockets (persistent bidirectional protocol) and gRPC streaming (RPC with long-lived streams) into dedicated pools or L4 paths to prevent them from consuming connection slots needed by short requests.
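The pool-segregation rule amounts to classifying each request and routing long-lived protocols to their own pool. A sketch with the limits from the ranges above; the `PoolConfig` fields and the `pool_for` classifier are assumptions, not any particular proxy's configuration schema.

```python
from dataclasses import dataclass

@dataclass
class PoolConfig:
    """Illustrative per-pool limits drawn from the ranges in the text."""
    max_conns_per_backend: int = 100  # within the 50-200 range
    h2_max_streams: int = 128         # per HTTP/2 connection, 100-250 range
    idle_timeout_s: int = 120

# Dedicated pool for long-lived traffic so WebSocket and gRPC streams
# cannot starve the connection slots used by short HTTP requests.
POOLS = {
    "http": PoolConfig(max_conns_per_backend=200, idle_timeout_s=60),
    "streaming": PoolConfig(max_conns_per_backend=50, idle_timeout_s=300),
}

def pool_for(request) -> str:
    """Classify a request by Upgrade / Content-Type headers (lowercased keys)."""
    upgrade = request.headers.get("upgrade", "").lower()
    ctype = request.headers.get("content-type", "")
    if upgrade == "websocket" or ctype.startswith("application/grpc"):
        return "streaming"
    return "http"
```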
Capacity Planning and Monitoring
Maintain 20-30% headroom for failover scenarios. Monitor NAT table utilization at L4 (alert above 70%) and connection pool saturation at L7 (alert when request queuing exceeds 10ms). Watch for cross-layer blind spots: L4 metrics lack request-level context, while L7 may undersample at high request rates, hiding tail latency problems. Implement multi-signal health checks to catch gray failures (partial failures where a backend responds but with degraded quality or elevated latency).
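The two alert thresholds above reduce to a simple check over sampled metrics. A minimal sketch, assuming the caller supplies the current NAT-table entry count, its capacity, and the measured request-queue wait; the function name and message format are illustrative.

```python
def capacity_alerts(nat_entries: int, nat_capacity: int,
                    queue_wait_ms: float) -> list[str]:
    """Flag the thresholds from the text: NAT table utilization above
    70% at L4, and request queuing above 10 ms at L7."""
    alerts = []
    nat_util = nat_entries / nat_capacity
    if nat_util > 0.70:
        alerts.append(f"L4: NAT table at {nat_util:.0%} (>70%)")
    if queue_wait_ms > 10:
        alerts.append(f"L7: request queue wait {queue_wait_ms:.1f} ms (>10 ms)")
    return alerts
```

Firing well below hard saturation matters because the 20-30% headroom target assumes a failover can land on this instance at any time; an alert at 100% utilization leaves no room to absorb that shift.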