Failure Modes and Edge Cases in L4/L7 Load Balancing
L4 State Exhaustion
L4 load balancers face state exhaustion under high connection churn or attack conditions. Connection tracking tables budget 200-500 bytes per active TCP flow; a proxy with 2GB reserved can hold millions of flows but drops new connections when the table fills. SYN floods (attacks that send massive numbers of TCP connection initiation packets without completing the handshake) cause rapid table turnover because each SYN allocates state. HTTP/1.1 without keepalive also causes high churn since each request opens a new connection. Mitigations include: SYN cookies (a technique where the server encodes connection state in the response packet instead of storing it, only allocating memory when the client completes the handshake), aggressive idle timeout tuning, and horizontal scaling using consistent hashing (a distribution algorithm that maps requests to servers such that adding/removing a server only remaps a small fraction of traffic). Monitor NAT table utilization and alert above 70%.
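The consistent-hashing mitigation above can be sketched as a minimal hash ring. This is an illustrative implementation, not any particular product's algorithm; the class and node names are hypothetical. The key property is that removing a node only remaps the keys that node owned, so scaling the load-balancer fleet horizontally does not reshuffle most flows.

```python
import hashlib
from bisect import bisect


class ConsistentHashRing:
    """Minimal hash ring sketch: each node owns many points on the ring,
    and a key maps to the first node point at or after its own hash.
    Removing a node only remaps the keys that node owned (~1/N of traffic)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes          # virtual nodes per physical node, for smoother spread
        self.ring = []                # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def _point(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            self.ring.append((self._point(f"{node}#{i}"), node))
        self.ring.sort()

    def remove(self, node):
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def lookup(self, key):
        points = [p for p, _ in self.ring]
        idx = bisect(points, self._point(key)) % len(self.ring)
        return self.ring[idx][1]
```

With three nodes, removing one remaps roughly a third of keys and leaves the rest untouched; a naive `hash(key) % N` scheme would remap nearly all of them.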
L4 Asymmetric Routing
Asymmetric routing in ECMP (Equal-Cost Multi-Path, where routers distribute traffic across multiple paths of equal cost) or Anycast (multiple servers sharing one IP address, with routers delivering to the nearest) deployments can send return packets via different paths than inbound packets. Stateful devices see mismatched flows and drop them: one load balancer receives the SYN but a different one receives the ACK. Solutions include: DSR (Direct Server Return) where servers reply directly to clients bypassing the load balancer entirely, symmetric routing policies that ensure both directions traverse the same path, or stateless consistent hashing where any load balancer can handle any packet for a given flow because all load balancers use identical hash functions.
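The stateless approach can be sketched as a deterministic function of the flow's 5-tuple. The backend addresses and function name here are hypothetical; real implementations (e.g. Maglev-style tables) also use consistent hashing so backend pool changes disturb few flows, but the core idea is that identical inputs produce identical outputs on every load balancer, with no shared state:

```python
import hashlib

# Hypothetical backend pool; in production this would come from service discovery.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]


def pick_backend(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Deterministic flow-to-backend mapping from the 5-tuple.
    Any load balancer running this identical function sends every packet
    of a given flow to the same backend, so it does not matter which
    load balancer a packet (SYN, ACK, or mid-stream) arrives at."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return BACKENDS[digest % len(BACKENDS)]
```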
L7 Retry Storm Amplification
L7 load balancers introduce application-layer failure amplification through naive retry logic. Consider a backend handling 10,000 requests per second with 10% timing out due to overload. Those timeouts trigger 1,000 automatic retries, which add to backend load, causing more timeouts, generating more retries: exponential amplification. Prevention requires: retry budgets (limit total retries fleet-wide to 1.2-1.5x base request volume), jittered exponential backoff (randomize retry timing to prevent thundering herds), and circuit breakers (mechanisms that stop sending traffic to a failing backend after detecting repeated failures, allowing it time to recover before resuming traffic).
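Two of these mechanisms can be sketched in a few lines. The class and parameter names are illustrative; the budget ratio of 0.2 here corresponds to capping total traffic at 1.2x base volume, the low end of the range above. The backoff uses the "full jitter" variant (a uniformly random delay up to the exponential cap), which spreads retries out rather than synchronizing them:

```python
import random


class RetryBudget:
    """Sketch of a fleet-wide retry budget: retries are allowed only while
    total retries stay under `ratio` times the base (non-retry) request
    volume, so a failing backend cannot trigger unbounded amplification."""

    def __init__(self, ratio=0.2):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        return self.retries < self.ratio * self.requests

    def record_retry(self):
        self.retries += 1


def backoff_delay(attempt, base=0.1, cap=5.0):
    """Full-jitter exponential backoff: a random delay in
    [0, min(cap, base * 2^attempt)] seconds, preventing retry synchronization."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

A caller would check `budget.can_retry()` before scheduling a retry, sleep for `backoff_delay(attempt)`, and record the retry against the budget; a circuit breaker would sit in front of both, refusing attempts entirely once failure rates cross a threshold.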
L7 Head-of-Line Blocking
Head-of-line blocking in HTTP/2 occurs at the transport layer. HTTP/2 sends multiple concurrent requests over a single TCP connection (called multiplexing); because TCP delivers bytes strictly in order, a single lost packet stalls every stream on that connection until retransmission succeeds, even streams whose data has already arrived and could complete in milliseconds. Large request or response bodies cause proxy buffering and memory spikes that can trigger OOM (Out Of Memory) termination. WebSockets (a protocol for persistent bidirectional communication) and gRPC streaming (RPC over HTTP/2 with long-lived streams) pin connections to specific backends; L7 proxies cannot reroute mid-stream, degrading load distribution during long-lived connections.
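The transport-level stall can be illustrated with a toy model of in-order TCP delivery. The function below is a simplification, not a protocol implementation: frames from interleaved streams arrive over one connection, and a gap in the byte stream (a lost segment) holds back every frame behind it regardless of which stream it belongs to:

```python
def deliverable(frames, lost_seq):
    """Toy model of TCP in-order delivery for multiplexed HTTP/2 streams.
    `frames` is a list of (sequence_number, stream_id) in arrival order;
    `lost_seq` is the sequence number of a lost segment. TCP cannot hand
    bytes past a gap to the application, so every frame at or after the
    loss stalls, including frames belonging to unrelated streams."""
    delivered = []
    for seq, stream_id in frames:
        if seq >= lost_seq:
            break  # gap in the byte stream: all later frames wait for retransmit
        delivered.append(stream_id)
    return delivered
```

This is why QUIC/HTTP3 moves multiplexing below loss recovery: each stream has independent delivery, so one lost packet stalls only the stream it carried.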
TLS and Certificate Failures
TLS termination at L7 introduces certificate-related failure modes. Certificate expiration, rotation failures, or OCSP (Online Certificate Status Protocol, used to check if certificates are revoked) stapling issues can trigger global outages. mTLS (mutual TLS) to backends adds certificate authority management complexity: expired intermediate certificates or SNI (Server Name Indication, the field in TLS that specifies which hostname the client wants) mismatches cause silent connection failures. Sticky sessions (routing repeated requests from the same client to the same backend) via cookie affinity create hot backends; when a node fails, clients lose affinity and stateful applications lose their in-flight session state.
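A common defense against expiration outages is a monitoring probe that alerts well before `notAfter`. A minimal sketch using Python's standard library is below; the function names are illustrative, and the timestamp format is the one returned by `ssl.SSLSocket.getpeercert()` (e.g. `'Jun  1 12:00:00 2031 GMT'`):

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Days until a certificate's notAfter timestamp, as formatted by
    ssl.getpeercert(). Alerting when this drops below ~30 days leaves
    time for rotation before clients start failing handshakes."""
    expires = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def fetch_leaf_cert(host, port=443, timeout=5.0):
    """Retrieve the validated leaf certificate dict from a live endpoint
    (requires network access; handshake failures themselves are a signal)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

A probe would run `days_until_expiry(fetch_leaf_cert(host)["notAfter"])` per frontend and page when the result falls below a rotation threshold.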