Implementation Patterns: Latency Budgets, Hedging, and BDP Aware Tuning
Latency Budgets
A latency budget divides your total allowed latency across all operations in a request path. If users tolerate 200ms total and the request path calls 4 services sequentially, each gets roughly 50ms. But budgets are not just division; they require tracking remaining budget through each hop.
Pass remaining budget as a request header or context parameter. Service A starts with 200ms, spends 30ms, passes 170ms to Service B. Service B spends 40ms, passes 130ms to Service C. Each service knows whether to attempt optional operations (cache population, logging) or skip them to meet the budget. Services can also use remaining budget to set appropriate timeouts on downstream calls.
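The hop-by-hop deduction above can be sketched as a small simulation. This is an illustrative sketch, not a production implementation; the function name and call-chain representation are assumptions, and in a real system the budget would travel in a request header or RPC context.

```python
def propagate_budget(total_ms, spends_ms, optional_cost_ms=50):
    """Simulate a latency budget flowing down a call chain.

    `spends_ms` lists each hop's own processing time. Returns, per hop,
    the budget handed to the next service and whether that hop could
    afford an optional operation (e.g. cache population) of
    `optional_cost_ms`.
    """
    remaining = total_ms
    hops = []
    for spend in spends_ms:
        remaining -= spend
        # A hop attempts optional work only if the leftover budget covers it.
        hops.append((remaining, remaining >= optional_cost_ms))
    return hops

# The chain from the text: Service A spends 30ms, Service B spends 40ms.
# propagate_budget(200, [30, 40]) -> [(170, True), (130, True)]
```

Each downstream service can also use the received value directly as the timeout for its own outbound calls, so a slow upstream hop automatically tightens everything below it.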
Hedged Requests
Hedging sends duplicate requests to multiple backends simultaneously or after a delay. The first response wins, others are discarded. This cuts tail latency at the cost of increased load.
Hedging every request immediately doubles load but dramatically improves tail latency. Delayed hedging is more efficient: wait for the p90 latency (say 50ms), then send a second request only if the first has not returned. If the first request was going to hit p99 (200ms), the hedged request likely returns at median (20ms), arriving at 50 + 20 = 70ms. You improve p99 from 200ms to roughly 70ms while adding only about 10% extra load, since hedges fire only for the slowest 10% of requests.
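A delayed hedge can be sketched with standard-library thread pools. This is a minimal illustration under assumed names (`hedged_call`, `make_request`); a production version would also cancel the losing request and cap hedging under overload.

```python
import concurrent.futures

def hedged_call(make_request, hedge_delay_s=0.05):
    """Delayed hedging: launch the request, wait up to hedge_delay_s
    (roughly the p90 latency), and fire a duplicate only if the first
    attempt is still outstanding. Return whichever finishes first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(make_request)
        done, _ = concurrent.futures.wait([first], timeout=hedge_delay_s)
        if done:
            return first.result()  # fast path: no hedge needed
        second = pool.submit(make_request)  # first is slow: hedge it
        done, _ = concurrent.futures.wait(
            [first, second],
            return_when=concurrent.futures.FIRST_COMPLETED)
        return done.pop().result()
```

Note that `ThreadPoolExecutor`'s context manager waits for both futures on exit, so the slow request still consumes a worker until it completes; real implementations pair hedging with cancellation to reclaim that capacity.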
Bandwidth Delay Product Tuning
Bandwidth Delay Product (BDP) is bandwidth multiplied by round trip time: BDP = bandwidth × RTT. It represents how much data can be in flight between sender and receiver.
For a 1 Gbps link with 50ms RTT: BDP = 1,000,000,000 bits/sec × 0.05 sec = 50,000,000 bits = 6.25 MB. Your TCP buffers and in-flight data must accommodate 6.25 MB to fully utilize the link. Default buffer sizes (often 64KB) waste 99% of capacity on high-latency links.
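The arithmetic above is worth making explicit, since the bits-to-bytes conversion is where sizing mistakes usually creep in. The helper name below is an illustrative assumption.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes: the amount of data that must
    be in flight to keep the link fully utilized. Divide by 8 to
    convert from bits (link speed) to bytes (buffer sizing)."""
    return bandwidth_bps * rtt_s / 8

# 1 Gbps link with 50ms RTT, as in the text.
bdp = bdp_bytes(1_000_000_000, 0.05)   # 6,250,000 bytes = 6.25 MB
default_buffer = 64 * 1024             # a common 64 KB default
utilization = default_buffer / bdp     # ~0.01: about 1% of capacity
```

With a 64 KB buffer on this link, TCP can keep only about 1% of the pipe full, which matches the "waste 99% of capacity" figure above; the fix is raising the socket buffer limits (and letting TCP autotuning grow into them) to at least the BDP.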
Connection Pooling
Connection establishment is expensive: TCP handshake takes one RTT, TLS handshake adds one to two more RTTs. A cross-region request with 100ms RTT spends 200 to 300ms just connecting before any data transfer.
Connection pools maintain open connections for reuse. Size the pool using Little's Law: pool_size = throughput × latency. For 100 RPS with 50ms latency, you need at least 5 connections. Add headroom for bursts: 2x to 3x the calculated minimum.
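The Little's Law sizing rule translates directly into a small helper. This is a sketch with assumed names and an assumed 2.5x headroom default; tune the multiplier to your burst profile.

```python
import math

def pool_size(throughput_rps, latency_s, headroom=2.5):
    """Size a connection pool via Little's Law: concurrent connections
    = arrival rate x time each request holds a connection. The headroom
    multiplier absorbs bursts above steady-state throughput."""
    minimum = throughput_rps * latency_s
    return math.ceil(minimum * headroom)

# 100 RPS at 50ms per request: minimum of 5 connections,
# rounded up to 13 with 2.5x headroom.
pool_size(100, 0.05)
```

The latency term here is the full time a request holds the connection, including serialization and server processing, not just network RTT; undercounting it is the usual cause of pool exhaustion under load.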