Timeout Patterns: Preventing Unbounded Waits in Distributed Systems
Why Timeouts Are Essential
Without timeouts, a caller can wait indefinitely on a slow or hung downstream service. Blocked threads accumulate, memory grows, and connection pools are exhausted. A service waiting indefinitely on a hung database query cannot serve other requests. Timeouts bound the worst case: even if the dependency never responds, the caller fails after a known interval and frees its resources.
Connection vs Read Timeouts
Connection timeout: How long to wait for TCP connection establishment. Typical values: 1-5 seconds. Detects network unreachability or firewall blocks.
Read timeout: How long to wait for response data once the connection is established. Typical values: 5-30 seconds, depending on operation complexity. In many client libraries this bounds each wait for data rather than the total response time.
Both must be set; omitting either leaves a wait that is unbounded.
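The distinction between the two timeouts can be sketched with Python's standard socket module. This is a minimal illustration, not a production client: the function name request_with_timeouts and the default values (taken from the typical ranges above) are assumptions made for the example.

```python
import socket


def request_with_timeouts(host, port, payload,
                          connect_timeout=3.0, read_timeout=10.0):
    """Send payload and return the first chunk of the reply.

    Hypothetical helper: both waits are bounded, so the call cannot
    block forever on an unreachable or silent peer.
    """
    # Connection timeout: bounds TCP connection establishment.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    try:
        # Read timeout: bounds each blocking send/recv on the connected
        # socket. It applies per operation, not to the whole exchange.
        sock.settimeout(read_timeout)
        sock.sendall(payload)
        return sock.recv(4096)  # raises socket.timeout if no data arrives in time
    finally:
        sock.close()
```

If the peer accepts the connection but never replies, the recv call raises socket.timeout after read_timeout seconds instead of blocking indefinitely. Bounding the total duration of a multi-read response would need an additional deadline on top of this.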
Choosing Timeout Values
Base timeouts on measured latency distributions, not guesses. If Service A's p99 latency is 200 ms, a 500 ms timeout gives headroom for variance while still catching failures. A timeout that is too short causes false timeouts on healthy requests; one that is too long delays failure detection. Start with 2-3x p99 latency and tune from there.
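The 2-3x p99 starting point can be computed directly from recorded latencies. The sketch below assumes a nearest-rank percentile and a default multiplier of 2.5; the function names and the sample data are hypothetical and chosen to reproduce the 200 ms to 500 ms example above.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_timeout_ms(latencies_ms, multiplier=2.5, pct=99):
    """Starting timeout: a multiple of the observed tail latency.
    The multiplier is an assumption to tune, not a fixed rule."""
    return multiplier * percentile(latencies_ms, pct)


# Illustrative data: 98 fast requests, one at 200 ms, one 900 ms outlier.
latencies_ms = [50] * 98 + [200, 900]
p99 = percentile(latencies_ms, 99)          # 200
timeout_ms = suggest_timeout_ms(latencies_ms)  # 500.0
```

The single 900 ms outlier sits above p99, so it does not inflate the suggested timeout; that request would time out, which is the intended trade-off.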
Timeout vs Circuit Breaker
Timeouts handle individual request failures. Circuit breakers aggregate failures and stop sending requests entirely. A timeout fires once per slow request. A circuit breaker trips after multiple timeouts and prevents wasted effort on a known failing service. Use both: timeouts for individual protection, circuit breakers for systemic response.
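How the two mechanisms combine can be sketched as a small wrapper that counts timeouts and stops calling after a threshold. This is a simplified illustration, not the API of any particular library: the class name, the threshold, and the cool-down with a single trial call are all assumptions added for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call a known-failing service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # timeouts before tripping
        self.reset_after = reset_after              # cool-down in seconds (assumed)
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at = None                       # None means closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail immediately without touching the service.
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except TimeoutError:
            # The per-request timeout fired; record it.
            self.consecutive_failures += 1
            if (self.opened_at is not None
                    or self.consecutive_failures >= self.failure_threshold):
                self.opened_at = self.clock()       # trip, or re-open after a failed trial
            raise
        # Success closes the circuit and resets the count.
        self.consecutive_failures = 0
        self.opened_at = None
        return result
```

The wrapped function is expected to enforce its own timeout and raise TimeoutError; the breaker only aggregates those failures. While the circuit is open, callers get CircuitOpenError immediately rather than waiting out another timeout.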