Timeout Patterns: Preventing Unbounded Waits in Distributed Systems

Definition
A timeout is a limit on how long a caller waits for an operation to complete before giving up, so that requests cannot wait indefinitely on slow or failed dependencies.
Why Timeouts Are Essential
Without timeouts, a slow or hung downstream service can make callers wait indefinitely. Blocked threads accumulate, memory grows, and connection pools are exhausted. A service waiting indefinitely on a hung database query cannot serve other requests. Timeouts bound the worst case: even if the dependency never responds, the caller fails after a known interval and frees its resources.
Connection vs Read Timeouts
Connection timeout: how long to wait for TCP connection establishment. Typical values: 1-5 seconds. It detects network unreachability or firewall blocks.
Read timeout: how long to wait for response data once the connection is established. Typical values: 5-30 seconds, depending on operation complexity.
Both must be set; omitting either leaves an unbounded wait.
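The two timeouts can be set separately on a raw socket. The following is a minimal sketch using Python's standard `socket` module; the function name `request_with_timeouts` and its default values are illustrative assumptions, not taken from any particular client library.

```python
import socket


def request_with_timeouts(host, port, payload,
                          connect_timeout=2.0, read_timeout=10.0):
    """Send payload and return the first chunk of the reply.

    Hypothetical helper: both waits are bounded, so a dead or silent
    peer cannot block the caller forever.
    """
    # Connection timeout: bounds TCP connection establishment.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    try:
        # Read timeout: bounds each blocking send/recv on the connected socket.
        sock.settimeout(read_timeout)
        sock.sendall(payload)
        # Raises socket.timeout if no data arrives within read_timeout.
        return sock.recv(4096)
    finally:
        sock.close()
```

Note that `settimeout` applies to each socket operation individually, not to the request as a whole, so a peer that trickles data slowly can still exceed `read_timeout` in total. Most HTTP clients expose the same pair of settings under their own parameter names.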
Choosing Timeout Values
Base timeouts on measured latency distributions, not guesses. If Service A's p99 latency is 200 ms, a 500 ms timeout leaves headroom for variance while still catching failures. A timeout that is too short fires on healthy requests; one that is too long delays failure detection. Start with 2-3x the p99 latency and tune from there.
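The sizing rule above can be sketched as a small calculation over recorded latencies. This is an illustrative sketch: the function names, the nearest-rank percentile method, and the default multiplier of 2.5 are assumptions chosen to match the 2-3x guidance.

```python
import math


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    # 1-based nearest rank: the smallest rank covering 99% of samples.
    rank = max(1, math.ceil(len(ordered) * 99 / 100))
    return ordered[rank - 1]


def suggest_timeout_ms(latencies_ms, multiplier=2.5):
    """Starting timeout: a multiple of the observed p99 latency."""
    return p99(latencies_ms) * multiplier
```

With a measured p99 of 200 ms and a multiplier of 2.5, this yields the 500 ms starting point from the example; the value should then be tuned against the observed false-timeout rate.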
💡 Key Insight: A timeout is a contract with your users about worst-case latency. If your timeout is 30 seconds, users may wait 30 seconds before seeing an error. Make timeouts aggressive enough to fail fast.
Timeout vs Circuit Breaker
Timeouts handle individual request failures. Circuit breakers aggregate failures and stop sending requests entirely. A timeout fires once per slow request. A circuit breaker trips after multiple timeouts and prevents wasted effort on a known failing service. Use both: timeouts for individual protection, circuit breakers for systemic response.
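The division of labor between the two can be sketched as follows. This is a deliberately simplified breaker, not a production implementation: the class names, the threshold of three consecutive timeouts, and the 30-second cool-down are all assumptions for illustration, and the wrapped operation is assumed to enforce its own per-request timeout by raising the built-in `TimeoutError`.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.consecutive_timeouts = 0
        self.opened_at = None  # monotonic time when the breaker tripped

    def call(self, operation):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                # Known-failing dependency: skip the call entirely.
                raise CircuitOpenError("dependency marked as failing")
            self.opened_at = None  # cool-down elapsed: allow a trial call
        try:
            result = operation()  # per-request timeout lives inside operation
        except TimeoutError:
            self.consecutive_timeouts += 1
            if self.consecutive_timeouts >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.consecutive_timeouts = 0  # any success resets the count
        return result
```

Each slow request still costs one timeout, but once the threshold is reached, later calls fail immediately with `CircuitOpenError` until the cool-down expires.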

💡 Key Takeaways

✓ Connection timeout (1-5s) detects unreachability; read timeout (5-30s) catches slow responses. Set both.

✓ Base timeout on actual latency: start with 2-3x p99 latency and tune based on false timeout rates.

✓ Timeouts handle individual requests; circuit breakers aggregate failures. Use both together.

📌 Interview Tips

1. Explain both timeout types: connection for TCP establishment, read for waiting on response data.

2. Give sizing guidance: if p99 is 200 ms, start with a 500 ms timeout (2-3x p99).

3. Distinguish from circuit breaker: a timeout applies per request, while a circuit breaker trips after multiple failures.
