
Timeout Patterns: Preventing Unbounded Waits in Distributed Systems

Definition
A timeout is a limit on how long to wait for an operation to complete before giving up. It prevents requests from waiting indefinitely on slow or failed dependencies.

Why Timeouts Are Essential

Without timeouts, a slow downstream service can make callers wait indefinitely. Blocked threads accumulate, memory grows, and connection pools are exhausted. A service waiting indefinitely on a hung database query cannot serve other requests. Timeouts bound the worst case: even if the dependency never responds, the caller gives up after a known interval and frees its resources.

Connection vs Read Timeouts

Connection timeout: How long to wait for TCP connection establishment. Typical values: 1-5 seconds. Detects network unreachability or firewall blocks.

Read timeout: How long to wait for response data after the connection is established. Typical values: 5-30 seconds, depending on operation complexity.

Both must be set; omitting either creates unbounded wait scenarios.
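The two timeouts can be sketched with Python's standard `socket` module. This is a minimal illustration, not a production client: the function name `fetch_with_timeouts`, its parameters, and the default values (picked from the typical ranges above) are assumptions made for the example.

```python
import socket


def fetch_with_timeouts(host, port, payload, connect_timeout=3.0, read_timeout=10.0):
    """Send payload and return the first chunk of the reply, bounding both waits.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # Connection timeout: bounds how long the TCP handshake may take.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    try:
        # Read timeout: applies to each blocking socket operation from here on,
        # so it bounds the wait for data, not the total time of a long response.
        sock.settimeout(read_timeout)
        sock.sendall(payload)
        # Raises socket.timeout if no data arrives within read_timeout.
        return sock.recv(4096)
    finally:
        sock.close()
```

A caller would typically catch `socket.timeout` (an alias of `TimeoutError` since Python 3.10) and treat it as a failed request rather than waiting longer.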

Choosing Timeout Values

Base timeouts on actual latency distributions, not guesses. If Service A's p99 latency is 200ms, a 500ms timeout gives headroom for variance while still catching failures. A timeout that is too short causes false timeouts on healthy requests; one that is too long delays failure detection. Start with 2-3x p99 latency and tune from there.
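The sizing rule can be worked through in a few lines. The sketch below assumes latency samples are already collected in milliseconds; the helper names `p99` and `suggest_timeout_ms`, the nearest-rank percentile method, and the 2.5x default multiplier are choices made for this example.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile of observed latencies (illustrative helper)."""
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.99 * n), giving a 1-based rank without float rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def suggest_timeout_ms(latencies_ms, multiplier=2.5):
    """Starting timeout as a multiple of p99; tune it against false timeout rates."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    return p99(latencies_ms) * multiplier


# 100 samples: 98 fast requests, one at 200ms, one 900ms outlier.
samples = [100] * 98 + [200, 900]
print(p99(samples))                 # 200
print(suggest_timeout_ms(samples))  # 500.0
```

The single 900ms outlier sits above p99 and does not inflate the suggested value, which is the reason to size from a percentile rather than from the maximum.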

💡 Key Insight: A timeout is a contract with your users about worst-case latency. If your timeout is 30 seconds, users may wait 30 seconds before seeing an error. Make timeouts aggressive enough to fail fast.

Timeout vs Circuit Breaker

Timeouts handle individual request failures. Circuit breakers aggregate failures and stop sending requests entirely. A timeout fires once per slow request. A circuit breaker trips after multiple timeouts and prevents wasted effort on a known failing service. Use both: timeouts for individual protection, circuit breakers for systemic response.
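How the two mechanisms combine can be sketched as a small breaker that counts consecutive timeouts. This is a simplified illustration, not a reference implementation: the class names, the threshold of 3, the 30-second cool-down, and the single trial call after the cool-down are all assumptions added for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the dependency."""


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive timeouts (assumed policy)."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.consecutive_timeouts = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Open: fail immediately instead of spending another timeout.
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cool-down elapsed: allow a trial call; one more timeout re-opens.
            self.opened_at = None
            self.consecutive_timeouts = self.failure_threshold - 1
        try:
            result = operation()  # the operation enforces its own per-request timeout
        except TimeoutError:
            self.consecutive_timeouts += 1
            if self.consecutive_timeouts >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.consecutive_timeouts = 0  # any success resets the count
        return result
```

Each wrapped operation still needs its own timeout; the breaker only decides whether the call is attempted at all.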

💡 Key Takeaways
Connection timeout (1-5s) detects unreachability; read timeout (5-30s) catches slow responses. Set both.
Base timeouts on actual latency: start with 2-3x p99 latency and tune based on false timeout rates.
Timeouts handle individual requests; circuit breakers aggregate failures. Use both together.
📌 Interview Tips
1. Explain both timeout types: connection for TCP establishment, read for waiting on response data.
2. Give sizing guidance: if p99 is 200ms, start with a 500ms timeout (2-3x headroom).
3. Distinguish from circuit breaker: timeout per request, circuit breaker after multiple failures.