Adaptive Timeouts: Dynamic Adjustment Based on System State
Why Fixed Timeouts Are Suboptimal
A fixed timeout of 5 seconds might be too long during normal operation and too short during peak load. Latency varies with time of day, traffic patterns, and system state, so a timeout tuned for normal conditions causes unnecessary failures during legitimately slow periods.
Percentile-Based Timeouts
Set the timeout based on recent latency percentiles. If p99 latency over the last 5 minutes is 300ms, set the timeout to p99 × 1.5 = 450ms. This adapts to actual performance: tighter during good times, more lenient during slow periods. It requires latency-tracking infrastructure and careful handling of edge cases, such as a window with too few samples to estimate a percentile.
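The percentile rule above can be sketched in Python as follows. This is a minimal illustration, not a production implementation: the class name PercentileTimeout, the min_samples fallback, and the 5-second default are assumptions made for the example, and the p99 uses the simple nearest-rank method.

```python
import time
from collections import deque


class PercentileTimeout:
    """Derives a timeout from the p99 of latencies seen in a recent time window.

    Hypothetical helper for illustration; all parameter names and defaults
    are assumptions, not values prescribed by the text.
    """

    def __init__(self, window_seconds=300.0, multiplier=1.5,
                 default_timeout=5.0, min_samples=20):
        self.window_seconds = window_seconds
        self.multiplier = multiplier
        self.default_timeout = default_timeout
        self.min_samples = min_samples
        self._samples = deque()  # (timestamp, latency_seconds), oldest first

    def record(self, latency_seconds, now=None):
        """Store one observed latency and drop samples older than the window."""
        now = time.monotonic() if now is None else now
        self._samples.append((now, latency_seconds))
        self._evict(now)

    def _evict(self, now):
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def current_timeout(self, now=None):
        """Return p99 * multiplier, or the default when data is too sparse."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        n = len(self._samples)
        if n < self.min_samples:
            return self.default_timeout
        latencies = sorted(latency for _, latency in self._samples)
        # Nearest-rank p99: ceil(0.99 * n), computed with integer arithmetic.
        rank = (99 * n + 99) // 100
        return latencies[rank - 1] * self.multiplier
```

Sorting the window on every call is simple but costs O(n log n); with a large window, a histogram or a streaming quantile estimator would be a more economical choice.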
Feedback Loop Risks
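If the timeout increases during high latency, and the high latency is caused by overload, longer timeouts make the overload worse because each request holds resources longer and more requests are in flight at once. The adaptive algorithm must therefore have bounds: a minimum timeout keeps it from dropping so low that healthy requests fail, and a maximum timeout caps how far it can grow under overload. Limiting the rate of change prevents wild swings between updates.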
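One way to apply these safeguards is to pass every newly computed timeout through a guard before using it. The sketch below is an assumption-laden example: the function name bounded_timeout, the 0.1s and 2.0s bounds, and the 20% per-update step limit are illustrative values, not recommendations.

```python
def bounded_timeout(previous, proposed, min_timeout=0.1, max_timeout=2.0,
                    max_step_ratio=0.2):
    """Limit how far and how fast an adaptive timeout may move.

    previous: timeout currently in use (seconds)
    proposed: timeout suggested by the adaptive algorithm (seconds)
    All defaults are hypothetical values chosen for the example.
    """
    # Rate-of-change limit: move at most max_step_ratio away from previous.
    lower = previous * (1 - max_step_ratio)
    upper = previous * (1 + max_step_ratio)
    stepped = min(max(proposed, lower), upper)
    # Absolute bounds: never leave [min_timeout, max_timeout].
    return min(max(stepped, min_timeout), max_timeout)
```

With these defaults, a jump from 0.5s to a proposed 5.0s is first limited to 0.6s, so several consecutive slow windows are needed before the timeout approaches the 2.0s ceiling.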
Implementation Approaches
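Sliding window: calculate p99 over the last N requests or T minutes. Exponential moving average: smooth recent latencies and set the timeout as a multiple of the average. Histogram-based: maintain a latency histogram and derive the timeout from a percentile. Each approach has different memory and computation tradeoffs: a sliding window stores every sample in the window, a moving average keeps a single number but tracks the mean rather than the tail, and a histogram uses fixed memory at the cost of bucket-level precision.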
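To make the tradeoffs concrete, here is a hedged sketch of the two constant-memory approaches. The class names, bucket boundaries, smoothing factor alpha, and multipliers are all assumptions for illustration. The histogram here never decays; a real deployment would likely reset or age out counts periodically.

```python
import bisect


class EmaTimeout:
    """Exponential moving average: O(1) memory, tracks the mean, not the tail."""

    def __init__(self, alpha=0.1, multiplier=3.0, default_timeout=5.0):
        self.alpha = alpha              # weight given to the newest sample
        self.multiplier = multiplier    # timeout = multiplier * smoothed latency
        self.default_timeout = default_timeout
        self._ema = None

    def record(self, latency):
        if self._ema is None:
            self._ema = latency
        else:
            self._ema = self.alpha * latency + (1 - self.alpha) * self._ema

    def current_timeout(self):
        if self._ema is None:
            return self.default_timeout
        return self._ema * self.multiplier


class HistogramTimeout:
    """Fixed-bucket histogram: bounded memory, percentile known to bucket precision."""

    def __init__(self, bucket_bounds=(0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5),
                 multiplier=1.5, default_timeout=5.0):
        self.bounds = list(bucket_bounds)            # inclusive upper bounds, seconds
        self.counts = [0] * (len(self.bounds) + 1)   # last slot is the overflow bucket
        self.multiplier = multiplier
        self.default_timeout = default_timeout

    def record(self, latency):
        # bisect_left finds the first bucket whose upper bound is >= latency.
        self.counts[bisect.bisect_left(self.bounds, latency)] += 1

    def current_timeout(self):
        total = sum(self.counts)
        if total == 0:
            return self.default_timeout
        target = (99 * total + 99) // 100  # nearest-rank p99, integer ceil
        running = 0
        for idx, count in enumerate(self.counts):
            running += count
            if running >= target:
                if idx == len(self.bounds):
                    # p99 fell in the overflow bucket; no upper bound is known.
                    return self.default_timeout
                return self.bounds[idx] * self.multiplier
        return self.default_timeout
```

Because the moving average follows the mean, its multiplier usually needs to be larger than the one applied to a p99, and it can underestimate tail latency when the distribution is heavily skewed.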
When to Use Adaptive Timeouts
High-traffic services with significant latency variance benefit most. Low-traffic services lack enough data for good adaptation. Critical paths may prefer predictable fixed timeouts. Start with fixed timeouts, and move to adaptive ones once you have data showing that fixed values cause problems.