Timeout Failures: Handling, Retrying, and User Experience
Timeout Is Not Failure Confirmation
A timeout means the outcome of the operation is unknown: the request may have succeeded, failed, or still be in progress. Idempotent operations (GET, or PUT with the same data) can be retried safely. Non-idempotent operations (a POST that creates a record, a payment charge) can produce duplicates when retried. This uncertainty is the hardest part of timeout handling.
Retry Strategies After Timeout
Idempotent operations: retry with exponential backoff. Non-idempotent operations: query the operation's status before retrying (did my payment go through?). Idempotency keys: include a unique request ID with each request, and reuse the same ID on every retry, so the server can deduplicate. If the server has already processed that request ID, it returns the cached response instead of re-executing the operation.
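As a rough illustration of how backoff and idempotency keys work together, here is a minimal Python sketch. Everything in it is hypothetical: IdempotentServer is a toy in-memory stand-in for a real service, and retry_with_backoff, make_flaky_charge, and the delay values are illustrative names and numbers, not part of any library. The simulated timeout happens after the server has already done the work, which is the case where a naive retry would charge twice.

```python
import time
import uuid


class IdempotentServer:
    """Toy in-memory server (hypothetical) that deduplicates by idempotency key."""

    def __init__(self):
        self.responses = {}  # idempotency key -> cached response
        self.executions = 0  # how many times the charge really ran

    def charge(self, key, amount):
        if key in self.responses:
            # Already processed: return the cached response, do not re-execute.
            return self.responses[key]
        self.executions += 1
        response = {"charge_id": self.executions, "amount": amount}
        self.responses[key] = response
        return response


def retry_with_backoff(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on TimeoutError wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)


def make_flaky_charge(server, key, amount, lost_responses):
    """Build a call whose first `lost_responses` replies are lost after the server did the work."""
    state = {"remaining": lost_responses}

    def call():
        response = server.charge(key, amount)
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TimeoutError("response lost in transit")
        return response

    return call


server = IdempotentServer()
key = str(uuid.uuid4())  # one key per logical operation, reused on every retry
delays = []
result = retry_with_backoff(
    make_flaky_charge(server, key, 500, lost_responses=2),
    sleep=delays.append,  # record the delays instead of sleeping, for the demo
)
```

In this run the client makes three attempts, but the server executes the charge only once, because the second and third attempts hit the cached response. A production version would usually add random jitter to the delays and expire stored keys after some retention period; both are left out here to keep the sketch short.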
User Experience on Timeout
Users should not see raw timeout errors. Options include showing a generic error with a retry button, showing partial results with an indication that some data is missing, or showing cached (stale) data with a freshness indicator. For critical operations such as payments, show a clear status (processing, succeeded, or failed) along with instructions for what to do next.
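One possible way to keep raw timeouts away from the user is to translate them into a small display state at the boundary of the UI layer. The sketch below assumes a hypothetical load_for_display helper, a plain dict as the cache, and made-up status names ("fresh", "stale", "error"); a real application would use its own cache and view model.

```python
import time


def load_for_display(fetch, cache, key, now=time.time):
    """Return a UI-ready dict; a raw TimeoutError never reaches the caller."""
    try:
        data = fetch()
    except TimeoutError:
        cached = cache.get(key)
        if cached is None:
            # Nothing to fall back on: generic message plus a retry button.
            return {
                "status": "error",
                "message": "Something went wrong. Please try again.",
                "can_retry": True,
            }
        value, stored_at = cached
        # Serve stale data and say how old it is (the freshness indicator).
        return {
            "status": "stale",
            "data": value,
            "age_seconds": int(now() - stored_at),
            "can_retry": True,
        }
    cache[key] = (data, now())  # remember the last good result
    return {"status": "fresh", "data": data}


def timing_out():
    raise TimeoutError("upstream did not answer in time")


clock = [1000.0]  # fake clock so the demo is deterministic
cache = {}
fresh = load_for_display(lambda: ["order-1"], cache, "orders", now=lambda: clock[0])
clock[0] = 1030.0  # 30 seconds later the upstream starts timing out
stale = load_for_display(timing_out, cache, "orders", now=lambda: clock[0])
empty = load_for_display(timing_out, {}, "orders", now=lambda: clock[0])
```

The second call returns the cached list with an age of 30 seconds, and the third, with an empty cache, falls back to the generic retryable error.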
Compensating Actions
If a timeout occurs during a multi-step transaction, earlier steps may already have completed. Implement compensation: if a payment timed out but a status check shows it succeeded, mark the order as paid. If the payment failed, refund any partial charges. For complex multi-service transactions, use the saga pattern, with timeout handling at each step.
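The compensation logic can be sketched as a very small saga runner. This is a simplified illustration under stated assumptions, not a production saga implementation: run_saga, build_steps, and the step names are hypothetical, each step is a tuple of (name, action, status check, compensation), and only TimeoutError is handled. On a timeout the runner asks the status check what actually happened, then either continues or undoes the completed steps in reverse order.

```python
def run_saga(steps):
    """Run (name, action, check, compensate) steps with timeout handling at each step."""
    completed = []  # compensations for steps known to have succeeded
    for name, action, check, compensate in steps:
        try:
            action()
        except TimeoutError:
            # Outcome unknown: ask the dependency what actually happened.
            if not check():
                # The step did not go through: undo earlier steps, newest first.
                for _, undo in reversed(completed):
                    undo()
                return ("rolled_back", name)
        completed.append((name, compensate))
    return ("committed", None)


def build_steps(log, payment_succeeded):
    """Hypothetical two-step order flow whose payment call always times out."""

    def charge():
        raise TimeoutError("payment gateway did not answer")

    return [
        ("reserve_stock", lambda: log.append("reserved"),
         lambda: True, lambda: log.append("released")),
        ("charge_payment", charge,
         lambda: payment_succeeded, lambda: log.append("refunded")),
    ]


paid_log = []
paid_outcome = run_saga(build_steps(paid_log, payment_succeeded=True))

unpaid_log = []
unpaid_outcome = run_saga(build_steps(unpaid_log, payment_succeeded=False))
```

When the status check reports that the timed-out payment succeeded, the saga commits and the stock reservation stays in place. When it reports failure, the reservation is released. A real system would also persist saga state so that compensation survives a crash of the coordinator.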
Timeout Metrics
Track the timeout rate per dependency, per endpoint, and per operation type. A sudden increase indicates downstream problems. A steady, persistent timeout rate may indicate that the timeout is too aggressive. Compare the timeout rate with actual downstream latency to tune timeout values. Alert when the timeout rate exceeds its baseline.
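A minimal sketch of per-dependency tracking might look like the following. TimeoutTracker, its baseline_rate and tolerance parameters, and the dependency names are all assumptions made for illustration; in practice these counters would normally live in a metrics system rather than in process memory, and the alert rule would be evaluated over a time window.

```python
from collections import defaultdict


class TimeoutTracker:
    """Counts calls and timeouts per dependency (hypothetical helper)."""

    def __init__(self, baseline_rate, tolerance=2.0):
        self.baseline_rate = baseline_rate  # expected timeout rate, e.g. 0.01
        self.tolerance = tolerance          # alert above baseline * tolerance
        self.calls = defaultdict(int)
        self.timeouts = defaultdict(int)

    def record(self, dependency, timed_out):
        self.calls[dependency] += 1
        if timed_out:
            self.timeouts[dependency] += 1

    def rate(self, dependency):
        calls = self.calls.get(dependency, 0)
        if calls == 0:
            return 0.0
        return self.timeouts.get(dependency, 0) / calls

    def should_alert(self, dependency):
        return self.rate(dependency) > self.baseline_rate * self.tolerance


tracker = TimeoutTracker(baseline_rate=0.01)
for i in range(100):
    tracker.record("payments", timed_out=i < 5)  # 5 timeouts in 100 calls
    tracker.record("search", timed_out=i < 1)    # 1 timeout in 100 calls
```

Here the "payments" dependency times out on 5% of calls, well above the 1% baseline, so it alerts, while "search" sits at the baseline and does not. The same structure can be keyed by endpoint or operation type instead of dependency.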