
API Gateway Failure Modes and Resilience Patterns

API Gateway failures can halt all traffic to a system within seconds, which makes resilience patterns critical. The most severe failure mode is a single-point-of-failure outage: one gateway cluster going down stops all client requests. Mitigation requires multi-availability-zone deployment with a stateless data plane, health-aware DNS that fails over to alternate regions in under 60 seconds, and validation that control plane outages (configuration management) do not degrade the actively running data plane.

Retry storms and amplification turn partial outages into total failures. When backends slow down, clients retry and the gateway may also retry, creating 2x to 10x load multiplication; a single gateway retry plus two client retries means one user action generates four backend requests. Enforce retry budgets that allow at most one retry for idempotent operations (GET, PUT, DELETE) with jittered exponential backoff, and zero retries for non-idempotent operations (POST). Circuit breakers per upstream should open after a 50 percent error rate over a 20-request window, blocking further attempts for a 30 to 60 second cool-down period. A minimal sketch of this retry budget appears after this overview.

Cache stampedes occur when a hot key expires and 10,000 requests simultaneously hit the backend, causing 30-second latency spikes and potential database overload. Request coalescing collapses concurrent identical requests into one backend call. Stale-while-revalidate serves expired cache entries for up to 60 seconds while refreshing them asynchronously in the background. Jittered TTLs (15 minutes plus or minus a random 2 minutes) prevent synchronized expiration.

Protocol translation quirks, such as HTTP/2 to HTTP/1.1, can break streaming semantics and flow control. Long-lived streams require gateways that do not buffer entire responses or time out idle connections after as little as 30 seconds.

Certificate expiry at the edge causes global outages within minutes, since every client sees TLS handshake failures. Automate certificate issuance and renewal with alerts 30 days before expiration, and run monthly chaos drills that shut down gateway clusters to validate failover.
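A minimal sketch of that retry budget in Go, assuming a client-side helper wrapped around the upstream call; the function names, the 200 ms base backoff, and the example URL are illustrative, not part of any particular gateway's API.

```go
package main

import (
	"log"
	"math/rand"
	"net/http"
	"time"
)

// isIdempotent reports whether a request may safely be retried.
// Non-idempotent methods such as POST get zero retries.
func isIdempotent(method string) bool {
	switch method {
	case http.MethodGet, http.MethodHead, http.MethodPut, http.MethodDelete:
		return true
	}
	return false
}

// doWithRetry enforces a budget of at most one retry for idempotent
// requests, with jittered exponential backoff before the retry.
// Requests with a body would also need req.GetBody to replay it;
// that is omitted in this sketch.
func doWithRetry(client *http.Client, req *http.Request) (*http.Response, error) {
	maxRetries := 0
	if isIdempotent(req.Method) {
		maxRetries = 1 // budget: one retry, idempotent methods only
	}
	base := 200 * time.Millisecond
	for attempt := 0; ; attempt++ {
		resp, err := client.Do(req)
		retryable := err != nil || resp.StatusCode >= 500
		if !retryable || attempt >= maxRetries {
			return resp, err
		}
		if resp != nil {
			resp.Body.Close() // discard the failed attempt before retrying
		}
		// Full jitter: sleep a random duration in [0, base * 2^attempt)
		// so synchronized clients do not retry in lockstep.
		time.Sleep(time.Duration(rand.Int63n(int64(base << attempt))))
	}
}

func main() {
	// Hypothetical upstream; any idempotent GET behaves the same way.
	req, err := http.NewRequest(http.MethodGet, "https://api.example.com/v1/items", nil)
	if err != nil {
		log.Fatal(err)
	}
	resp, err := doWithRetry(http.DefaultClient, req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status)
}
```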
💡 Key Takeaways
Single point of failure requires multi-availability-zone, stateless deployment with health-aware DNS that fails over to alternate regions in under 60 seconds
Retry amplification (client retries times gateway retries) multiplies load 2x to 10x during outages; enforce a maximum of one retry for idempotent requests, with jittered backoff
A cache stampede when a hot key expires causes 10,000 concurrent backend requests and 30-second latency spikes; use request coalescing and stale-while-revalidate for up to 60 seconds
Circuit breakers per upstream open after a 50 percent error rate over a 20-request window, with a 30 to 60 second cool-down, preventing cascading failures (a minimal breaker sketch follows this list)
Protocol translation from HTTP/2 to HTTP/1.1 can break streaming and flow control; avoid gateways that buffer entire responses or time out idle connections in under 5 minutes
TLS certificate expiry causes global outages within minutes since every client sees handshake failures; automate renewal with 30-day alerts and run monthly chaos drills that shut down gateway clusters
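The breaker parameters above map onto a small amount of state per upstream. A minimal sketch in Go, assuming a rolling window of the last 20 outcomes and a fixed cool-down; the Breaker type and its methods are illustrative rather than a specific gateway's API, and a production breaker would usually add a half-open probe state after the cool-down.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrOpen is returned while the breaker is refusing calls to an upstream.
var ErrOpen = errors.New("circuit breaker open")

// Breaker tracks the most recent outcomes for one upstream and opens when
// the error rate over the window reaches 50%, blocking calls for a cool-down.
type Breaker struct {
	mu        sync.Mutex
	window    int           // e.g. the last 20 requests
	cooldown  time.Duration // e.g. 30 to 60 seconds
	outcomes  []bool        // true = failed request
	openUntil time.Time
}

func NewBreaker(window int, cooldown time.Duration) *Breaker {
	return &Breaker{window: window, cooldown: cooldown}
}

// allow reports whether a request to the upstream may proceed.
func (b *Breaker) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return time.Now().After(b.openUntil)
}

// record notes one outcome and opens the breaker at a 50% failure rate.
func (b *Breaker) record(failed bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.outcomes = append(b.outcomes, failed)
	if len(b.outcomes) > b.window {
		b.outcomes = b.outcomes[1:] // keep only the most recent window
	}
	if len(b.outcomes) < b.window {
		return // not enough samples yet
	}
	failures := 0
	for _, f := range b.outcomes {
		if f {
			failures++
		}
	}
	if failures*2 >= len(b.outcomes) { // 50% or more failed
		b.openUntil = time.Now().Add(b.cooldown)
		b.outcomes = b.outcomes[:0] // start a fresh window after the cool-down
	}
}

// Call wraps one upstream request with the breaker.
func (b *Breaker) Call(fn func() error) error {
	if !b.allow() {
		return ErrOpen // fail fast instead of piling onto a sick upstream
	}
	err := fn()
	b.record(err != nil)
	return err
}

func main() {
	breaker := NewBreaker(20, 45*time.Second)
	err := breaker.Call(func() error {
		// Hypothetical upstream request goes here.
		return nil
	})
	fmt.Println(err)
}
```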
📌 Examples
E-commerce site cache stampede: a popular product key expires, 50K requests hit the inventory service simultaneously, the database connection pool is exhausted, and 503 errors last 45 seconds until the circuit breaker opens (the coalescing sketch after these examples targets exactly this failure)
Payment gateway retry storm: backend latency increases to 2 seconds, clients retry after 1 second, the gateway also retries, and effective load becomes 8x, causing a total outage until retries are disabled
Video streaming service certificate expiry: automated renewal failed silently, the cert expired at 2am, 100% of users were unable to connect globally, and the outage lasted 45 minutes until a manual cert deployment
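The e-commerce stampede above is exactly what request coalescing and stale-while-revalidate are designed to absorb. A minimal Go sketch, assuming golang.org/x/sync/singleflight for coalescing and a simple in-memory cache; the Cache type, the fetch callback, and the product key are illustrative, not a particular gateway's cache API.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"

	"golang.org/x/sync/singleflight"
)

// entry is one cached value plus its (jittered) expiry time.
type entry struct {
	value     string
	expiresAt time.Time
}

// Cache coalesces concurrent misses per key and serves stale entries for a
// bounded grace period while refreshing them in the background.
type Cache struct {
	mu    sync.RWMutex
	data  map[string]entry
	group singleflight.Group // collapses concurrent fetches for the same key
	ttl   time.Duration      // e.g. 15 minutes
	stale time.Duration      // e.g. serve expired entries for up to 60 seconds
}

func NewCache(ttl, stale time.Duration) *Cache {
	return &Cache{data: map[string]entry{}, ttl: ttl, stale: stale}
}

// jitteredTTL spreads expirations by +/- 2 minutes around the base TTL so
// hot keys do not all expire at the same instant.
func (c *Cache) jitteredTTL() time.Duration {
	jitter := time.Duration(rand.Int63n(int64(4*time.Minute))) - 2*time.Minute
	return c.ttl + jitter
}

// Get serves fresh hits directly, serves stale hits while revalidating in the
// background, and coalesces concurrent cold misses into one backend call.
func (c *Cache) Get(key string, fetch func() (string, error)) (string, error) {
	c.mu.RLock()
	e, ok := c.data[key]
	c.mu.RUnlock()

	now := time.Now()
	if ok && now.Before(e.expiresAt) {
		return e.value, nil // fresh hit
	}
	if ok && now.Before(e.expiresAt.Add(c.stale)) {
		// Stale-while-revalidate: answer immediately with the expired value
		// and refresh asynchronously (still coalesced per key).
		go c.refresh(key, fetch)
		return e.value, nil
	}
	// Cold miss: every concurrent caller for this key shares one fetch.
	v, err, _ := c.group.Do(key, func() (interface{}, error) {
		val, err := fetch()
		if err != nil {
			return "", err
		}
		c.set(key, val)
		return val, nil
	})
	if err != nil {
		return "", err
	}
	return v.(string), nil
}

func (c *Cache) refresh(key string, fetch func() (string, error)) {
	c.group.Do(key, func() (interface{}, error) {
		val, err := fetch()
		if err == nil {
			c.set(key, val)
		}
		return val, err
	})
}

func (c *Cache) set(key, val string) {
	c.mu.Lock()
	c.data[key] = entry{value: val, expiresAt: time.Now().Add(c.jitteredTTL())}
	c.mu.Unlock()
}

func main() {
	c := NewCache(15*time.Minute, 60*time.Second)
	v, err := c.Get("product:12345", func() (string, error) {
		// Hypothetical inventory-service call.
		return "in_stock", nil
	})
	fmt.Println(v, err)
}
```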