Common Pitfalls and Anti-Patterns
Anti-Pattern: Shedding Too Late
A common mistake is shedding after the expensive work is already done. A request authenticates (5 ms), validates (2 ms), and queries the database (30 ms), and only then is rejected. That wastes 37 ms of processing time plus database capacity on a request that produces nothing. At a shed rate of 10,000 RPS, that is 370 seconds of wasted work for every second of wall-clock time. Shedding checks should take less than 1 ms and run before any significant work.
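A minimal sketch of shedding first, assuming a simple in-flight request cap as the admission check. The names InFlightLimiter, handle, and wasted_seconds_per_second are hypothetical, and the authenticate, validate, and query_db callables stand in for the stages described above.

```python
from dataclasses import dataclass


def wasted_seconds_per_second(cost_ms: float, shed_rps: float) -> float:
    """Work burned per second on requests rejected after `cost_ms` of work."""
    return cost_ms * shed_rps / 1000.0


@dataclass
class InFlightLimiter:
    """Hypothetical O(1) admission check: a counter comparison, no I/O.

    Not thread-safe; a real server would guard the counter with a lock.
    """
    max_in_flight: int
    in_flight: int = 0

    def try_admit(self) -> bool:
        if self.in_flight >= self.max_in_flight:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        self.in_flight -= 1


def handle(request, limiter, authenticate, validate, query_db):
    # Shed before authentication, validation, or any database call.
    if not limiter.try_admit():
        return 503, "overloaded"
    try:
        user = authenticate(request)
        validate(request)
        return 200, query_db(user, request)
    finally:
        # Only admitted requests reach this point, so the counter stays balanced.
        limiter.release()
```

With the figures from the text, wasted_seconds_per_second(37, 10_000) reproduces the 370 seconds of wasted work per second. A rejected request here costs one integer comparison.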
Anti-Pattern: Unbounded Queues
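Unbounded queues mask overload until memory is exhausted. A queue growing from 100 to 1,000,000 entries looks fine until the process crashes with an OOM (Out of Memory) error. Bound every queue to 2-5 seconds of burst capacity: capacity = expected_rps × acceptable_wait. For example, 500 RPS with a 2-second acceptable wait gives a bound of 1,000 entries.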
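The sizing rule can be sketched with Python's standard queue module. The helper names queue_capacity and enqueue_or_shed are hypothetical, and the 500 RPS and 2-second figures are illustrative assumptions, not recommendations.

```python
import queue


def queue_capacity(expected_rps: float, acceptable_wait_s: float) -> int:
    """Bound = expected_rps x acceptable_wait, with a floor of one slot."""
    return max(1, int(expected_rps * acceptable_wait_s))


def enqueue_or_shed(q: queue.Queue, item) -> bool:
    """Return True if queued, False if the bound was hit and the item was shed."""
    try:
        q.put_nowait(item)  # raises queue.Full instead of blocking
        return True
    except queue.Full:
        return False


# Illustrative numbers: 500 RPS with a 2-second acceptable wait.
work_queue = queue.Queue(maxsize=queue_capacity(500, 2))
```

Using put_nowait rather than a blocking put means a full queue produces an immediate rejection signal instead of silently stalling the producer.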
Anti-Pattern: Retry Amplification
The system sheds 1,000 requests. All clients retry immediately, so the retries land on top of the next 1,000 new requests and 2,000 requests arrive. Each iteration of the retry loop makes the overload worse. The fix requires exponential backoff with jitter on clients and Retry-After headers from servers.
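A sketch of the client side, assuming the "full jitter" variant of exponential backoff (a uniform random delay between zero and an exponentially growing ceiling). The function name backoff_delay and the default values for base_s and cap_s are hypothetical. When the server supplied a Retry-After value, this sketch waits at least that long and adds the jitter on top so clients do not all return at the same instant.

```python
import random


def backoff_delay(attempt, base_s=0.1, cap_s=10.0, retry_after_s=None,
                  rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    `rng` must return a float in [0, 1); it is injectable for testing.
    """
    # The ceiling doubles per attempt and is capped so delays stay bounded.
    ceiling = min(cap_s, base_s * (2 ** attempt))
    # Full jitter: pick uniformly in [0, ceiling) to spread clients apart.
    delay = rng() * ceiling
    if retry_after_s is not None:
        # Honor the server's hint, then add jitter to avoid a synchronized wave.
        delay += retry_after_s
    return delay
```

A client would sleep for backoff_delay(attempt) between tries and stop after a fixed number of attempts, so that retries cannot multiply load indefinitely.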
Anti-Pattern: Priority Inversion
A critical payment request calls an optional fraud check service. When the fraud service is shed, payments fail. If A depends on B and A is critical, B cannot be treated as optional. Either elevate B to A's priority or make the dependency genuinely optional with a fallback.
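The fallback option might look like the sketch below. Everything here is hypothetical: the ShedError exception, the process_payment function, the risk labels, and above all the policy of charging and flagging for later review when no fraud verdict is available, which is a business decision rather than a technical one.

```python
class ShedError(Exception):
    """Hypothetical error raised when a downstream service sheds the call."""


def process_payment(payment, fraud_check, charge):
    """Charge a payment, treating the fraud check as an optional dependency."""
    try:
        risk = fraud_check(payment)
    except ShedError:
        # Fallback: the critical path continues without a fraud verdict.
        risk = "unknown"

    if risk == "high":
        return "rejected"

    charge(payment)
    # Assumed policy: unverified payments are flagged for asynchronous review.
    return "charged" if risk == "low" else "charged_pending_review"
```

The point of the sketch is the shape of the dependency: a shed fraud check changes the outcome label, not whether the payment path completes.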
Anti-Pattern: Single Metric Decisions
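CPU-only shedding ignores memory pressure. Latency-only shedding cannot tell whether slowness is internal or caused by an external dependency. Use a composite health score: health = 0.4×cpu + 0.3×memory + 0.3×latency, where each input is normalized to a 0-to-1 score in which higher means healthier (for example, 1 minus utilization). Shed when health drops below a threshold.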
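A sketch of the composite score, under the assumption that each input is first converted to a 0-to-1 headroom value so that a higher score means a healthier system. The function names, the latency budget parameter, and the 0.3 threshold are hypothetical; only the 0.4/0.3/0.3 weights come from the text.

```python
def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def health_score(cpu_util: float, mem_util: float,
                 p99_latency_ms: float, latency_budget_ms: float) -> float:
    """Weighted health in [0, 1]; 1.0 is idle, 0.0 is saturated on every axis.

    cpu_util and mem_util are utilizations in [0, 1]. Latency is scored
    against an assumed budget: at or beyond the budget it contributes 0.
    """
    cpu = 1.0 - _clamp01(cpu_util)
    memory = 1.0 - _clamp01(mem_util)
    latency = 1.0 - _clamp01(p99_latency_ms / latency_budget_ms)
    return 0.4 * cpu + 0.3 * memory + 0.3 * latency


def should_shed(health: float, threshold: float = 0.3) -> bool:
    """Shed when health drops below the threshold (0.3 is illustrative)."""
    return health < threshold
```

Because no single input carries more than 40% of the weight, a CPU that looks idle cannot hide a process that is nearly out of memory and missing its latency budget.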
Anti-Pattern: No Graceful Recovery
Aggressive shedding without hysteresis creates oscillation: shedding lowers load, the system stops shedding, load spikes, and shedding resumes. Use two thresholds: start shedding at 80% CPU and stop only when CPU falls below 60%. Then raise acceptance gradually using the AIMD (Additive Increase, Multiplicative Decrease) pattern: increase the acceptance rate by a small constant on success and cut it in half on overload.
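Both mechanisms can be sketched in a few lines. The class names, the 0.05 additive step, and the 0.01 floor on the acceptance rate are assumptions for illustration; the 80%/60% thresholds and the halving on overload come from the text.

```python
class HysteresisShedder:
    """Start shedding at one threshold, stop only below a lower one."""

    def __init__(self, start_at: float = 0.80, stop_below: float = 0.60):
        self.start_at = start_at
        self.stop_below = stop_below
        self.shedding = False

    def update(self, cpu: float) -> bool:
        if not self.shedding and cpu >= self.start_at:
            self.shedding = True
        elif self.shedding and cpu < self.stop_below:
            self.shedding = False
        # Between the two thresholds the previous state is kept.
        return self.shedding


class AimdAcceptance:
    """Fraction of requests to accept, adjusted with AIMD."""

    def __init__(self, step: float = 0.05, floor: float = 0.01):
        self.step = step
        self.floor = floor  # assumed floor so the rate can always recover
        self.rate = 1.0

    def on_success(self) -> None:
        # Additive increase: recover slowly.
        self.rate = min(1.0, self.rate + self.step)

    def on_overload(self) -> None:
        # Multiplicative decrease: back off quickly.
        self.rate = max(self.floor, self.rate * 0.5)
```

A CPU reading of 70% illustrates the hysteresis band: it keeps shedding active if shedding was already on, and leaves it off otherwise.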