Failure Modes: Hot Shards, Stampedes, and Recall Regressions
HOT SHARDS
Uneven query distribution overloads specific shards. Causes: popular items clustered, trending queries hitting same partition. Symptoms: p99 spikes while p50 normal, one shard at 100% CPU. Fix: hash routing to spread items, replicate hot shards more heavily.
CACHE STAMPEDES
Cached items expire simultaneously, all requests hit backend at once. Popular embedding expires: 1000 concurrent requests instead of 1. Database overloads, latency spikes. Fix: jittered TTLs (random 0-10% added), cache warming, request coalescing.
RECALL REGRESSIONS
ANN recall degrades silently as index grows. Index for 100M vectors has 98% recall; at 1B, drops to 90% without retuning. Symptoms: engagement declines gradually. Fix: monitor recall offline, retune as data grows, rebuild indexes periodically.
CASCADING FAILURES
One failure overloads others. Cache fails, all requests hit database, database overloads, timeouts cascade. Fix: circuit breakers, graceful degradation (serve stale on failure), capacity planning with failure modes.