
Failure Modes: Hot Shards, Stampedes, and Recall Regressions

Definition
Scalability failure modes are production issues in which sharding, caching, or approximate search breaks down, causing latency spikes, accuracy drops, or cascading failures.

HOT SHARDS

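Uneven query distribution overloads specific shards. Causes: popular items clustered on one partition, or trending queries hitting the same partition. Symptoms: p99 latency spikes while p50 stays normal; one shard runs at 100% CPU. Fix: hash-based routing to spread items across shards, plus extra replicas for shards that stay hot. Hashing spreads items but cannot split traffic to a single hot item, which is where replication helps.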
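
A minimal sketch of why hash routing helps, assuming a toy catalog of 1,000 items on 4 shards where the 100 trending items have contiguous ids. The function names (`range_shard`, `hash_shard`, `load_skew`) and the `item-<id>` key format are illustrative, not taken from any particular system.

```python
import hashlib
from collections import Counter
from typing import Iterable


def range_shard(item_id: int, num_items: int, num_shards: int) -> int:
    """Range partitioning: contiguous ids land on the same shard."""
    return item_id * num_shards // num_items


def hash_shard(key: str, num_shards: int) -> int:
    """Hash partitioning with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def load_skew(shard_ids: Iterable[int], num_shards: int) -> float:
    """Busiest shard's load divided by the mean load; 1.0 means perfectly even."""
    counts = Counter(shard_ids)
    total = sum(counts.values())
    if total == 0:
        return 1.0
    return max(counts.values()) / (total / num_shards)


# Hypothetical workload: one query for each of the 100 trending items (ids 0..99).
trending = range(100)
range_skew = load_skew((range_shard(i, 1000, 4) for i in trending), 4)
hash_skew = load_skew((hash_shard(f"item-{i}", 4) for i in trending), 4)
print(f"range skew={range_skew:.2f}, hash skew={hash_skew:.2f}")
```

With range partitioning every trending query lands on shard 0, so that shard carries four times the mean load. Hashing scatters the same items across all shards. A skew metric like this one is also a reasonable thing to alert on in production.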

CACHE STAMPEDES

Cached items expire at the same moment, and every request hits the backend at once. Example: a popular embedding expires, and 1,000 concurrent requests reach the backend instead of 1. The database overloads and latency spikes. Fix: jittered TTLs (add a random 0-10% to each TTL), cache warming, and request coalescing.
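
The jittered-TTL fix can be sketched as below. The helper name `jittered_ttl` and the 300-second base TTL are assumptions for illustration; the 0-10% range follows the text above.

```python
import random
from typing import Optional


def jittered_ttl(base_ttl_s: float,
                 jitter_fraction: float = 0.10,
                 rng: Optional[random.Random] = None) -> float:
    """Return base_ttl_s plus a random 0..jitter_fraction extra, so keys written
    together do not all expire at the same instant."""
    source = rng if rng is not None else random
    return base_ttl_s * (1.0 + source.uniform(0.0, jitter_fraction))


# Hypothetical usage: 1,000 keys written at the same time with a 300 s base TTL.
rng = random.Random(0)
ttls = [jittered_ttl(300.0, 0.10, rng) for _ in range(1000)]
print(min(ttls), max(ttls))
```

Expirations now spread over a 30-second window instead of landing in the same instant, so the backend sees a trickle of refills rather than a burst.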

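💡 Key Insight: Cache stampedes are a self-inflicted DDoS. The more popular an item, the worse the stampede. Add jitter proportional to popularity, and coalesce requests for the hottest keys: jitter desynchronizes expirations across keys but does not limit concurrent misses on a single key.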
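
Request coalescing (sometimes called "single flight") can be sketched as follows, assuming a threaded server. The class names `SingleFlight` and `_Call`, the key `"popular-embedding"`, and the simulated 0.3 s backend load are all hypothetical; production systems typically add timeouts on top of this.

```python
import threading
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class _Call:
    """State shared by the leader and the followers of one in-flight load."""
    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Coalesce concurrent loads of the same key into one backend call."""
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, loader: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.value = loader()  # only the leader touches the backend
            except BaseException as exc:
                call.error = exc
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
                call.done.set()  # wake every follower
        else:
            call.done.wait()  # followers block until the leader finishes
        if call.error is not None:
            raise call.error
        return call.value


def run_demo(num_threads: int = 20) -> Tuple[int, List[Any]]:
    """Simulate num_threads concurrent cache misses on one key."""
    flight = SingleFlight()
    backend_calls = 0
    count_lock = threading.Lock()
    results: List[Any] = [None] * num_threads
    start = threading.Barrier(num_threads)

    def slow_backend_load() -> str:
        nonlocal backend_calls
        with count_lock:
            backend_calls += 1
        time.sleep(0.3)  # stand-in for a slow database read
        return "embedding-bytes"

    def worker(i: int) -> None:
        start.wait()  # release all threads together
        results[i] = flight.do("popular-embedding", slow_backend_load)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return backend_calls, results
```

Calling `run_demo(20)` sends twenty simultaneous misses through the coalescer; as long as they all arrive while the first load is still running, the backend is called once and every caller gets the same value.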

RECALL REGRESSIONS

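ANN recall degrades silently as the index grows. For example, an index tuned for 100M vectors may deliver 98% recall, then drop to 90% at 1B vectors if its parameters are not retuned. Symptoms: engagement declines gradually, with no errors to alert on. Fix: measure recall offline against exact (brute-force) results, retune index parameters as data grows, and rebuild indexes periodically.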
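
Offline recall monitoring can be sketched as below: compute exact neighbors by brute force for a sample of queries, then compare them with what the ANN index returned. The function names are illustrative, and the ANN results here are hard-coded stand-ins rather than output from a real index.

```python
from typing import List, Sequence


def exact_top_k(query: Sequence[float],
                vectors: Sequence[Sequence[float]],
                k: int) -> List[int]:
    """Brute-force ground truth: indices of the k nearest vectors by squared L2 distance."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    ranked = sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))
    return ranked[:k]


def recall_at_k(approx_ids: Sequence[Sequence[int]],
                exact_ids: Sequence[Sequence[int]],
                k: int) -> float:
    """Mean fraction of the true top-k neighbors that the ANN results contain."""
    if len(approx_ids) != len(exact_ids) or not exact_ids:
        raise ValueError("need one non-empty result list per query on both sides")
    total = 0.0
    for approx, exact in zip(approx_ids, exact_ids):
        truth = set(exact[:k])
        if not truth:
            raise ValueError("ground truth for a query is empty")
        total += len(truth & set(approx[:k])) / len(truth)
    return total / len(exact_ids)


# Toy 1-D example: the true 2 nearest neighbors of 0.9 are vectors 1 and 0.
vectors = [[0.0], [1.0], [2.0], [3.0]]
truth = exact_top_k([0.9], vectors, k=2)
ann_result = [1, 3]  # hypothetical ANN output that missed vector 0
print(recall_at_k([ann_result], [truth], k=2))
```

Running this on a fixed query sample after every index build or data-growth milestone turns a silent regression into a number that can be tracked and alerted on.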

CASCADING FAILURES

One failure overloads the next component in line. The cache fails, all requests hit the database, the database overloads, and timeouts cascade. Fix: circuit breakers, graceful degradation (serve stale data on failure), and capacity planning that accounts for failure modes.
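
A minimal sketch of a circuit breaker combined with serve-stale degradation, assuming a single-threaded caller. The class, the threshold of consecutive failures, and the reset timeout are illustrative choices, not a reference implementation; real breakers usually track failure rates over a window and need locking.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial request after a timeout."""
    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # Open: block calls until the timeout passes, then let a trial through.
        return self._clock() - self._opened_at >= self.reset_timeout_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def fetch_with_fallback(breaker: CircuitBreaker,
                        fetch: Callable[[], Any],
                        stale_value: Any) -> Any:
    """Try the backend unless the breaker is open; on any failure serve stale data."""
    if not breaker.allow():
        return stale_value  # shed load instead of piling onto a sick backend
    try:
        value = fetch()
    except Exception:
        breaker.record_failure()
        return stale_value
    breaker.record_success()
    return value
```

While the breaker is open the database receives no traffic from this caller, which gives it room to recover instead of being hit by retries.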

⚠️ Key Trade-off: Defensive measures add complexity. Circuit breakers and jittered TTLs all take effort to build, tune, and operate. Prioritize based on blast radius.
💡 Key Takeaways
Hot shards: uneven distribution causes latency spikes; use hash routing and replicate hot shards
Cache stampedes: simultaneous expiry overloads the backend; use jittered TTLs
Recall regressions: ANN recall degrades silently; monitor offline and rebuild periodically
📌 Interview Tips
1. Describe a stampede as a self-inflicted DDoS, along with its mitigations
2. Mention recall regression as a silent killer