Failure Modes: Cache Stampede, Embedding Drift, and False Positives
CACHE STAMPEDE
A cache stampede occurs when a popular cache entry expires and hundreds of concurrent requests all miss the cache simultaneously. Every one of those requests hits the model at once, potentially overwhelming it. For ML systems this is especially dangerous: model inference is expensive, so a stampede can cascade into service-wide degradation.
Prevention strategies:
Probabilistic early refresh: Each request has a small probability of refreshing the entry before its TTL expires. This spreads refresh load over time instead of concentrating it at expiration.
Single-flight pattern: On a cache miss, only one request actually computes the result; the others wait for it. This requires coordination (a mutex or semaphore) but eliminates duplicate computation.
Stale-while-revalidate: Serve the stale result immediately while triggering a background refresh. The user gets a fast response and the cache is updated asynchronously. This trades freshness for availability.
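The single-flight pattern is the easiest of the three to get subtly wrong, so a minimal sketch may help. It assumes a threaded Python service; SingleFlightCache, _Call, and the compute callback are hypothetical names chosen for illustration, and TTL handling is deliberately left out to keep the coordination logic visible.

```python
import threading


class _Call:
    """Bookkeeping for one in-flight computation (hypothetical helper)."""

    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None


class SingleFlightCache:
    """On a miss, one caller computes; concurrent callers wait for its result."""

    def __init__(self, compute):
        self._compute = compute      # e.g. a function that runs model inference
        self._lock = threading.Lock()
        self._values = {}            # key -> cached result (no TTL in this sketch)
        self._inflight = {}          # key -> _Call currently being computed

    def get(self, key):
        with self._lock:
            if key in self._values:
                return self._values[key]
            call = self._inflight.get(key)
            is_leader = call is None
            if is_leader:
                call = _Call()
                self._inflight[key] = call

        if not is_leader:
            # Follower: block until the leader finishes, then reuse its outcome.
            call.done.wait()
            if call.error is not None:
                raise call.error
            return call.value

        # Leader: run the expensive computation outside the lock.
        try:
            call.value = self._compute(key)
        except BaseException as exc:
            call.error = exc         # failures are shared with waiters, never cached
            raise
        finally:
            with self._lock:
                if call.error is None:
                    self._values[key] = call.value
                del self._inflight[key]
            call.done.set()
        return call.value
```

Because the cache lookup and the in-flight registration happen under the same lock, a request sees either a cached value, an in-flight computation to wait on, or neither, in which case it becomes the single leader. A failed computation is propagated to the waiting requests but not stored, so the next request retries.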
STALE CACHE SERVING
Stale results happen when cached data no longer reflects current model behavior or world state. For example, a recommendation system returns cached suggestions for products that are now out of stock, or a model has been updated but its old predictions are still served from cache.
Detection: monitor the cache age distribution and compare cached against fresh results on sampled traffic. If the divergence exceeds a threshold, the cache is too stale. Set up automatic invalidation triggers based on detected staleness. Alert when the stale serving rate exceeds your SLO (e.g., more than 5% of responses older than 1 hour).
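The two detection signals above can be sketched as a pair of small metrics. This is an illustrative outline only: the function names, the 10% divergence threshold, and the exact-equality comparison are assumptions, while the 1 hour age limit and 5% SLO come from the example in the text.

```python
def stale_serving_rate(served_ages_s, max_age_s=3600.0):
    """Fraction of served responses whose cache age exceeds max_age_s seconds."""
    if not served_ages_s:
        return 0.0
    stale = sum(1 for age in served_ages_s if age > max_age_s)
    return stale / len(served_ages_s)


def divergence_rate(sampled_pairs, same=lambda cached, fresh: cached == fresh):
    """Fraction of sampled (cached, fresh) result pairs that disagree.

    `same` is a pluggable comparison; exact equality is an assumption here and
    is usually too strict for free-text model outputs.
    """
    if not sampled_pairs:
        return 0.0
    differing = sum(1 for cached, fresh in sampled_pairs if not same(cached, fresh))
    return differing / len(sampled_pairs)


def check_staleness(served_ages_s, sampled_pairs,
                    max_age_s=3600.0,            # 1 hour, as in the SLO example
                    stale_slo=0.05,              # alert above 5% stale responses
                    divergence_threshold=0.10):  # assumed value, tune per system
    """Return which actions the current measurements call for."""
    return {
        "alert": stale_serving_rate(served_ages_s, max_age_s) > stale_slo,
        "invalidate": divergence_rate(sampled_pairs) > divergence_threshold,
    }
```

In practice the ages would come from cache hit logs and the pairs from a small fraction of traffic that is recomputed against the live model even on a hit.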
CACHE POISONING
Cache poisoning occurs when an incorrect result is stored and then served repeatedly to many users. In ML systems, this happens when the model returns an error response that gets cached, or when an adversarial input produces a bad cached result. Semantic caching adds risk: one poisoned entry can affect all similar queries through approximate matching.
Defenses: validate model outputs before caching (sanity checks on format, confidence scores, and content). Use a shorter TTL for uncertain predictions. Never cache error responses.
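These defenses can be combined into a single admission check that runs before every cache write. The sketch below is a minimal illustration: ModelResponse, cache_ttl_seconds, and all threshold and TTL values are hypothetical, and the content check is reduced to a non-empty test where a real system would add its own rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelResponse:
    """Hypothetical shape of a model output; real systems will differ."""
    text: str
    confidence: float
    is_error: bool = False


def cache_ttl_seconds(resp: ModelResponse,
                      base_ttl: int = 3600,            # assumed default TTL
                      uncertain_ttl: int = 300,        # assumed TTL for low confidence
                      confident_at: float = 0.8) -> Optional[int]:
    """Return the TTL to cache `resp` with, or None if it must not be cached."""
    if resp.is_error:
        return None                  # never cache error responses
    if not resp.text.strip():
        return None                  # format check: empty output is not cacheable
    if not (0.0 <= resp.confidence <= 1.0):
        return None                  # confidence out of range (also rejects NaN)
    if resp.confidence < confident_at:
        return uncertain_ttl         # uncertain prediction: shorter TTL
    return base_ttl
```

A caller would write to the cache only when the returned TTL is not None, which keeps the validation rules in one place instead of scattered across call sites.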