
Tail Latency Amplification and Cascading Failures in Real-Time Systems

Real-time scoring systems fail in subtle ways that don't appear under light load. Tail latency amplification happens when a single slow dependency causes head-of-line blocking that cascades through the entire request path. Imagine your scoring service calls a feature store with a 5ms p50 but a 50ms p99. Under burst traffic, even a small percentage of slow requests can fill thread pools or connection queues. New requests get stuck behind slow ones, average latency climbs, and the system enters a death spiral where timeouts trigger retries that amplify load further.

The problem compounds when you have multiple services in a chain. If service A has a 100ms p99 and it calls service B with a 100ms p99, the combined p99 is not simply 200ms; under load, latencies correlate and the tail stretches much further. A slow database query causes slow feature fetches, which cause slow scoring requests, which cause upstream API timeouts. Each layer adds variance. Without careful timeout management, you end up with requests that take seconds while burning resources at every layer.

Mitigation requires several techniques working together. First, use bounded queues with admission control: when the queue is full, immediately reject new requests with a clear error rather than letting them pile up. Second, set request timeouts shorter than upstream timeouts. If your upstream gateway times out at 500ms, your scoring service should time out internal operations at 400ms to leave room for cleanup and response. Third, implement hedged requests for critical dependencies: after a short delay (for example, 10ms), send a duplicate request to an independent replica and take whichever responds first. This cuts tail latency dramatically when variance comes from noisy neighbors or garbage collection pauses.

Circuit breakers add another layer of defense. When a downstream service like the feature store shows elevated error rates or timeouts, the circuit breaker trips open and immediately fails requests without attempting them, giving the downstream system time to recover. After a cooldown period, it lets a few test requests through; if they succeed, the circuit closes and normal traffic resumes. Without circuit breakers, a failing dependency can take down your entire scoring layer through retry storms.

Cascading timeouts are particularly insidious. An upstream service times out and retries the request; your scoring service sees the same transaction ID twice and processes it twice, doubling load. Under stress, retry amplification can increase effective traffic by 5 to 10 times. The solution is to place retry budgets at the edge, enforce end-to-end request IDs for deduplication, and make scoring operations idempotent. Use a single retry policy at the outermost layer rather than allowing every service in the call chain to retry independently.

Another failure mode is noisy-neighbor interference and garbage collection pauses in multi-tenant systems. A JVM or Python runtime might pause for tens of milliseconds during garbage collection, stalling every in-flight request. NUMA (Non-Uniform Memory Access) unawareness makes this worse on multi-socket servers, where cross-socket memory access adds microseconds per operation. The fix is to isolate critical threads, pin them to specific CPU cores, cap heap sizes to reduce GC pause duration, or choose runtimes with more predictable memory behavior, such as Go or Rust, for latency-critical paths.
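A minimal sketch of bounded-queue admission control, using a Go HTTP handler and a buffered channel as the queue; the handler name, queue depth of 64, and response body are illustrative assumptions, not taken from any specific system.

```go
// Sketch of admission control: at most 64 requests may be queued or in flight;
// anything beyond that is rejected immediately with a clear error.
package main

import (
	"net/http"
	"time"
)

// admission is a semaphore bounding concurrent work (illustrative capacity).
var admission = make(chan struct{}, 64)

func scoreHandler(w http.ResponseWriter, r *http.Request) {
	select {
	case admission <- struct{}{}: // a slot is free; admit the request
		defer func() { <-admission }()
	default: // queue full: reject now instead of letting requests pile up
		http.Error(w, "overloaded, retry later", http.StatusTooManyRequests)
		return
	}

	// ... fetch features and run the model here ...
	time.Sleep(5 * time.Millisecond) // placeholder for real scoring work
	w.Write([]byte(`{"score":0.02}`))
}

func main() {
	http.HandleFunc("/score", scoreHandler)
	http.ListenAndServe(":8080", nil)
}
```

Rejected requests return a fast, explicit error that the caller's single edge-level retry policy can act on, rather than queuing behind slow work and blowing the latency budget.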
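A sketch of the timeout budget described above (500ms at the gateway, 400ms internally), assuming a hypothetical fetchFeatures dependency; the numbers mirror the example in the text and everything else is illustrative.

```go
// Sketch of a timeout budget: internal work is capped at 400ms so there is
// headroom under the upstream 500ms gateway timeout for cleanup and response.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchFeatures simulates a feature-store call that honors cancellation.
func fetchFeatures(ctx context.Context, txID string) (map[string]float64, error) {
	select {
	case <-time.After(50 * time.Millisecond): // simulated dependency latency
		return map[string]float64{"amount": 120.0}, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

func score(parent context.Context, txID string) (float64, error) {
	// Internal budget is strictly shorter than the upstream timeout.
	ctx, cancel := context.WithTimeout(parent, 400*time.Millisecond)
	defer cancel()

	feats, err := fetchFeatures(ctx, txID)
	if err != nil {
		if errors.Is(err, context.DeadlineExceeded) {
			return 0, fmt.Errorf("feature fetch exceeded budget: %w", err)
		}
		return 0, err
	}
	return feats["amount"] / 10000.0, nil // placeholder for model inference
}

func main() {
	s, err := score(context.Background(), "tx-123")
	fmt.Println(s, err)
}
```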
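A sketch of a hedged request against two feature-store replicas: if the primary has not answered within the hedge delay, the same call is duplicated to a backup and the first response wins. Replica addresses, the simulated latencies, and the 10ms delay are illustrative assumptions.

```go
// Hedged request sketch: query the primary replica, duplicate to a backup
// after hedgeAfter, and return whichever answers first.
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

type result struct {
	val float64
	err error
}

// fetchFrom stands in for a feature-store call against one replica; the
// random latency simulates tail variance such as GC pauses or noisy neighbors.
func fetchFrom(ctx context.Context, replica string) (float64, error) {
	latency := time.Duration(rand.Intn(40)) * time.Millisecond
	select {
	case <-time.After(latency):
		return 0.7, nil
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}

func hedgedFetch(ctx context.Context, primary, backup string, hedgeAfter time.Duration) (float64, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // once a winner returns, cancel the slower call

	out := make(chan result, 2) // buffered so the losing call never blocks
	go func() { v, err := fetchFrom(ctx, primary); out <- result{v, err} }()

	hedge := time.NewTimer(hedgeAfter)
	defer hedge.Stop()

	outstanding := 1
	for {
		select {
		case <-hedge.C: // primary is slow: fire the duplicate request
			go func() { v, err := fetchFrom(ctx, backup); out <- result{v, err} }()
			outstanding++
		case r := <-out:
			outstanding--
			if r.err == nil || outstanding == 0 {
				return r.val, r.err // first success wins; last error is surfaced
			}
		case <-ctx.Done():
			return 0, ctx.Err()
		}
	}
}

func main() {
	v, err := hedgedFetch(context.Background(),
		"feature-store-a:7000", "feature-store-b:7000", 10*time.Millisecond)
	fmt.Println(v, err)
}
```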
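A minimal circuit-breaker sketch guarding a dependency call; the thresholds (5 consecutive failures, 30-second cooldown, single half-open probe) are illustrative, and the guarded function is a hypothetical stand-in for a feature-store lookup.

```go
// Circuit breaker sketch: after maxFailures consecutive errors the breaker
// opens and fails fast; after the cooldown a probe call is allowed through,
// and a success closes the circuit again.
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var ErrCircuitOpen = errors.New("circuit open: failing fast")

type Breaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	cooldown    time.Duration
	openedAt    time.Time
	open        bool
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.open && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrCircuitOpen // fail fast, give the dependency time to recover
	}
	// Either closed, or cooldown elapsed: let this call through as a probe.
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.open = true
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0 // success: reset and close the circuit
	b.open = false
	return nil
}

func main() {
	breaker := NewBreaker(5, 30*time.Second)
	err := breaker.Call(func() error {
		// hypothetical feature-store lookup would go here
		return nil
	})
	fmt.Println(err)
}
```

In this sketch any call arriving after the cooldown acts as a probe; a production breaker would typically limit half-open traffic to one request at a time and track error rate rather than only consecutive failures.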
💡 Key Takeaways
Tail latency amplification occurs when a slow dependency (e.g., a 50ms p99 on a 5ms p50 call) causes head-of-line blocking, filling thread pools and triggering a death spiral under burst traffic
Retry storms amplify effective load by 5 to 10 times when upstream timeouts cause duplicate requests without deduplication, collapsing the entire scoring layer
Hedged requests send duplicates to independent replicas after 10ms and take the first response, cutting tail latency when variance comes from noisy neighbors or GC pauses
Circuit breakers trip open on elevated error rates and immediately fail requests without trying, preventing retry storms from overwhelming failing dependencies during recovery
Bounded queues with admission control reject new requests when full rather than letting them pile up, preserving latency for in-flight requests and avoiding cascading delays
Garbage collection pauses in the JVM or Python runtime can stall all requests for tens of milliseconds, requiring isolated threads, pinned cores, and capped heap sizes to maintain p99 SLOs
📌 Examples
PayPal implements hedged requests for critical feature store calls, sending a second request after 10ms to a different replica, which reduced p99 from 80ms to 35ms
Stripe uses circuit breakers on feature store dependencies that trip after 5 consecutive timeouts, failing fast for 30 seconds before attempting recovery with test requests
A scoring service with 400ms internal timeout and 500ms upstream timeout ensures time for cleanup when dependencies fail, preventing zombie requests from holding resources
Uber isolates dispatch scoring threads on dedicated CPU cores to avoid NUMA cross-socket memory penalties, reducing p99 jitter from 150ms to 80ms during peak traffic