Fraud Detection & Anomaly Detection › Real-time Scoring (Low-latency Inference) · Easy · ⏱️ ~2 min

What is Real-Time Scoring and Why is Latency Critical?

Definition: Real-time scoring is the process of computing ML model predictions within strict latency bounds (typically 10-100ms) as part of a synchronous request flow. The user or system waits for the prediction before proceeding—blocking until the score is returned.

Why Latency Matters

In fraud detection, a 200ms delay on every transaction degrades user experience noticeably. In ad ranking, every 100ms of latency costs measurable revenue. In recommendations, slow responses cause users to scroll past before personalized content loads. The model must return a decision within the allocated time budget or the system falls back to defaults.

Latency requirements cascade through the system. If the total API budget is 150ms and database lookups take 50ms, the model has only 100ms for feature computation, inference, and response formatting combined.
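This cascade can be sketched as a timeout-with-fallback wrapper: the model is given only whatever budget remains, and a default score is returned if it runs over. This is a minimal illustration, not a production pattern; the scorer functions and budget numbers are hypothetical.

```python
import concurrent.futures
import time

TOTAL_BUDGET_MS = 150   # total API SLA (from the example above)
DB_LOOKUP_MS = 50       # time already spent on database lookups
MODEL_BUDGET_MS = TOTAL_BUDGET_MS - DB_LOOKUP_MS  # 100 ms left for the model

def fast_scorer(features):
    # Stand-in for real inference: sums feature values.
    return sum(features.values())

def slow_scorer(features):
    # Simulates a model that blows the budget.
    time.sleep(0.5)
    return 1.0

def score_with_fallback(scorer, features, budget_ms, default_score=0.0):
    """Run inference, but fall back to a default if the budget is exceeded."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(scorer, features)
        try:
            return future.result(timeout=budget_ms / 1000.0)
        except concurrent.futures.TimeoutError:
            # Degrade gracefully instead of blocking the request.
            # Note: the timed-out worker still runs to completion here;
            # real serving stacks need true cancellation or request deadlines.
            return default_score
```

The fallback path is what the text calls a "degraded response": the caller always gets an answer inside the SLA, just not always a model-derived one.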

The Latency Budget

Real-time systems allocate latency budgets to each component: feature retrieval (5-20ms), feature transformation (1-5ms), model inference (5-50ms), post-processing (1-5ms), network overhead (5-15ms). The total must stay under the SLA. Exceeding any component budget triggers timeouts or degraded responses.
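A budget table like the one above can be checked mechanically per request. The sketch below (component names, budget values, and the 100ms SLA are illustrative assumptions taken from the ranges in the text) flags any component that exceeded its allocation and whether the total stayed under the SLA.

```python
# Hypothetical per-component budgets (ms), using the upper bounds above.
BUDGETS_MS = {
    "feature_retrieval": 20,
    "feature_transform": 5,
    "inference": 50,
    "post_processing": 5,
    "network": 15,
}

def check_budgets(timings_ms, sla_ms=100):
    """Return (components over budget, whether total is within the SLA)."""
    violations = [name for name, t in timings_ms.items()
                  if t > BUDGETS_MS[name]]
    within_sla = sum(timings_ms.values()) <= sla_ms
    return violations, within_sla
```

In a real system these per-request timings would feed a metrics pipeline, so a single component's budget violation is visible before it turns into SLA breaches.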

Key Insight: P99 latency matters more than average. If 1% of requests take 500ms, that is 10,000 slow requests per million—enough to impact user experience and trigger SLA violations. Design for the tail, not the mean.
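The gap between mean and tail latency is easy to demonstrate. The sketch below uses the nearest-rank percentile method on a synthetic sample (2% slow requests, chosen so the slow calls land inside the top percentile); the numbers are illustrative.

```python
import math

def p99(latencies_ms):
    """99th percentile via the nearest-rank method."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.99 * len(ordered))  # 1-based rank of the P99 sample
    return ordered[rank - 1]

# 98 fast requests at 10 ms, 2 slow ones at 500 ms:
sample = [10.0] * 98 + [500.0] * 2
mean = sum(sample) / len(sample)  # 19.8 ms -- looks healthy
tail = p99(sample)                # 500.0 ms -- the tail tells the real story
```

The mean barely moves, while P99 surfaces exactly the slow requests that users feel and SLAs measure; this is why dashboards track P95/P99 rather than averages.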

Real-Time vs Batch vs Streaming

Batch scoring runs offline on stored data—no latency constraints. Streaming scoring processes events continuously with seconds of delay. Real-time scoring is synchronous: the request waits for the response. Each has different infrastructure requirements and model optimization strategies.

💡 Key Takeaways
Real-time scoring requires predictions within 10-100ms as part of synchronous request flows
Latency budgets cascade: if total API budget is 150ms and DB takes 50ms, model gets only 100ms for everything else
P99 latency matters more than average—1% slow requests at scale means thousands of degraded user experiences
📌 Interview Tips
1. When asked about latency requirements, break down the budget: feature retrieval 5-20ms, inference 5-50ms, network 5-15ms
2. Explain that every 100ms of latency in ad ranking costs measurable revenue—the business case for optimization is concrete