Learn→Fraud Detection & Anomaly Detection→Real-time Scoring (Low-latency Inference)→2 of 6

Fraud Detection & Anomaly Detection • Real-time Scoring (Low-latency Inference)Medium⏱️ ~3 min

The Complete Real-Time Scoring Flow for Fraud Detection

The Request Path
A transaction arrives at the fraud scoring service. Within 50ms, the system must: validate the request (1ms), fetch user features from the online store (5-10ms), compute real-time features like transaction velocity (2-5ms), run model inference (10-30ms), apply post-processing rules (2-5ms), and return the decision. Each component has a timeout; if any step exceeds its budget, the system returns a default decision.
Critical Path: Feature retrieval and model inference dominate latency. Optimizing these two components yields the largest gains. Everything else is noise in comparison.
Feature Retrieval
Pre-computed features (user historical stats, account age, velocity aggregates) live in the online feature store—a low-latency key-value store optimized for single-key lookups. The model needs 50-200 features; fetching each individually would take too long. Batch lookups retrieve all features for a user in a single round-trip (multi-get operation).
Real-Time Feature Computation
Some features cannot be pre-computed: current transaction amount, time since last transaction, merchant category match. These are computed inline from the request payload. Keep real-time computation simple—complex aggregations should be pre-computed in streaming pipelines and stored for lookup.
Design Pattern: Separate features into three tiers: pre-computed (feature store lookup), real-time (inline computation), and context (request payload). This separation clarifies ownership and optimization paths.
Inference and Response
The feature vector feeds into the model. Inference time depends on model complexity: linear models run in microseconds, tree ensembles in 1-5ms, neural networks in 10-50ms. Post-processing applies business rules: minimum scores, blocklist overrides, risk thresholds. The final response includes the score and recommended action.

💡 Key Takeaways

✓Feature retrieval and model inference dominate latency—optimize these two for the largest gains

✓Batch feature lookups (multi-get) retrieve all 50-200 features in a single round-trip

✓Separate features into tiers: pre-computed (store lookup), real-time (inline), context (request payload)

📌 Interview Tips

1Walk through the 50ms budget: validate 1ms, fetch features 5-10ms, compute real-time features 2-5ms, inference 10-30ms, rules 2-5ms

2Explain that complex aggregations should be pre-computed in streaming pipelines, not computed inline

← Back to Real-time Scoring (Low-latency Inference) Overview