Fraud Detection & Anomaly Detection • Real-time Scoring (Low-latency Inference)
The Complete Real-Time Scoring Flow for Fraud Detection
Understanding a concrete end-to-end flow helps you see where latency hides and where failures happen. Consider payment fraud detection at companies like Stripe or PayPal. When a user clicks checkout, the request hits an edge service that handles authentication and routing, consuming 2 to 5 milliseconds within a region. The scoring service receives the request and immediately allocates a per-request latency budget, typically 60 milliseconds at p99.
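The per-request budget can be modeled as a small helper that is allocated at request entry and consulted before each stage. This is a minimal sketch; the `LatencyBudget` class and its method names are illustrative, with only the 60ms p99 figure taken from the text.

```python
import time


class LatencyBudget:
    """Per-request latency budget allocated when the scoring service
    receives the request (60 ms at p99, per the flow above)."""

    def __init__(self, total_ms: float = 60.0):
        self.total_ms = total_ms
        self.start = time.monotonic()  # monotonic clock: immune to wall-clock jumps

    def remaining_ms(self) -> float:
        elapsed_ms = (time.monotonic() - self.start) * 1000.0
        return max(0.0, self.total_ms - elapsed_ms)

    def exhausted(self) -> bool:
        return self.remaining_ms() <= 0.0


# Each stage checks the budget before starting expensive work.
budget = LatencyBudget(total_ms=60.0)
```

Checking `budget.exhausted()` before each stage lets the service degrade early (for example, skipping an optional feature read) instead of blowing the overall deadline.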
The service now races against time. First, it fetches identity and device features from an online feature store. With a 95 percent cache hit rate, these reads complete in 2 to 5 milliseconds. Cache misses go to the underlying database and take 5 to 10 milliseconds. While waiting for stored features, the service computes lightweight real-time features directly from the request payload, operations like extracting the email domain or counting recent attempts, finishing in 1 to 3 milliseconds. Once features are ready, the model server runs inference using a gradient-boosted tree or compact neural network, completing in 2 to 10 milliseconds on CPU.
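The key trick in this stage is overlap: the stored-feature fetch and the request-derived feature computation run concurrently, so their latencies don't add. A minimal sketch of that pattern, assuming hypothetical `fetch_stored_features` and `compute_request_features` helpers (the feature names are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_stored_features(user_id):
    # Stand-in for the online feature store read (2-5 ms on a cache hit).
    return {"lifetime_txn_count": 42, "device_risk": 0.12}


def compute_request_features(payload):
    # Lightweight features derived purely from the request payload (1-3 ms).
    return {
        "email_domain": payload["email"].rsplit("@", 1)[-1],
        "recent_attempts": len(payload.get("attempts", [])),
    }


def gather_features(payload):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off the store read, then compute payload features while it runs.
        stored = pool.submit(fetch_stored_features, payload["user_id"])
        request_feats = compute_request_features(payload)
        # Bound the wait so a slow store cannot consume the whole budget.
        return {**stored.result(timeout=0.05), **request_feats}


feats = gather_features({"user_id": "u1", "email": "a@example.com", "attempts": [1, 2]})
```

Because the two feature paths overlap, the stage's latency is roughly the max of the two, not their sum.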
After the model produces a fraud score, the service applies business rules, generates an explanation for logging and appeals, and returns a decision: approve, decline, or request additional verification. The entire end-to-end flow must stay under 100 milliseconds at p99 to avoid slowing the authorization path. Remember that the total authorization, including the bank issuer, often takes 300 to 2000 milliseconds, so the fraud check is just one piece. Companies run this infrastructure across multiple regions to keep network hops small, since a cross-continent round trip alone adds 100 to 200 milliseconds.
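The score-to-decision step is typically a thin layer over the model output. A minimal sketch, where the thresholds and the `force_decline` rule key are illustrative assumptions, not values from any real system:

```python
def decide(score, rules=None):
    """Map a fraud score in [0, 1] to a decision.

    Business rules can override the model; thresholds here are illustrative.
    """
    if rules and rules.get("force_decline"):
        return "decline"  # hard business rule wins over the model score
    if score >= 0.9:
        return "decline"
    if score >= 0.6:
        return "verify"   # step-up: request additional verification
    return "approve"
```

Keeping the rule layer separate from the model makes the decision explainable: the logged explanation can state which threshold or rule fired, which matters for appeals.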
This architecture reveals key design principles. Each stage has a budget and a fallback. If the feature store is slow, you might skip optional features or use cached defaults. If the model server times out, you can return a score from a simpler rule-based system. Observability is critical: you must track latency by stage so you can identify which component regressed after a deployment.
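The model-timeout fallback described above can be sketched as a bounded wait on the model call with a rule-based score as the degraded path. The `rule_based_score` heuristic and timeout values are illustrative assumptions:

```python
import concurrent.futures as cf
import time


def rule_based_score(features):
    # Simple fallback heuristic used when the model is unavailable (illustrative).
    return 0.8 if features.get("recent_attempts", 0) > 5 else 0.1


def score_with_fallback(features, model_fn, timeout_s=0.01):
    """Run the model with a deadline; fall back to rules on timeout."""
    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(model_fn, features)
        try:
            return future.result(timeout=timeout_s), "model"
        except cf.TimeoutError:
            # Model missed its budget: preserve availability with a rule score.
            return rule_based_score(features), "rule_fallback"
```

Returning the score's provenance ("model" vs "rule_fallback") alongside the value lets downstream logging and dashboards track how often the degraded path fires.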
💡 Key Takeaways
• Payment fraud flow allocates a 60ms p99 budget across edge routing at 2 to 5ms, feature fetch at 2 to 10ms, real-time feature compute at 1 to 3ms, and model inference at 2 to 10ms
• Online feature store achieves a 95 percent cache hit rate with 2 to 5ms reads, falling back to the database at 5 to 10ms for misses
• Multi-region deployment keeps network hops small since cross-continent latency adds 100 to 200ms, making regional serving essential
• Each stage has a fallback: skip optional features if the store is slow, return a rule-based score if the model times out, preserving availability
• Observability tracks latency per stage so regressions can be attributed to specific components after a deployment or traffic change
• Total authorization including the bank issuer takes 300 to 2000ms, so fraud scoring at under 100ms p99 stays well within the upstream timeout budget
📌 Examples
Stripe processes millions of authorizations daily with fraud scoring running inline, using regional clusters and cached features to hit sub-100ms p99 targets
A typical feature fetch might retrieve user lifetime transaction count, device fingerprint risk score, and merchant category from the online store in a single batched read
When feature store p99 degrades from 5ms to 20ms due to a cache invalidation storm, the scoring service automatically skips non-critical features to preserve the SLO
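Detecting a degradation like the cache-invalidation storm above requires recording latency per stage, not just end to end. A minimal in-memory sketch (a production system would use a metrics library with histograms; the class and stage names here are illustrative):

```python
from collections import defaultdict


class StageLatencies:
    """Per-stage latency samples so regressions can be attributed to a component."""

    def __init__(self):
        self.samples = defaultdict(list)

    def record(self, stage, ms):
        self.samples[stage].append(ms)

    def p99(self, stage):
        # Nearest-rank p99 over recorded samples for one stage.
        xs = sorted(self.samples[stage])
        return xs[min(len(xs) - 1, int(0.99 * len(xs)))]


metrics = StageLatencies()
metrics.record("feature_fetch", 4.2)
metrics.record("model_inference", 6.1)
```

With per-stage p99s on a dashboard, a jump in `feature_fetch` alone points directly at the feature store rather than the model server or the edge layer.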