Production Architecture: Online Scoring, Feature Freshness, and Latency Budgets
Latency Budgets
Production fraud scoring happens in the critical path of payment authorization. The end-to-end authorization window is typically 300-800ms, including network hops, database lookups, and issuer responses, so risk scoring must complete within 10-30ms at p99 to leave room for everything else. Many teams target sub-5ms model inference on commodity CPUs.
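The budget breaks down as follows: 1-2ms for feature retrieval from cache, 2-5ms for model inference, and 1-2ms for decision logic and logging. Every millisecond matters at scale. By Little's law (requests in flight = arrival rate × latency), at 10,000 transactions per second a 10ms slowdown means roughly 100 additional requests in flight at any moment (10,000 × 0.010s), each one tying up a connection or worker while it waits.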
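The budget arithmetic can be sketched in a few lines. This is a minimal illustration, not a production tool: the stage names and the StageBudget type are hypothetical, the millisecond ranges are the ones quoted above, and the in-flight estimate assumes Little's law holds for a steady arrival rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageBudget:
    """One stage of the scoring path with its latency range in milliseconds."""
    name: str
    low_ms: float
    high_ms: float


# Ranges taken from the breakdown above; the stage names are illustrative.
STAGES = [
    StageBudget("feature_retrieval", 1, 2),
    StageBudget("model_inference", 2, 5),
    StageBudget("decision_and_logging", 1, 2),
]


def total_budget_ms(stages):
    """Return the (best case, worst case) sum of the stage budgets."""
    return sum(s.low_ms for s in stages), sum(s.high_ms for s in stages)


def extra_in_flight(tps, added_latency_ms):
    """Little's law: extra concurrent requests = arrival rate * added latency."""
    return tps * added_latency_ms / 1000.0


low, high = total_budget_ms(STAGES)
print(f"scoring path: {low}-{high} ms")
print(f"extra in-flight at 10k TPS, +10 ms: {extra_in_flight(10_000, 10)}")
```

Summing the stages gives 4-9ms, which leaves headroom inside a 10-30ms p99 target for serialization, queuing, and tail variance.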
Feature Freshness
Features mix static attributes and streaming aggregates. Static features (device fingerprint, merchant category, card metadata) come from key-value stores with sub-millisecond reads. Streaming features are the differentiator: payment attempts per card in the last 10 minutes, total spend in the last 24 hours, distinct devices per email in the last 7 days.
Streaming features update through real-time pipelines with lag ranging from 100ms to a few seconds. High-value velocity checks (attempts in the last 10 minutes) need sub-1-second freshness. Longer-horizon features (7-day aggregates) tolerate 10-60 seconds of lag. Stale features miss velocity attacks in which fraudsters hit a card 5 times in 2 minutes.
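A sliding-window counter is one simple way to picture a velocity feature and its freshness requirement. The sketch below is an in-memory illustration only: VelocityCounter, is_fresh, the feature names, and the MAX_LAG_SECONDS table are all hypothetical, and it assumes events arrive in timestamp order. A real deployment would more likely keep these aggregates in a stream processor or a shared cache.

```python
from collections import defaultdict, deque


class VelocityCounter:
    """Counts events per key inside a trailing time window (seconds)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def record(self, key, ts):
        # Assumes timestamps are appended in non-decreasing order.
        self._events[key].append(ts)

    def count(self, key, now):
        q = self._events.get(key)
        if not q:
            return 0
        cutoff = now - self.window_seconds
        # Drop events that have aged out of the window.
        while q and q[0] <= cutoff:
            q.popleft()
        return len(q)


# Hypothetical freshness limits, using the lag tolerances described above.
MAX_LAG_SECONDS = {
    "attempts_10m": 1.0,          # velocity check: sub-1-second freshness
    "distinct_devices_7d": 60.0,  # long-horizon aggregate: up to 60 s lag
}


def is_fresh(feature_name, last_update_ts, now):
    """True if the feature was updated within its allowed lag."""
    return (now - last_update_ts) <= MAX_LAG_SECONDS[feature_name]


attempts = VelocityCounter(window_seconds=600)
for ts in (0, 30, 60, 90, 120):  # five attempts within two minutes
    attempts.record("card_123", ts)

print(attempts.count("card_123", now=120))
print(is_fresh("attempts_10m", last_update_ts=115, now=120))
```

In this example the counter sees all five attempts, but a feature value last updated five seconds earlier would fail the sub-1-second freshness check for the 10-minute velocity feature while still being acceptable for a 7-day aggregate.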
Two-Stage Architecture
A common pattern separates fast and slow paths. The fast path scores with precomputed features in under 10ms and drives the checkout decision. The slow path runs asynchronously after approval with richer graph features and secondary models, taking 100-500ms. High-risk transactions flagged by the slow path trigger post-authorization holds.
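The control flow of the two paths can be sketched with asyncio. Everything here is an assumption made for illustration: the thresholds, the placeholder scoring rules, the feature names, and the in-memory holds list are hypothetical, and the sleep merely stands in for graph lookups and secondary models. In production the slow path would more likely be fed through a queue than spawned as an in-process task.

```python
import asyncio

FAST_DECLINE_THRESHOLD = 0.9  # hypothetical cutoff for the fast model
SLOW_HOLD_THRESHOLD = 0.8     # hypothetical cutoff for the slow model


def fast_score(features):
    """Placeholder fast model over precomputed features (illustrative rule)."""
    return min(1.0, 0.15 * features.get("attempts_10m", 0))


async def slow_score(txn):
    """Placeholder slow model; the sleep stands in for graph feature lookups."""
    await asyncio.sleep(0.1)
    return 0.95 if txn.get("shared_device_cards", 0) > 3 else 0.1


async def slow_path(txn, holds):
    """Runs after approval; places a post-authorization hold if risky."""
    if await slow_score(txn) >= SLOW_HOLD_THRESHOLD:
        holds.append(txn["id"])


async def authorize(txn, holds):
    """Fast path decides immediately; slow path is scheduled, not awaited."""
    if fast_score(txn["features"]) >= FAST_DECLINE_THRESHOLD:
        return "decline", None
    task = asyncio.create_task(slow_path(txn, holds))
    return "approve", task


async def demo():
    holds = []
    txn = {"id": "t1", "features": {"attempts_10m": 1}, "shared_device_cards": 5}
    decision, task = await authorize(txn, holds)
    await task  # only so the demo can observe the slow-path result
    return decision, holds


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the design is that authorize returns before the slow path finishes, so the checkout decision never waits on the 100-500ms work; the demo awaits the task only to show that the hold is recorded afterwards.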