Production Architecture: Online Scoring, Feature Freshness, and Latency Budgets
Production fraud detection systems must score thousands of events per second within tight latency budgets while computing fresh streaming features. Payment processors like Stripe handle 5,000 to 20,000 transactions per second under normal load and over 50,000 TPS during holiday peaks. The end-to-end payment authorization path to the card networks allows 300 to 800 milliseconds total, so risk scoring must complete within 10 to 30 milliseconds at p95 to leave room for network hops, database lookups, and issuer responses. Many teams target sub-5-millisecond model inference on commodity CPUs to control tail latencies.
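To make the budget concrete, here is a minimal sketch of a scoring handler that enforces a per-request deadline and falls back to a rule engine when the model path fails. The names (`score_with_budget`, `SCORING_BUDGET_MS`) and the `model_score`/`rule_score` callables are hypothetical, not any vendor's API.

```python
import time

# Assumed budget from the text: roughly 30 ms at p95 for risk scoring
# inside a 300-800 ms end-to-end authorization path.
SCORING_BUDGET_MS = 30.0

def score_with_budget(features: dict, model_score, rule_score) -> tuple:
    """Score one transaction, falling back to rules if the model path fails.

    model_score and rule_score are callables supplied by the caller; this is
    a sketch of the control flow, not a real serving framework.
    """
    start = time.perf_counter()
    try:
        score = model_score(features)
    except Exception:
        # Model unavailable: fall back to the rule engine so the
        # authorization path never blocks on ML infrastructure.
        return rule_score(features), "rules_fallback"
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_ms > SCORING_BUDGET_MS:
        # Over budget: still return the score, but tag it so p95/p99
        # latency alerting can fire upstream.
        return score, "model_over_budget"
    return score, "model"
```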
Features mix static attributes and streaming aggregates. Static features include device fingerprint hash, merchant risk tier, and card BIN metadata, retrieved from low-latency key-value stores with sub-millisecond p99 reads. Streaming features are the differentiator: payment attempts per card in the last 10 minutes, total spend by account in the last 24 hours, distinct device count per email in the last 7 days. These aggregates update through streaming pipelines (Kafka, Flink, or custom) with end-to-end lag between 100 milliseconds and a few seconds. High-value features like short-window velocity checks require sub-1-second freshness; longer-horizon features tolerate 10 to 60 seconds of lag. The feature store maintains dual paths: an online store (Redis, DynamoDB) optimized for single-digit-millisecond p99 reads, and an offline store (S3, BigQuery) for training, with consistent feature definitions across both.
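As an illustration of the online read path, the sketch below assembles a feature vector from Redis in one pipelined round trip and flags stale velocity aggregates. It assumes the redis-py client; the key names and the `velocity:updated_at` timestamp convention are hypothetical.

```python
import time

import redis  # assumes the redis-py client; all key names below are hypothetical

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

MAX_VELOCITY_LAG_SECONDS = 1.0  # short-window velocity checks need sub-1-second freshness

def fetch_online_features(card_id: str, account_id: str) -> dict:
    """Assemble a feature vector from the online store at request time.

    Static attributes and streaming aggregates are read in a single pipeline
    round trip to keep p99 read latency in the low single-digit milliseconds.
    """
    pipe = r.pipeline()
    pipe.hgetall(f"static:card:{card_id}")        # BIN metadata, merchant risk tier, etc.
    pipe.get(f"velocity:attempts_10m:{card_id}")  # streaming aggregate (10-minute window)
    pipe.get(f"velocity:spend_24h:{account_id}")  # streaming aggregate (24-hour window)
    pipe.get(f"velocity:updated_at:{card_id}")    # last write time from the streaming pipeline
    static, attempts_10m, spend_24h, updated_at = pipe.execute()

    lag = time.time() - float(updated_at or 0)
    return {
        **static,
        "attempts_10m": float(attempts_10m or 0),
        "spend_24h": float(spend_24h or 0),
        # Flag stale velocity features so the model or a rule can discount them.
        "velocity_stale": lag > MAX_VELOCITY_LAG_SECONDS,
    }
```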
Serving architecture uses stateless model servers with warm models loaded in memory. Stripe runs XGBoost and LightGBM ensembles compiled to native code for sub-millisecond inference per model. Multiple models run in parallel: a primary model, challenger models in shadow mode, and fallback rule engines. Shadow deployments run for 2 to 4 weeks before promotion because label delays make short-term metrics unreliable. Monitoring tracks score distribution shifts (alert on a Population Stability Index above 0.2), feature staleness (alert if streaming lag exceeds 5 seconds), and precision on fast-feedback proxies like network risk codes, which arrive within hours instead of the 30 to 90 day chargeback delay.
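The Population Stability Index check on score distributions can be computed as below. This is a minimal sketch assuming NumPy, with quantile bins derived from the training baseline; the stand-in beta-distributed scores are purely illustrative.

```python
import numpy as np

PSI_ALERT_THRESHOLD = 0.2  # alert level mentioned above

def population_stability_index(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a baseline score distribution and a live scoring window.

    Bin edges come from the baseline's quantiles; a small epsilon avoids
    division by zero in sparse bins. A sketch of the monitoring check, not
    any particular vendor's implementation.
    """
    eps = 1e-6
    edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    # Clip live scores into the baseline range so outliers land in the end bins.
    actual_pct = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)[0] / len(actual) + eps
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Illustrative check: compare a live scoring window against the training baseline.
baseline_scores = np.random.beta(2, 8, 100_000)  # stand-in for training-time scores
live_scores = np.random.beta(2, 7, 50_000)       # stand-in for the live window
psi = population_stability_index(baseline_scores, live_scores)
if psi > PSI_ALERT_THRESHOLD:
    print(f"ALERT: score distribution shift, PSI = {psi:.3f}")
```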
The two-stage pattern appears across companies. A fast path scores with precomputed features in under 10 milliseconds for checkout decisions. A slow path runs asynchronously after approval with richer graph features, external API calls, and secondary models, taking 100 to 500 milliseconds. High-risk transactions flagged by the slow path can trigger post-authorization holds or fulfillment blocks. Uber uses a similar architecture: trip requests score in under 100 milliseconds before dispatch using device and user-history features, then post-trip audits run more expensive graph analysis and driver-pattern checks.
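A minimal sketch of the fast/slow split is shown below, using an in-process queue as a stand-in for a durable queue such as Kafka or SQS. The thresholds and the `run_slow_models` / `place_post_auth_hold` helpers are hypothetical placeholders.

```python
import queue
import threading

# Stand-in for a durable queue (Kafka, SQS) feeding the asynchronous slow path.
slow_path_queue: "queue.Queue[dict]" = queue.Queue()

FAST_BLOCK_THRESHOLD = 0.95   # decline at checkout (illustrative value)
SLOW_REVIEW_THRESHOLD = 0.70  # approve now, review asynchronously (illustrative value)

def run_slow_models(txn: dict) -> float:
    """Placeholder for graph features, external API calls, and secondary models."""
    return 0.0

def place_post_auth_hold(txn: dict) -> None:
    """Placeholder for a post-authorization hold or fulfillment block."""
    print(f"hold placed on transaction {txn.get('id')}")

def authorize(txn: dict, fast_score: float) -> str:
    """Fast-path decision at checkout using the precomputed-feature score."""
    if fast_score >= FAST_BLOCK_THRESHOLD:
        return "decline"
    if fast_score >= SLOW_REVIEW_THRESHOLD:
        # Approve within the latency budget, then hand off for the richer
        # post-authorization analysis described above.
        slow_path_queue.put(txn)
    return "approve"

def slow_path_worker() -> None:
    """Asynchronous second stage running outside the authorization path."""
    while True:
        txn = slow_path_queue.get()
        if run_slow_models(txn) >= 0.9:
            place_post_auth_hold(txn)
        slow_path_queue.task_done()

threading.Thread(target=slow_path_worker, daemon=True).start()
```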
💡 Key Takeaways
• Latency budget is 10 to 30 milliseconds p95 for risk scoring within the 300 to 800 millisecond total payment authorization path
• Streaming features require sub-1-second freshness for velocity checks, with pipeline lag between 100 milliseconds and a few seconds
• Feature store dual path: online store (Redis, DynamoDB) for sub-millisecond reads, offline store (S3, BigQuery) for training with consistent feature definitions
• Model inference under 1 millisecond per model on commodity CPUs using compiled XGBoost or LightGBM ensembles
• Shadow deployments run 2 to 4 weeks before promotion because the 30 to 90 day label delay makes short-term metrics unreliable
• Two-stage pattern: fast path (10 ms) for checkout with precomputed features, slow path (100 to 500 ms) asynchronous for graph features and external APIs
📌 Examples
Stripe Radar: 20,000 TPS peak, 5 ms model inference budget, features from Redis with 500 ms of streaming lag, XGBoost ensemble with 150 trees scored in parallel
PayPal risk engine: dual path with 15 ms fast scoring for authorization, 200 ms slow path post-authorization for device fingerprint APIs and graph analysis, placing holds on high-risk transactions before fulfillment
Uber safety: 100 ms scoring budget before trip dispatch, streaming features include driver trips in the last hour and device-switching patterns, post-trip audit runs a 5-minute batch analysis