Fraud Detection & Anomaly Detection • Graph-based Fraud Detection (GNNs) • Hard • ⏱️ ~3 min
Implementation Details: Sampling, Caching, and Ensemble Fusion
Production graph fraud systems require careful implementation to meet latency and accuracy targets. The graph schema defines node types for user, account, card, device, IP, and merchant, plus edge types for transaction, login-from-device, shares-identity-attribute, and refund. Each edge carries a timestamp and a confidence score. Systems maintain multiple rolling windows (1 hour, 24 hours, 7 days, 30 days) and apply time decay with half-lives of 7 to 30 days so that recent activity dominates.
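The time-decay weighting can be sketched as a simple exponential with a configurable half-life. This is an illustrative sketch, not the source's exact formula; the 14-day default is an assumed midpoint of the 7-to-30-day range above.

```python
def decayed_edge_weight(base_weight: float, edge_ts: float,
                        now: float, half_life_days: float = 14.0) -> float:
    """Exponential time decay: the edge weight halves every
    `half_life_days` days, so recent activity dominates aggregates.
    Sketch only; the 14-day half-life is an assumed default."""
    age_days = (now - edge_ts) / 86400.0
    return base_weight * 0.5 ** (age_days / half_life_days)
```

A 14-day-old edge with base weight 1.0 thus contributes 0.5 under a 14-day half-life, and a 28-day-old edge contributes 0.25.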
Sampling and bounding are essential for latency control. For each online decision, construct a 2-hop ego network bounded per type: up to 20 devices per user, 10 users per device, 50 transactions per user in the last 7 days, and 10 merchants per user. Beyond these caps, reject or downsample based on recency and edge weight, prioritizing recent high-value edges. This keeps fetch time under 10 ms at p95 and reduces noise from distant or weak connections. Degree-based sampling downweights high-degree nodes: for a device with 500 linked accounts, sample 20 using inverse-degree probability so rare connections get higher weight.
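The inverse-degree downsampling step can be sketched with Efraimidis–Spirakis-style weighted sampling without replacement. This is one plausible implementation, not the source's; the `degrees` lookup is assumed to be precomputed.

```python
import random

def inverse_degree_sample(neighbors, degrees, k, rng=None):
    """Sample up to k neighbors with probability roughly inversely
    proportional to degree, so rare low-degree connections are favored
    over promiscuous hubs. Uses exponential keys (Efraimidis-Spirakis
    style weighted sampling without replacement). Illustrative sketch;
    `degrees` maps neighbor id -> degree."""
    rng = rng or random.Random(0)
    if len(neighbors) <= k:
        return list(neighbors)

    def sort_key(n):
        weight = 1.0 / max(degrees.get(n, 1), 1)  # inverse-degree weight
        return rng.expovariate(1.0) / weight      # smaller key = kept
    return sorted(neighbors, key=sort_key)[:k]
```

For the device-with-500-accounts case above, this returns 20 distinct accounts, biased toward those whose own degree is low.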
Embeddings are precomputed offline on daily or 6-hour snapshots using a heterogeneous GNN. Store 128- to 256-dimensional vectors in a feature store like Feast or Tecton, or directly in an in-memory cache (Redis or Memcached) keyed by node id. Refresh embeddings incrementally for hot nodes using streaming deltas. For example, if a device links to 3 new accounts in the last hour, recompute its embedding in near real time and update the cache. Cold nodes fall back to the last snapshot embedding.
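A cheap incremental refresh for a hot node can blend the cached embedding with an aggregate over the newly linked neighbors' cached embeddings. This is a stand-in for re-running a GNN layer, and the blend factor `alpha` is an assumed tuning knob, not from the source.

```python
import numpy as np

def refresh_hot_embedding(cached_emb, new_neighbor_embs, alpha=0.3):
    """Incremental refresh sketch: blend the node's cached embedding
    with the mean embedding of neighbors added by new streaming edges.
    Stand-in for a full GNN recompute; alpha is an assumed parameter."""
    if len(new_neighbor_embs) == 0:
        return cached_emb  # no deltas: keep the snapshot embedding
    aggregate = np.mean(np.stack(new_neighbor_embs), axis=0)
    return (1.0 - alpha) * cached_emb + alpha * aggregate
```

The streaming job would call this when a device gains new account edges, then write the result back to the Redis key for that node.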
The online model is a small temporal aggregator that takes precomputed embeddings of the focal node and its recent neighbors, plus online features like the last N transaction amounts, inter-event time gaps, and velocity counters. A typical architecture is a 2-layer MLP or a single GNN attention layer operating on cached embeddings and fresh edge features. Inference targets 5 to 10 ms on CPU with vectorized operations. Micro-batching multiple requests per core improves throughput when the latency budget allows, for example batching 8 to 16 requests to amortize per-call model overhead.
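The 2-layer MLP variant with micro-batching can be sketched in a few lines of NumPy. Weights are random for illustration (a real deployment loads trained parameters), and the dimensions are assumptions consistent with the embedding sizes above.

```python
import numpy as np

class TinyScorer:
    """2-layer MLP over cached node embeddings concatenated with fresh
    online features, scoring a whole micro-batch in one vectorized
    pass. Random weights for illustration only; dimensions assumed."""
    def __init__(self, emb_dim=128, online_dim=16, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        d = emb_dim + online_dim
        self.w1 = rng.normal(0.0, 0.05, (d, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.05, (hidden, 1))
        self.b2 = np.zeros(1)

    def score(self, emb_batch, online_batch):
        x = np.concatenate([emb_batch, online_batch], axis=1)  # (B, d)
        h = np.maximum(x @ self.w1 + self.b1, 0.0)             # ReLU
        logits = h @ self.w2 + self.b2                         # (B, 1)
        return (1.0 / (1.0 + np.exp(-logits))).ravel()         # (B,)
```

Scoring a batch of 16 requests in one `score` call is exactly the micro-batching pattern described above: one matrix multiply amortizes the per-call overhead across all 16 decisions.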
Caching strategy is two-tier. Maintain a RAM cache for hot entities like devices and merchants in active use, sized to cover the top 5 to 10 percent of entities, which typically accounts for 50 to 70 percent of requests due to the power-law traffic distribution. Use an SSD-backed cache for warm entities. Configure TTLs aligned with the embedding refresh cadence, for example a 6-hour TTL if embeddings refresh every 6 hours. Prewarm caches before peak events like Black Friday by loading predicted hot entities based on historical traffic patterns.
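A toy version of the two-tier layout might look as follows, with a small in-RAM dict fronting a larger dict standing in for the SSD-backed store, and TTLs mirroring the refresh cadence. Production would use Redis plus a local LRU with real eviction; this sketch and its capacity numbers are assumptions.

```python
import time

class TwoTierCache:
    """Toy two-tier cache: a bounded 'hot' RAM dict fronting a 'warm'
    dict that stands in for an SSD-backed store. Entries expire after
    one TTL aligned with the embedding refresh cadence. Sketch only."""
    def __init__(self, ttl_seconds=6 * 3600, hot_capacity=1000):
        self.ttl = ttl_seconds
        self.hot_capacity = hot_capacity
        self.hot, self.warm = {}, {}

    def put(self, key, value, warm=False):
        entry = (value, time.time() + self.ttl)
        if warm or len(self.hot) >= self.hot_capacity:
            self.warm[key] = entry  # spill to the warm tier
        else:
            self.hot[key] = entry

    def get(self, key):
        for tier in (self.hot, self.warm):
            entry = tier.get(key)
            if entry is not None and entry[1] > time.time():
                return entry[0]
        return None  # miss or expired: caller falls back to snapshot
```

Prewarming before a peak event is then just a batch of `put` calls for the predicted hot entities.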
Sharding partitions the graph by entity id using consistent hashing. Co-locate related node types when possible to reduce cross-shard hops, for example by placing a user and its devices on the same shard. This minimizes network latency during neighborhood fetch. Maintain a replication factor of 3 for high availability, and provide bounded fan-out APIs that respect per-call neighbor limits to protect backend services from query storms.
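A minimal consistent-hash ring with virtual nodes illustrates both the entity-id partitioning and the co-location trick of routing a user's devices by the owning user's id. Sketch under stated assumptions; a production system would use a battle-tested library, and the `route` helper is hypothetical.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes for smoother
    key distribution. Illustrative sketch only."""
    def __init__(self, shards, vnodes=64):
        self.ring = sorted(
            (self._h(f"{s}#{i}"), s) for s in shards for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _h(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def shard_for(self, entity_id: str) -> str:
        i = bisect.bisect(self.keys, self._h(entity_id)) % len(self.ring)
        return self.ring[i][1]

def route(ring, entity_id, owner_id=None):
    """Hypothetical helper: route by the owning user's id when given,
    so a user and its device rows land on the same shard."""
    return ring.shard_for(owner_id or entity_id)
```

Routing `device:7` with `owner_id="user:42"` sends it to the same shard as `user:42`, which is exactly the co-location that cuts cross-shard hops during neighborhood fetch.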
Ensemble and thresholds: combine the GNN score, a tree-based tabular model (XGBoost or LightGBM on engineered features), and rule outputs in a calibrated fusion layer. Use a weighted average or a small logistic regression model to fuse scores. Calibrate with Platt scaling or isotonic regression on recent held-out data per market, because fraud patterns vary by geography. Maintain separate thresholds per customer segment (new user versus established, high-risk geography versus low-risk) and per transaction type (card-present versus card-not-present). Set targets such as blocking 90 percent of chargeback dollars at a 1 percent false positive rate on good customers, and route an additional 1 to 3 percent of traffic to human review for borderline cases.
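The logistic-regression fusion layer can be sketched as a tiny NumPy fit over the three upstream scores. Plain gradient descent stands in for a real solver here; a production system would fit on recent held-out data per market and layer Platt scaling or isotonic regression on top.

```python
import numpy as np

def fit_fusion(scores, labels, lr=0.5, steps=2000):
    """Fit a logistic-regression fusion over (gnn, tabular, rules)
    score triples with plain gradient descent. Sketch only; lr and
    step count are assumed values."""
    X = np.asarray(scores, dtype=float)   # shape (N, 3)
    y = np.asarray(labels, dtype=float)   # shape (N,)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # fused probabilities
        g = p - y                                # logistic gradient
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def fuse(w, b, gnn, tab, rules):
    """Apply the fitted fusion layer to one decision's three scores."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, [gnn, tab, rules]) + b)))
```

Per-segment thresholds are then applied to the fused probability, e.g. a lower block threshold for new users in high-risk geographies.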
Monitoring tracks p50 and p95 decision latency, graph-fetch cache miss rate, the proportion of cold-start decisions (no cached embedding), and drift metrics on input distributions. Measure business outcomes including chargeback rate, manual review rate, and customer friction. Log the top influence paths for explainability, showing which neighbors contributed most to the fraud score. Maintain strict privacy and data retention controls for identity linkages and shared devices, deleting edges once retention windows expire. Retrain weekly, or sooner when metrics degrade beyond threshold deltas. Use hard-example mining to oversample confirmed fraud and recent false positives, a cost-sensitive loss to penalize false negatives, and temporal cross-validation that respects event time to mimic production deployment.
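The temporal cross-validation scheme can be sketched as expanding-window folds: each fold trains on everything before a cut point and validates on the next contiguous block, so validation never sees the future. One plausible implementation, assuming events are already sorted by event time:

```python
def temporal_folds(events, n_folds=4):
    """Expanding-window temporal CV sketch: train on all events before
    each cut, validate on the next contiguous block. `events` must be
    sorted by event time; n_folds is an assumed default."""
    fold = len(events) // (n_folds + 1)
    for i in range(1, n_folds + 1):
        train = events[: i * fold]
        valid = events[i * fold : (i + 1) * fold]
        yield train, valid
```

This mimics production deployment, where the model is always trained on the past and scored on the future, unlike shuffled k-fold splits that leak future fraud patterns into training.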
💡 Key Takeaways
• Sampling bounds per type (20 devices per user, 10 users per device, 50 transactions in 7 days) with degree-based downsampling keep graph fetch under 10 ms at p95 and reduce noise
• Two-tier caching with RAM for hot entities (top 5 to 10 percent covering 50 to 70 percent of traffic) and SSD for warm, with TTLs aligned to the 6-hour embedding refresh cadence
• Precompute 128- to 256-dimensional node embeddings offline every 6 hours on a graph snapshot, incrementally refresh hot nodes from streaming deltas, and store them in a feature store or Redis
• Ensemble fuses GNN score, XGBoost tabular model, and rules via a calibrated weighted average or logistic layer, with separate thresholds per segment targeting 90 percent recall at a 1 percent false positive rate
• Sharding by entity id with co-location of related node types reduces cross-shard hops; replication factor 3 for high availability; bounded fan-out APIs protect the backend
📌 Examples
Incremental embedding refresh: A device links to 3 new accounts in the last hour. A streaming job recomputes the device embedding using cached neighbor embeddings plus the new edges, then updates the Redis cache with about 2 minutes of latency.
Micro-batching for throughput: The online model receives 16 concurrent requests and batches them into a single inference call with a (16, 256) embedding input, amortizing per-call model overhead from 8 ms to 2 ms per request.
Threshold calibration: New users in a high-risk geography (fraud rate 2 percent) get a step-up threshold of 0.08, while established users in a low-risk geography (fraud rate 0.2 percent) get 0.15, balancing friction and risk.