Implementation Details: Sampling, Caching, and Ensemble Fusion
Neighbor Sampling Strategies
Random uniform sampling is simplest but may miss important connections. Importance sampling weights neighbors by recency, transaction volume, or suspicion score, prioritizing the most informative nodes. Layer-wise sampling uses a different strategy per hop: a strict budget in the first hop for speed, a relaxed one in the second hop for coverage. The total sampling budget (neighbors fetched per query) directly trades latency against accuracy.
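Importance sampling under a fixed budget can be sketched as weighted sampling without replacement. The field names (`recency`, `volume`, `suspicion`) and the 0.4/0.3/0.3 weight mix below are illustrative assumptions, not tuned values:

```python
import math
import random

def importance_sample(neighbors, budget, rng=None):
    """Pick up to `budget` neighbors, weighted by an importance score.

    Each neighbor is a dict with hypothetical fields: `recency` (0-1,
    newer = higher), `volume` (transaction count), `suspicion` (0-1).
    """
    rng = rng or random.Random()
    if len(neighbors) <= budget:
        return list(neighbors)
    weights = [
        0.4 * n["recency"] + 0.3 * math.log1p(n["volume"]) + 0.3 * n["suspicion"]
        for n in neighbors
    ]
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # each item gets key u**(1/w); the top-`budget` keys form the sample.
    order = sorted(
        range(len(neighbors)),
        key=lambda i: rng.random() ** (1.0 / max(weights[i], 1e-9)),
        reverse=True,
    )
    return [neighbors[i] for i in order[:budget]]
```

Log-scaling the volume term keeps a few very high-volume neighbors from dominating the weights.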
Implementation Pattern: Cache sampled neighborhoods for recently active nodes. High-activity users get sampled repeatedly; caching their neighborhoods (with 5-minute TTL) reduces graph database load by 60-80% during traffic spikes.
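The neighborhood cache from the pattern above can be a simple TTL map in front of the graph database. A minimal sketch, with an injectable clock so expiry is testable (class and parameter names are hypothetical):

```python
import time

class TTLNeighborhoodCache:
    """Cache sampled neighborhoods for recently active nodes.

    Entries expire after `ttl_seconds` (e.g. 300 for a 5-minute TTL),
    so high-activity users hit the cache instead of the graph database.
    """

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # node_id -> (expires_at, neighborhood)

    def get(self, node_id):
        entry = self._store.get(node_id)
        if entry is None:
            return None
        expires_at, neighborhood = entry
        if self.clock() >= expires_at:   # stale: drop and force a re-fetch
            del self._store[node_id]
            return None
        return neighborhood

    def put(self, node_id, neighborhood):
        self._store[node_id] = (self.clock() + self.ttl, neighborhood)
```

In production this role is typically played by Redis with per-key TTLs; the in-process version above just makes the expiry logic explicit.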
Embedding Caches
Store pre-computed node embeddings in a low-latency cache (Redis, Memcached). At inference time, fetch the cached embedding if it is fresh enough; otherwise compute it on demand and update the cache. Two-tier caching keeps hot embeddings in process memory (sub-millisecond), warm embeddings in Redis (2-5ms), and computes cold nodes live (20-50ms).
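The hot/warm/cold lookup order can be sketched as below. This is a simplified in-process model: a plain dict stands in for the Redis warm tier, an LRU `OrderedDict` is the hot tier, and `compute_fn` represents live model inference; all names are assumptions:

```python
from collections import OrderedDict

class TwoTierEmbeddingStore:
    """Hot tier: in-process LRU (sub-ms). Warm tier: dict standing in
    for Redis/Memcached (2-5ms). Cold: compute_fn runs the model live."""

    def __init__(self, compute_fn, hot_capacity=1024, warm=None):
        self.compute_fn = compute_fn   # node_id -> embedding (cold path)
        self.hot = OrderedDict()       # LRU: most recently used at the end
        self.hot_capacity = hot_capacity
        self.warm = warm if warm is not None else {}

    def get(self, node_id):
        if node_id in self.hot:                # hot hit
            self.hot.move_to_end(node_id)
            return self.hot[node_id]
        if node_id in self.warm:               # warm hit: promote to hot
            emb = self.warm[node_id]
        else:                                  # cold: compute, backfill warm
            emb = self.compute_fn(node_id)
            self.warm[node_id] = emb
        self.hot[node_id] = emb
        if len(self.hot) > self.hot_capacity:  # evict least recently used
            self.hot.popitem(last=False)
        return emb
```

Freshness checks (e.g. a stored timestamp compared against a staleness budget) would bolt onto the warm-hit branch; they are omitted here to keep the tiering logic readable.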
Ensemble Fusion
Production systems rarely rely on GNNs alone. Combine GNN scores with point-wise model scores (e.g., XGBoost on transaction features), rule-based systems (velocity checks, blacklists), and reputation scores. Fusion options include weighted averaging, stacking (a meta-model trained on component scores), and cascaded filtering (rules first, GNN for ambiguous cases).
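The simplest fusion option, a weighted average over named component scores, is a one-liner; the component names and weights below are illustrative (stacking would replace this function with a trained meta-model over the same inputs):

```python
def fuse_weighted(scores, weights):
    """Weighted average of component model scores, all assumed in [0, 1].

    `scores` and `weights` are dicts keyed by component name,
    e.g. {"gnn": ..., "xgb": ..., "rules": ...}.
    """
    total = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total
```

Normalizing by the weight total means the weights need not sum to one, which makes per-component reweighting (e.g. during a model rollout) less error-prone.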
Production Insight: Cascaded architecture saves compute: cheap rules filter 80% of clear cases, GNN inference runs only on the remaining 20% ambiguous transactions. This reduces GNN serving costs by 5x while maintaining detection quality.
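The cascade in the insight above amounts to a threshold gate in front of the GNN. A minimal sketch, where the 0.1/0.9 thresholds are hypothetical and would be tuned so the rules settle roughly 80% of traffic:

```python
def cascade_score(txn, rule_score_fn, gnn_score_fn, low=0.1, high=0.9):
    """Run cheap rules first; invoke the expensive GNN only when the
    rule score is ambiguous. Returns (score, which_stage_decided)."""
    s = rule_score_fn(txn)
    if s <= low or s >= high:
        return s, "rules"          # clear accept/decline: skip the GNN
    return gnn_score_fn(txn), "gnn"
```

Returning which stage decided is useful in practice for monitoring the filter rate: if the "gnn" fraction drifts above the expected ~20%, serving costs rise even though detection quality looks unchanged.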
Batch vs Real-time
Some fraud detection is not time-critical (account reviews, periodic sweeps). Run batch GNN inference overnight on the full graph without sampling constraints. Use batch results to update node risk scores that real-time systems consume. This hybrid approach balances thoroughness with latency requirements.
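The hybrid batch/real-time handoff can be sketched as two functions sharing a risk-score store. Everything here is assumed for illustration: the `account_id` field, the blend weight `alpha`, and the 0.5 default for nodes the batch sweep has not scored yet:

```python
def nightly_batch_sweep(node_ids, full_graph_score_fn, risk_store):
    """Overnight pass: score every node on the full graph (no sampling
    budget) and persist the results for the real-time path to read."""
    for node_id in node_ids:
        risk_store[node_id] = full_graph_score_fn(node_id)

def realtime_score(txn, fast_model_fn, risk_store, alpha=0.7, default_risk=0.5):
    """Real-time path: blend a fast per-transaction model with the node
    risk precomputed by the last batch sweep."""
    node_risk = risk_store.get(txn["account_id"], default_risk)
    return alpha * fast_model_fn(txn) + (1 - alpha) * node_risk
```

The real-time path never touches the graph: it does a single key lookup, so its latency is governed by the fast model alone while still benefiting from full-graph GNN context computed offline.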