Graph-based Fraud Detection (GNNs) · Hard · ~3 min

Production Serving Architecture: Latency and Scale Trade-offs

The Latency Challenge

Real-time fraud detection requires decisions within 50-100ms. GNN inference must fetch the target node, retrieve its neighborhood (potentially thousands of edges), compute aggregations, and return a score. Each graph traversal adds latency. A 2-hop neighborhood on a dense graph might touch millions of nodes, which is impossible to compute in real time without optimization.

Design Trade-off: Larger neighborhoods capture more fraud patterns but increase latency. Production systems typically limit to 1-2 hops with sampled neighbors (10-50 per node) to keep inference under 50ms while retaining most detection power.
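The fan-out arithmetic behind this trade-off is worth being able to reproduce. A minimal sketch (the degree numbers are illustrative, not from any specific production system):

```python
def touched_nodes(avg_degree: int, hops: int) -> int:
    """Upper bound on nodes visited by a full k-hop expansion:
    degree + degree^2 + ... + degree^hops."""
    return sum(avg_degree ** h for h in range(1, hops + 1))

# Dense graph: average degree 1,000 (e.g. a popular shared merchant node).
full = touched_nodes(avg_degree=1_000, hops=2)   # 1,000 + 1,000,000
# Sampling capped at 25 neighbors per node per hop:
sampled = touched_nodes(avg_degree=25, hops=2)   # 25 + 625

print(full)     # 1001000
print(sampled)  # 650
```

Four orders of magnitude fewer nodes per query is what makes sub-50ms inference feasible.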

Neighborhood Sampling

Rather than fetching all neighbors, sample a fixed number per hop. Uniform sampling selects neighbors randomly. Importance sampling prioritizes suspicious or active neighbors. Stratified sampling ensures representation of different relationship types (device links vs transaction links). The sampling strategy significantly affects which fraud patterns the model catches.
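The three strategies above can be sketched as follows (function and parameter names are illustrative, not a specific library's API):

```python
import random

def uniform_sample(neighbors: list, k: int) -> list:
    """Uniformly sample up to k neighbors."""
    return list(neighbors) if len(neighbors) <= k else random.sample(neighbors, k)

def importance_sample(neighbors: list, scores: dict, k: int) -> list:
    """Keep the k neighbors with the highest suspicion/activity score."""
    ranked = sorted(neighbors, key=lambda n: scores.get(n, 0.0), reverse=True)
    return ranked[:k]

def stratified_sample(neighbors_by_type: dict, k_per_type: int) -> list:
    """Sample k neighbors from each relationship type
    (e.g. 'device' links vs 'transaction' links)."""
    out = []
    for _etype, nbrs in neighbors_by_type.items():
        out.extend(uniform_sample(nbrs, k_per_type))
    return out
```

Importance sampling tends to catch fraud rings concentrated among already-suspicious nodes, while stratified sampling guards against the model going blind to one edge type.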

Pre-computed Embeddings

Instead of computing GNN embeddings at inference time, pre-compute node embeddings periodically (hourly or daily) and store them. At inference time, fetch the pre-computed embedding and combine with real-time transaction features. This reduces latency to a simple lookup plus a small neural network forward pass.
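A minimal sketch of the serving path, assuming a dict stands in for the embedding store and a tiny two-layer network scores the combined features (all shapes and names are hypothetical):

```python
import numpy as np

EMB_DIM, TXN_DIM, HID = 64, 16, 32
rng = np.random.default_rng(0)

# Refreshed offline (hourly/daily) by the batch GNN job.
embedding_store = {"user_42": rng.normal(size=EMB_DIM)}

# Small scoring head; weights would come from training, random here.
W1 = rng.normal(size=(EMB_DIM + TXN_DIM, HID)) * 0.1
W2 = rng.normal(size=HID) * 0.1

def score_transaction(user_id: str, txn_features: np.ndarray) -> float:
    """Inference = O(1) embedding lookup + tiny MLP forward pass."""
    emb = embedding_store[user_id]            # precomputed GNN embedding
    x = np.concatenate([emb, txn_features])   # join with real-time features
    h = np.maximum(0.0, x @ W1)               # ReLU hidden layer
    return float(1.0 / (1.0 + np.exp(-(h @ W2))))  # sigmoid fraud score

score = score_transaction("user_42", rng.normal(size=TXN_DIM))
```

No graph traversal happens on the request path at all; the heavy aggregation was paid for offline.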

Warning: Pre-computed embeddings become stale. A user flagged 1 hour ago still has a clean embedding until the next refresh. Balance freshness (more frequent updates) against computational cost.
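One common mitigation is a staleness guard: if a user was flagged after the last offline refresh, treat the cached embedding as untrusted and escalate or recompute. A sketch under that assumption (store layout and names are hypothetical):

```python
def usable_embedding(user_id, emb_store, flag_times, last_refresh):
    """Return (embedding, trusted). The embedding is untrusted when the
    user was flagged AFTER the last refresh: it still looks clean even
    though the graph now says otherwise."""
    emb = emb_store.get(user_id)
    flagged_at = flag_times.get(user_id)
    trusted = emb is not None and (flagged_at is None or flagged_at <= last_refresh)
    return emb, trusted

emb_store = {"u1": [0.1], "u2": [0.2]}
flag_times = {"u2": 100}          # u2 flagged at t=100
last_refresh = 50                 # embeddings last rebuilt at t=50

print(usable_embedding("u1", emb_store, flag_times, last_refresh))  # ([0.1], True)
print(usable_embedding("u2", emb_store, flag_times, last_refresh))  # ([0.2], False)
```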

Graph Database Selection

The graph store must support fast neighbor lookups. Options: native graph databases (Neo4j, TigerGraph) optimized for traversals, key-value stores (Redis) with adjacency lists, or distributed stores (DynamoDB) for scale. Choose based on query patterns: random access vs batch processing.
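The key-value option amounts to storing each node's adjacency list under its own key. A self-contained sketch, with a plain dict standing in for the store (in Redis, `SADD`/`SMEMBERS` on a per-node key would play the same role):

```python
class AdjacencyStore:
    """Key-value adjacency lists: key 'adj:<node>' -> set of neighbor ids."""

    def __init__(self):
        self._store: dict[str, set] = {}

    def add_edge(self, u: str, v: str) -> None:
        # Undirected edge: index both directions for O(1) neighbor lookup.
        self._store.setdefault(f"adj:{u}", set()).add(v)
        self._store.setdefault(f"adj:{v}", set()).add(u)

    def neighbors(self, node: str) -> set:
        return self._store.get(f"adj:{node}", set())

store = AdjacencyStore()
store.add_edge("user_1", "device_9")
store.add_edge("user_2", "device_9")

# A shared-device link surfaces both users from a single lookup:
print(sorted(store.neighbors("device_9")))  # ['user_1', 'user_2']
```

This layout favors random-access neighbor lookups; batch analytics over the whole graph are where a native graph database or distributed store pulls ahead.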

💡 Key Takeaways
- Real-time GNN inference requires 50-100ms latency, so limit to 1-2 hops with 10-50 sampled neighbors per node
- Pre-computed embeddings reduce inference to a lookup plus small forward pass, but embeddings become stale between refreshes
- Graph store choice (Neo4j, Redis adjacency lists, DynamoDB) depends on query patterns: random access vs batch processing
📌 Interview Tips
1. Explain the latency trade-off: a 2-hop neighborhood on a dense graph might touch millions of nodes, so sampling is essential
2. Mention that pre-computed embeddings risk staleness: a user flagged 1 hour ago still has a clean embedding until the next refresh