How Graph Neural Networks Learn Fraud Patterns
Message Passing Fundamentals
GNNs learn by passing messages between connected nodes. Each node starts with its own features (transaction amount, user age, device type). In each layer, nodes aggregate messages from their neighbors, combine them with their own features, and produce updated representations. After 2-3 layers, each node embedding contains information from its extended neighborhood.
Core Mechanism: A 2-layer GNN lets each node see 2 hops away. If user A connects to device B, and device B connects to flagged user C, then A incorporates signals from C even without a direct connection. This multi-hop visibility catches fraud rings.
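The 2-hop mechanism can be sketched with a toy three-node chain A-B-C, where only C carries a fraud signal. This is a minimal NumPy sketch, not a full GNN: the 0.5/0.5 self/neighbor mixing weights and the single scalar feature are illustrative assumptions, standing in for learned weight matrices.

```python
import numpy as np

# Toy graph: A(0) -- B(1) -- C(2). A and C share no direct edge.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)

# One scalar feature per node; only flagged user C carries the signal.
h0 = np.array([[0.0], [0.0], [1.0]])

def mean_message_pass(h, adj):
    # Each node averages its neighbors' features, then mixes the
    # result with its own features (0.5/0.5 split, assumed for clarity).
    deg = adj.sum(axis=1, keepdims=True)
    neighbor_mean = adj @ h / deg
    return 0.5 * h + 0.5 * neighbor_mean

h1 = mean_message_pass(h0, adj)   # layer 1: B absorbs C's signal, A still 0
h2 = mean_message_pass(h1, adj)   # layer 2: C's signal reaches A through B

print(h1[0, 0])  # 0.0   -- one layer sees only 1 hop
print(h2[0, 0])  # 0.125 -- two layers reach C, 2 hops away
```

After one layer, A's embedding is untouched by C; after two, it is not, which is exactly the multi-hop visibility described above.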
Aggregation Functions
How nodes combine neighbor messages determines what patterns the model learns. Mean aggregation treats all neighbors equally—good for density anomalies. Max aggregation captures the most suspicious neighbor—good for single toxic connections. Attention-based aggregation learns which neighbors matter most, adapting weights based on the task.
For fraud detection, attention mechanisms often outperform fixed aggregations. The model learns that connections to recently created accounts matter more than connections to established accounts.
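The three aggregators behave differently on the same neighborhood. Below is a minimal NumPy sketch for a single node with two benign neighbors and one suspicious outlier; the attention weight vector `w` is a stand-in for a learned parameter, not a trained value.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Neighbor embeddings for one node: two benign, one suspicious outlier.
neighbors = np.array([[0.1], [0.2], [5.0]])

mean_agg = neighbors.mean(axis=0)   # outlier diluted by benign neighbors
max_agg = neighbors.max(axis=0)     # keeps only the most suspicious neighbor

# Attention sketch: score each neighbor with an assumed learned vector w,
# then take a softmax-weighted sum.
w = np.array([1.0])                 # hypothetical learned parameter
scores = neighbors @ w
attn_weights = softmax(scores)
attn_agg = attn_weights @ neighbors # weighted heavily toward the outlier
```

Here mean aggregation dilutes the toxic connection, max preserves it but discards everything else, and attention lands in between: it concentrates weight on the suspicious neighbor while still admitting some signal from the rest.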
Training on Imbalanced Labels
Fraud is rare (0.1-1% of transactions). Standard training produces models that predict everything as legitimate. Solutions: oversample fraud cases, use focal loss to emphasize hard examples, or frame the task as edge prediction (predicting whether a transaction will be fraudulent).
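Of these, focal loss is the most compact to illustrate: it down-weights easy, confidently classified examples so the abundant legitimate transactions stop dominating the gradient. A sketch of the standard formulation, with the commonly used defaults gamma=2 and alpha=0.25:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # p: predicted fraud probability, y: 1 for fraud, 0 for legitimate.
    # (1 - p_t)**gamma shrinks the loss of well-classified examples.
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# Easy legitimate transaction: model is already confident (p=0.01, y=0).
easy_loss = focal_loss(np.array(0.01), np.array(0))

# Hard fraud case: model is badly wrong (p=0.1, y=1).
hard_loss = focal_loss(np.array(0.1), np.array(1))

print(easy_loss, hard_loss)  # easy example contributes almost nothing
```

The easy legitimate example contributes a vanishingly small loss, while the misclassified fraud case dominates, which is the behavior that keeps rare positives from being drowned out.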
Training Insight: Edge-level prediction (given a proposed transaction edge, predict fraud) naturally handles class imbalance since you control which edges to train on.
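The control over which edges enter training can be sketched with balanced batch construction: because each training example is a proposed transaction edge, you can sample equal numbers of fraud and legitimate edges per batch. The edge tuples and 1% fraud rate below are synthetic placeholders.

```python
import random

random.seed(0)

# Hypothetical labeled transaction edges: (user, merchant) -> label,
# with a synthetic 1% fraud rate mirroring the imbalance above.
edges = [(("user_a", "merchant_x"), 0)] * 990 + [(("user_b", "merchant_y"), 1)] * 10

fraud = [e for e in edges if e[1] == 1]
legit = [e for e in edges if e[1] == 0]

# We choose which edges to train on, so each batch can be balanced:
# all fraud edges plus an equal-sized sample of legitimate ones.
batch = fraud + random.sample(legit, len(fraud))
fraud_fraction = sum(label for _, label in batch) / len(batch)
print(fraud_fraction)  # 0.5 per batch despite 1% base rate
```

Per-epoch resampling of the legitimate edges exposes the model to more of the majority class over time without ever letting a single batch be swamped by it.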