Why Fraud Detection Needs Graph-Based Models
Fraud is fundamentally relational. Bad actors coordinate across multiple accounts, devices, merchants, and IP (Internet Protocol) addresses to evade detection. Traditional tabular models score each transaction in isolation, so they miss patterns such as ten new accounts transacting with the same merchant through the same device within hours. Each individual transaction may look normal, with a reasonable amount and valid card details, yet the coordinated pattern is a strong signal of a fraud ring.
Graph-based fraud detection represents this activity as a heterogeneous graph. Nodes are entities such as users, cards, devices, IPs, and merchants; edges are relationships such as transaction, login, shared device, or shared address. Each node carries features such as historical spend, chargeback rate, geolocation variance, device entropy, and signup age. Each edge carries attributes such as amount, timestamp, channel, and velocity metrics.
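To make the representation concrete, here is a minimal sketch of a typed (heterogeneous) graph using only the Python standard library. The `HeteroGraph` class, its method names, and all feature values are hypothetical and chosen for illustration; a production system would more likely use a graph database or a GNN library.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Toy heterogeneous graph. Nodes are keyed by (node_type, node_id)."""
    # node_features[(node_type, node_id)] -> dict of node features
    node_features: dict = field(default_factory=dict)
    # adjacency[node_key] -> list of (edge_type, neighbor_key, edge_attrs)
    adjacency: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node_type, node_id, **features):
        self.node_features[(node_type, node_id)] = features

    def add_edge(self, src, edge_type, dst, **attrs):
        # Store both directions so neighbors can be looked up from either side.
        self.adjacency[src].append((edge_type, dst, attrs))
        self.adjacency[dst].append((edge_type, src, attrs))

    def neighbors(self, node, node_type=None):
        """Return neighbor keys, optionally filtered by neighbor node type."""
        return [
            nbr
            for _, nbr, _ in self.adjacency[node]
            if node_type is None or nbr[0] == node_type
        ]


# Illustrative data only: ten new accounts using one device and one merchant.
g = HeteroGraph()
g.add_node("merchant", "m1", chargeback_rate=0.02)
g.add_node("device", "d1", device_entropy=0.1)
for i in range(10):
    user = ("user", f"u{i}")
    g.add_node("user", f"u{i}", signup_age_days=1)
    g.add_edge(user, "login", ("device", "d1"), timestamp=1_700_000_000 + i * 60)
    g.add_edge(user, "transaction", ("merchant", "m1"), amount=49.99, channel="web")

users_on_device = g.neighbors(("device", "d1"), node_type="user")
print(len(users_on_device))  # 10 distinct users share device d1
```

Keying nodes by `(node_type, node_id)` keeps identifiers from different entity types from colliding and makes type-filtered neighbor lookups a one-line check.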
This representation lets the model learn that suspicious patterns emerge from connections, not just from attributes. For example, PayPal uses heterogeneous graphs combining users, devices, and merchants to detect collusive rings, and reports material lifts in recall at fixed precision after introducing graph features. The graph also exposes synthetic identity fraud, where attackers link stolen credentials through shared addresses or phones, creating clusters that are invisible to models examining transactions one at a time.
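"Recall at fixed precision" means: choose the score threshold that keeps precision at or above a target, then measure how much fraud is caught at that threshold. The sketch below shows one way to compute it. The function name `recall_at_precision` is hypothetical, and the labels and scores are made-up toy numbers, not results from PayPal or any real system.

```python
def recall_at_precision(scores, labels, min_precision):
    """Best recall over all thresholds whose precision is >= min_precision."""
    total_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best_recall = 0.0
    true_pos = 0
    for k, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        # Only evaluate at the end of a group of tied scores,
        # because a threshold cannot split a tie.
        if k < len(ranked) and ranked[k][0] == score:
            continue
        precision = true_pos / k
        if precision >= min_precision:
            best_recall = max(best_recall, true_pos / total_pos)
    return best_recall


# Toy data (fabricated for illustration): 4 fraud cases, 6 legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
tabular_scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4, 0.1, 0.1, 0.05, 0.05]
# Assume graph features raise the score of the third fraud case.
graph_scores = [0.9, 0.8, 0.75, 0.2, 0.7, 0.4, 0.1, 0.1, 0.05, 0.05]

tabular_recall = recall_at_precision(tabular_scores, labels, min_precision=0.75)
graph_recall = recall_at_precision(graph_scores, labels, min_precision=0.75)
print(tabular_recall, graph_recall)  # 0.5 0.75
```

Holding precision fixed makes the comparison fair: both models are allowed the same false-positive burden, and the only question is how much additional fraud is caught.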
The challenge is scale and speed. A mid-to-large payment processor maintains around 200 million entities and 5 to 20 billion edges over a rolling 30- to 90-day window, yet must return fraud decisions within 50 to 100 ms at the 95th percentile (p95) while handling 5,000 to 50,000 events per second during peak shopping periods.
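A rough back-of-envelope calculation shows what these figures imply. The sketch below assumes 16 bytes per edge (two 8-byte node IDs and no edge attributes), which is a deliberately optimistic assumption rather than a figure from any real system, and it applies Little's law using the 100 ms latency bound as the time in system, which gives a conservative estimate of concurrency. Both function names are hypothetical.

```python
def edge_storage_gb(num_edges, bytes_per_edge):
    """Raw storage for the edge list, in gigabytes (10^9 bytes)."""
    return num_edges * bytes_per_edge / 1e9


def in_flight_requests(events_per_second, latency_ms):
    """Little's law: average concurrency = arrival rate x time in system."""
    return events_per_second * latency_ms / 1000


# Assumption: each edge stores only two 8-byte node IDs, no attributes.
BYTES_PER_EDGE = 16

low_gb = edge_storage_gb(5_000_000_000, BYTES_PER_EDGE)
high_gb = edge_storage_gb(20_000_000_000, BYTES_PER_EDGE)
peak_concurrency = in_flight_requests(50_000, 100)

print(low_gb, high_gb)    # 80.0 320.0
print(peak_concurrency)   # 5000.0
```

Even under this minimal encoding the edge list alone runs to tens or hundreds of gigabytes, and edge attributes such as amount, timestamp, and channel would increase it further.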
💡 Key Takeaways
• Fraud rings coordinate across multiple entities, creating graph patterns that are invisible to single-transaction analysis
• Heterogeneous graphs model different node types (users, devices, cards, merchants) and relationship types (transactions, logins, shared attributes)
• Production scale reaches 200 million entities and 5 to 20 billion edges over 30- to 90-day rolling windows
• Real-time decision latency targets are 50 to 100 ms at p95 while processing 5,000 to 50,000 events per second during peak
• PayPal and other payment processors report material recall improvements at fixed precision after adding graph features to detect collusive behavior
📌 Examples
Synthetic identity fraud: Ten accounts created with stolen Social Security numbers (SSNs) all share the same phone number and billing address, forming a cluster. Each account looks legitimate on its own, but the shared attributes reveal coordination.
Device takeover ring: A single device logs into 50 accounts within 24 hours and initiates password changes. A graph model flags the abnormal number of accounts per device.
Merchant collusion: A merchant receives transactions from 30 new accounts, all created within the same week and all transacting for similar dollar amounts. The graph reveals the suspicious clustering around a single merchant and time window.
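The first two examples can be sketched with simple graph heuristics, shown below with standard-library Python. This illustrates the underlying signals rather than a GNN: connected components over shared attributes for the synthetic identity case, and a distinct-accounts-per-device count for the takeover case. The function names, thresholds (`min_size=3`, `max_accounts=5`), and all data are hypothetical.

```python
from collections import defaultdict


def shared_attribute_clusters(accounts, min_size=3):
    """Group accounts linked by any identical attribute value (union-find)."""
    parent = {acct: acct for acct in accounts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # (attribute name, value) -> first account seen with it
    for acct, attrs in accounts.items():
        for key_value in attrs.items():
            if key_value in first_seen:
                root_a, root_b = find(first_seen[key_value]), find(acct)
                if root_a != root_b:
                    parent[root_b] = root_a
            else:
                first_seen[key_value] = acct

    clusters = defaultdict(set)
    for acct in accounts:
        clusters[find(acct)].add(acct)
    return [members for members in clusters.values() if len(members) >= min_size]


def flag_devices(logins, now, window_seconds=24 * 3600, max_accounts=5):
    """Return {device: distinct account count} for devices over the limit."""
    accounts_by_device = defaultdict(set)
    for device_id, account_id, ts in logins:
        if now - window_seconds <= ts <= now:
            accounts_by_device[device_id].add(account_id)
    return {
        device: len(accts)
        for device, accts in accounts_by_device.items()
        if len(accts) > max_accounts
    }


# Illustrative data: ten ring accounts share a phone and address; three do not.
accounts = {
    f"ring{i}": {"phone": "555-0100", "address": "1 Main St"} for i in range(10)
}
accounts.update(
    {f"legit{i}": {"phone": f"555-020{i}", "address": f"{i} Oak Ave"} for i in range(3)}
)
clusters = shared_attribute_clusters(accounts)
print(len(clusters), len(clusters[0]))  # 1 10

# Illustrative data: one device logs into 50 accounts within a day.
now = 1_700_000_000
logins = [("d1", f"acct{i}", now - i * 600) for i in range(50)]
logins.append(("d2", "acct_home", now - 60))
flagged = flag_devices(logins, now)
print(flagged)  # {'d1': 50}
```

Fixed thresholds like these are brittle in practice; the appeal of a learned graph model is that it can weigh such structural signals together with node and edge features instead of relying on hand-tuned cutoffs.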