How Does Isolation Forest Work?

Isolation Forest identifies anomalies by measuring how easily a point can be isolated from the rest of the data using random recursive partitioning. The algorithm builds many trees in which each node randomly selects a feature and a split value within that feature's range, recursively partitioning the data until every point sits alone in its own leaf. The key insight is that anomalies require fewer splits to isolate because they live in sparse regions of feature space, far from dense clusters of normal points.

The algorithm works as follows. First, subsample the training data to a fixed size, typically 256 to 10,000 records, to reduce bias and keep training fast. Then build an ensemble of 100 to 500 trees: for each tree, repeatedly select a random feature and split value and partition the data until each point is isolated or a maximum depth is reached. At inference time, pass each new point through all trees and measure the average path length from root to leaf; shorter paths indicate anomalies. The anomaly score is normalized by the expected path length for the given sample size, producing values between 0 and 1, where scores above 0.5 suggest anomalies.

Complexity is approximately O(n log n) for training with subsampling, and scoring is just tree traversal at O(log n) per tree. On modern CPUs, a single core can score 50,000 to 200,000 events per second for 100 trees with tens of numeric features, assuming the model fits in cache. This makes Isolation Forest ideal for high-throughput, real-time systems. It also handles high-dimensional tabular data well because it does not rely on distance metrics, which degrade in high dimensions due to the curse of dimensionality.

AWS CloudWatch uses Random Cut Forest, a variant of Isolation Forest, to monitor hundreds of thousands to millions of time-series metrics at minute or second resolution. The system scores 8,333 metrics per second, with per-metric inference well under one second; a single scoring node can handle tens of thousands of scores per second. Payments processors deploy Isolation Forest as a first-stage filter, catching sparse, novel fraud patterns before supervised models see them.
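The normalization mentioned above is the scoring function from the original Isolation Forest paper (Liu, Ting, and Zhou, 2008):

```latex
s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}},
\qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n},
\qquad
H(i) \approx \ln i + 0.5772
```

Here h(x) is the path length of point x in a single tree, E[h(x)] is its average over the ensemble, and c(n) is the expected path length of an unsuccessful binary-search-tree lookup among n points. When E[h(x)] is much smaller than c(n), s approaches 1; when it equals c(n), s is exactly 0.5, which is why 0.5 is the natural anomaly threshold.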
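For concreteness, here is a minimal sketch of this workflow using scikit-learn's IsolationForest. The synthetic data, feature count, and parameter values are illustrative stand-ins drawn from the ranges quoted above, not a production configuration:

```python
# Minimal sketch: fit an Isolation Forest on a dense "normal" cluster
# plus a few sparse outliers, then recover the paper's 0-to-1 score.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(10_000, 50))  # dense cluster
outliers = rng.uniform(low=-8.0, high=8.0, size=(20, 50))   # sparse points
X = np.vstack([normal, outliers])

model = IsolationForest(
    n_estimators=100,   # 100-500 trees per the text
    max_samples=256,    # subsample 256-10,000 records per tree
    random_state=0,
).fit(X)

# scikit-learn's score_samples returns the *negated* paper score,
# so negate it back: values near 1 mean short paths, i.e. anomalies.
paper_scores = -model.score_samples(outliers)
print(paper_scores.round(2))        # mostly above 0.5 for these points
print(model.predict(outliers[:5]))  # -1 flags predicted anomalies
```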
💡 Key Takeaways
Builds an ensemble of trees using random feature selection and split values, isolating anomalies in fewer splits because they occupy sparse regions of feature space
Subsample training data to 256 to 10,000 records per tree, use 100 to 500 trees, and limit tree depth to reduce overfitting
Scores 50,000 to 200,000 events per second per CPU core for 100 trees with tens of features, making it suitable for real-time pipelines (see the back-of-envelope check after this list)
Does not rely on distance metrics, so it handles high-dimensional data better than density-based methods, which suffer from the curse of dimensionality
AWS CloudWatch processes 8,333 metrics per second using the Random Cut Forest variant, monitoring millions of time series at scale
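As a back-of-envelope check on the per-core throughput figure, count node visits per event, assuming (our numbers, not the source's) a cache-resident model and a few nanoseconds per node visit:

```latex
\underbrace{100}_{\text{trees}} \times \underbrace{\log_2 256}_{\approx 8 \text{ nodes per tree}} = 800 \text{ node visits per event}
```

At 2 to 5 ns per visit, traversal alone costs 1.6 to 4 microseconds per event, a ceiling of roughly 250,000 to 600,000 events per second per core; feature extraction and serving overhead bring end-to-end pipelines down into the quoted 50,000 to 200,000 range.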
📌 Examples
Stripe fraud detection: 100 to 300 trees, max samples of 256 to 10,000, scoring each transaction in 0.2 to 0.5 milliseconds across 50 to 100 features
PayPal risk pipeline: Isolation Forest as first stage filter, flags top 1% of events for supervised model, reducing downstream compute by 99%
Uber anomaly detection: Deploys Isolation Forest on aggregated session features, catches account takeover patterns with path lengths under 3 splits
Amazon product fraud: Detects fake review campaigns where coordinated accounts cluster together but are sparse relative to organic reviews