
How Do Autoencoders Detect Anomalies?

Autoencoders detect anomalies by learning a compressed representation of normal data and then measuring reconstruction error at inference time. The architecture consists of an encoder that maps the input to a lower-dimensional bottleneck and a decoder that reconstructs the original input from that bottleneck. During training on clean data, the network minimizes reconstruction error, effectively learning the manifold of normal patterns. When presented with an anomaly at inference time, the autoencoder struggles to reconstruct it accurately because it lies off the learned manifold, producing a high reconstruction error that flags it as suspicious.

For tabular fraud detection, a typical architecture uses two to three hidden layers with a bottleneck of 8 to 64 neurons. You train on recent periods you believe are mostly normal, using mean squared error or mean absolute error as the loss function, and add dropout or input noise during training to improve generalization and prevent the model from memorizing specific examples. At inference, compute the reconstruction error for each input, either as an overall error or per feature to identify which dimensions are most surprising. Threshold calibration is critical because reconstruction error distributions often have heavy tails; many teams use quantile-based thresholds per segment, flagging the top 0.5 to 1 percent of events.

For time series and multivariate metrics, sequence autoencoders built from Long Short-Term Memory (LSTM) or Transformer layers capture temporal dependencies. You feed fixed-length windows of recent values plus seasonal indicators such as hour of day, and the model learns correlations between metrics such as CPU, memory, and request latency. A spike in latency that is not explained by a CPU or memory increase produces high reconstruction error. Companies monitoring infrastructure use this approach to detect cascading failures or resource exhaustion before alerts fire on individual thresholds.

Inference latency is higher than for Isolation Forest. A small tabular autoencoder runs in 1 to 3 milliseconds per event on CPU. Sequence autoencoders for multivariate metrics can process 1,000 to 5,000 windows per second per CPU socket, or far more with batching on Graphics Processing Units (GPUs). Stripe and PayPal use autoencoders as a secondary detector that runs in parallel with Isolation Forest, with score fusion combining both signals. This dual approach balances coverage: Isolation Forest catches sparse outliers, while autoencoders catch structured anomalies that follow correlations but lie off the normal manifold.
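The sketch below shows what this tabular setup might look like in PyTorch: an encoder-decoder with a small bottleneck, MSE training on a mostly-normal period, per-event reconstruction error at inference, and a quantile threshold. The layer sizes, dropout rate, 99.5th-percentile cutoff, and synthetic data are illustrative assumptions, not a production configuration.

```python
# Minimal sketch of a tabular autoencoder anomaly detector (PyTorch).
# Layer sizes, dropout rate, and the 99.5th-percentile threshold are
# illustrative assumptions, not a production configuration.
import numpy as np
import torch
import torch.nn as nn

class TabularAutoencoder(nn.Module):
    def __init__(self, n_features: int, bottleneck: int = 16):
        super().__init__()
        # Encoder: input -> hidden -> bottleneck
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(64, bottleneck), nn.ReLU(),
        )
        # Decoder: bottleneck -> hidden -> reconstruction
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 64), nn.ReLU(),
            nn.Linear(64, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, x_train, epochs=20, lr=1e-3, batch_size=256):
    """Train on data assumed to be mostly normal, minimizing MSE."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(x_train),
        batch_size=batch_size, shuffle=True,
    )
    model.train()
    for _ in range(epochs):
        for (batch,) in loader:
            opt.zero_grad()
            loss = loss_fn(model(batch), batch)
            loss.backward()
            opt.step()

def reconstruction_error(model, x):
    """Per-event mean squared reconstruction error; the per-feature errors
    could be inspected instead to see which dimensions are most surprising."""
    model.eval()
    with torch.no_grad():
        per_feature = (model(x) - x) ** 2
    return per_feature.mean(dim=1).numpy()

# Usage with synthetic standardized features (illustrative only).
x_train = torch.randn(10_000, 30)   # "mostly normal" training period
x_live = torch.randn(1_000, 30)     # events scored at inference time
model = TabularAutoencoder(n_features=30)
train(model, x_train)
errors = reconstruction_error(model, x_live)
threshold = np.quantile(reconstruction_error(model, x_train), 0.995)
flagged = errors > threshold        # top ~0.5% by reconstruction error
```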
💡 Key Takeaways
Learns compressed representation of normal data with encoder and decoder, flagging high reconstruction error as anomalies at inference time
Typical architecture uses 2 to 3 hidden layers with bottleneck size 8 to 64 for tabular data, trained on clean periods with dropout for generalization
Inference takes 1 to 3 milliseconds per event on CPU for tabular, 1,000 to 5,000 windows per second for sequence models on multivariate metrics
Captures nonlinear relationships and temporal correlations that Isolation Forest might miss, detecting structured anomalies off the learned manifold
Requires careful threshold calibration using quantiles per segment because reconstruction error distributions have heavy tails, typically flagging the top 0.5 to 1 percent (see the sketch after this list)
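A minimal sketch of per-segment quantile thresholding, assuming scores arrive in a pandas DataFrame; the column names ("segment", "recon_error") and the quantile values are illustrative assumptions about how scores are stored, not a specific team's setup.

```python
# Sketch of per-segment quantile thresholds for reconstruction error.
# Column names ("segment", "recon_error") and the quantile values are
# illustrative assumptions about how scores are stored.
import pandas as pd

def fit_segment_thresholds(scores: pd.DataFrame, q: float = 0.995) -> pd.Series:
    """One threshold per segment (e.g. country or merchant category),
    so a heavy-tailed error distribution in one segment does not swamp others."""
    return scores.groupby("segment")["recon_error"].quantile(q)

def flag_anomalies(scores: pd.DataFrame, thresholds: pd.Series) -> pd.Series:
    """Flag events whose error exceeds their own segment's threshold."""
    return scores["recon_error"] > scores["segment"].map(thresholds)

# Example: a recent "mostly normal" window sets the thresholds,
# then live events are compared against them.
history = pd.DataFrame({
    "segment": ["US", "US", "EU", "EU", "EU"],
    "recon_error": [0.02, 0.03, 0.10, 0.12, 0.11],
})
thresholds = fit_segment_thresholds(history, q=0.8)  # small q only for the toy data
live = pd.DataFrame({"segment": ["US", "EU"], "recon_error": [0.05, 0.11]})
print(flag_anomalies(live, thresholds))
```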
📌 Examples
Stripe dual detector: Autoencoder runs in parallel with Isolation Forest, score fusion combines both signals for balanced coverage of sparse and structured anomalies (see the fusion sketch after this list)
PayPal time series fraud: LSTM autoencoder on transaction velocity windows detects coordinated attacks with unusual temporal patterns in 2 to 5 milliseconds
Uber infrastructure monitoring: Autoencoder on CPU, memory, latency detects cascading failures when correlations break, processing 3,000 metric windows per second
Amazon network anomaly: Convolutional autoencoder on packet flow data flags distributed denial of service (DDoS) patterns with reconstruction error above 95th percentile
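A minimal sketch of the kind of score fusion described above, combining an Isolation Forest score with a separately computed autoencoder reconstruction error via rank normalization. The scikit-learn IsolationForest usage is standard, but the 50/50 weighting and the placeholder errors are illustrative assumptions, not a description of any company's production system.

```python
# Sketch of score fusion between Isolation Forest and an autoencoder.
# The 50/50 weighting and rank-based normalization are illustrative
# assumptions, not any company's production configuration.
import numpy as np
from scipy.stats import rankdata
from sklearn.ensemble import IsolationForest

def to_percentile(scores: np.ndarray) -> np.ndarray:
    """Rank-normalize so both detectors contribute on a common 0-1 scale."""
    return rankdata(scores) / len(scores)

def fused_scores(x, recon_error, w_if=0.5, w_ae=0.5):
    # score_samples returns higher values for inliers, so negate it to get
    # an anomaly score where higher means more anomalous.
    iso = IsolationForest(n_estimators=100, random_state=0).fit(x)
    if_score = -iso.score_samples(x)
    return w_if * to_percentile(if_score) + w_ae * to_percentile(recon_error)

# Usage: autoencoder reconstruction errors computed separately (e.g. as in
# the earlier sketch) are fused with Isolation Forest scores on the same events.
x = np.random.randn(5_000, 30)
recon_error = np.random.rand(5_000)               # placeholder for real AE errors
combined = fused_scores(x, recon_error)
flagged = combined > np.quantile(combined, 0.995) # review the top ~0.5%
```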