How Do Autoencoders Detect Anomalies?
Core Mechanism
Autoencoders detect anomalies by learning to reconstruct normal data patterns. The network compresses input through an encoder bottleneck, then reconstructs it via a decoder. The key insight: autoencoders trained on normal data struggle to reconstruct anomalous patterns, producing high reconstruction errors that serve as anomaly scores.
Detection Principle: Normal instances produce low reconstruction error because the autoencoder learned their patterns. Anomalies produce high error because the compressed representation cannot capture unfamiliar patterns.
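This principle can be shown with a toy *linear* autoencoder (PCA via SVD) — a real system would use a trained neural autoencoder, but the scoring logic is identical. All data below is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" training data lives near a 2-D subspace of a 10-D space.
latent = rng.normal(size=(500, 2))
basis = rng.normal(size=(2, 10))
X_train = latent @ basis + 0.05 * rng.normal(size=(500, 10))

# Fit the linear "encoder": top-2 principal directions of the normal data.
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = Vt[:2]  # encoder weights (2-D bottleneck)

def reconstruction_error(X):
    """Anomaly score: mean squared error between input and reconstruction."""
    Z = (X - mean) @ components.T   # encode: project into the bottleneck
    X_hat = Z @ components + mean   # decode: map back to input space
    return ((X - X_hat) ** 2).mean(axis=1)

normal_scores = reconstruction_error(X_train)
anomaly = rng.normal(size=(1, 10)) * 3  # point far off the normal subspace

# The anomaly's error dwarfs typical normal-data errors.
print(reconstruction_error(anomaly)[0] > normal_scores.mean())  # True
```

The anomaly scores high because its variance lies mostly outside the subspace the encoder learned, so the bottleneck cannot represent it.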
Architecture Design
A typical autoencoder progressively compresses input dimensionality (e.g., 100 → 50 → 20 → 10 latent dimensions), with a symmetric decoder reconstructing the original dimensions. The bottleneck size critically affects sensitivity: too large and the network approaches an identity mapping that reconstructs anomalies as faithfully as normal data; too small and it loses fidelity even on normal patterns, inflating false positives.
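The 100 → 50 → 20 → 10 shape pattern can be sketched with a NumPy forward pass. The weights here are random, so this only demonstrates the tensor shapes of a symmetric encoder/decoder, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
dims = [100, 50, 20, 10]  # encoder widths; decoder mirrors them

relu = lambda x: np.maximum(x, 0)

# Encoder weight matrices and mirrored decoder weight matrices.
enc = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(dims, dims[1:])]
dec = [rng.normal(scale=0.1, size=(b, a)) for a, b in zip(dims, dims[1:])][::-1]

x = rng.normal(size=(32, 100))  # batch of 32 samples
h = x
for W in enc:
    h = relu(h @ W)             # compress: 100 -> 50 -> 20 -> 10
for W in dec[:-1]:
    h = relu(h @ W)             # expand: 10 -> 20 -> 50
x_hat = h @ dec[-1]             # linear output layer: 50 -> 100

print(x_hat.shape)  # (32, 100): reconstruction matches input dimensionality
```

A linear (no activation) output layer is a common choice for real-valued inputs, since ReLU at the output would clip negative feature values.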
For tabular fraud data, dense layers with ReLU work well. For sequential transaction data, LSTM-based autoencoders capture temporal dependencies. Variational autoencoders (VAEs) add probabilistic modeling for uncertainty quantification.
Error Metrics and Normalization
Mean Squared Error (MSE) between input and reconstruction is the standard score. Feature-weighted MSE assigns higher importance to fraud-indicative features. Rather than thresholding raw error, normalize scores against the training-set score distribution, converting them to z-scores or percentiles for stable threshold selection.
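A minimal sketch of these three scoring variants. `X` and `X_hat` stand in for inputs and reconstructions from any trained autoencoder (synthetic here), and the feature weights are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 8))
X_hat = X + 0.1 * rng.normal(size=(1000, 8))  # stand-in reconstructions

# Plain per-sample MSE.
mse = ((X - X_hat) ** 2).mean(axis=1)

# Feature-weighted MSE: up-weight fraud-indicative features (these weights
# are hypothetical; in practice they come from domain knowledge/validation).
w = np.array([1, 1, 1, 1, 1, 3, 3, 5], dtype=float)
weighted_mse = ((X - X_hat) ** 2 * w).sum(axis=1) / w.sum()

# Normalize against the training-score distribution for stable thresholds.
mu, sigma = mse.mean(), mse.std()
z_scores = (mse - mu) / sigma                              # z-score form
percentiles = (mse[:, None] >= mse[None, :]).mean(axis=1)  # empirical percentile
```

At serving time, `mu` and `sigma` (or the stored training scores for percentiles) are frozen from training, so a new observation's score is comparable across model versions.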
Training Tip: Train exclusively on verified normal data. Contamination with anomalies teaches the model to reconstruct fraud patterns, dramatically reducing detection sensitivity. Use holdout validation with known anomalies to tune thresholds.
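The threshold-tuning workflow above can be sketched as follows: fit the threshold on clean training scores only, then evaluate it on a labeled holdout that does contain known anomalies. The scores here are synthetic stand-ins for reconstruction errors:

```python
import numpy as np

rng = np.random.default_rng(3)
train_scores = rng.gamma(shape=2.0, scale=0.05, size=5000)  # normal-only

# Candidate threshold: 99th percentile of normal training scores.
threshold = np.percentile(train_scores, 99)

# Labeled holdout: mostly normal, plus known anomalies with elevated errors.
holdout_scores = np.concatenate([
    rng.gamma(shape=2.0, scale=0.05, size=950),       # normal
    rng.gamma(shape=2.0, scale=0.05, size=50) + 1.0,  # known anomalies
])
holdout_labels = np.concatenate([np.zeros(950), np.ones(50)])

flagged = holdout_scores > threshold
true_positives = (flagged & (holdout_labels == 1)).sum()
precision = true_positives / max(flagged.sum(), 1)
recall = true_positives / 50
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Sweeping the percentile (95th, 99th, 99.9th) and re-computing holdout precision/recall is the usual way to pick the operating point for a given false-alarm budget.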
Regularization and Overfitting
Early stopping on validation loss prevents overfitting. Dropout and L2 penalties improve generalization. If the autoencoder perfectly reconstructs training data but shows high validation error, the bottleneck is likely too large and the network is memorizing training samples: reduce the latent dimensions.
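Early stopping can be sketched framework-agnostically as a patience loop over validation losses: stop once the loss has not improved for `patience` consecutive epochs. The loss values below are hard-coded for illustration, not real training output:

```python
def train_with_early_stopping(val_losses, patience=3):
    """Return the epoch of the best validation loss, and that loss."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0  # improvement: reset
        else:
            waited += 1
            if waited >= patience:  # no improvement for `patience` epochs
                break
    return best_epoch, best

# Validation loss bottoms out at epoch 3, then rises: overfitting begins.
losses = [0.90, 0.55, 0.40, 0.38, 0.41, 0.45, 0.52, 0.60]
stop_epoch, best = train_with_early_stopping(losses)
print(stop_epoch, best)  # 3 0.38
```

In practice the weights saved at `best_epoch` are restored, so the deployed model is the one from before validation error started climbing.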