Statistical Tests for Drift Detection
POPULATION STABILITY INDEX (PSI)
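PSI measures drift by binning feature values (bin edges are typically fixed on the baseline) and comparing bin frequencies between the baseline and current distributions. Formula: PSI = Σ (current% - baseline%) × ln(current% / baseline%), summed over bins, where current% and baseline% are each bin's share of observations written as a fraction (0.05, not 5); the thresholds below assume that scale.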
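A minimal Python sketch of the formula, assuming the per-bin proportions have already been computed; the function name `psi_from_proportions` is hypothetical. Each term is non-negative because (current% - baseline%) and ln(current% / baseline%) always share a sign, so PSI is zero only when every bin matches.

```python
import math
from typing import Sequence


def psi_from_proportions(baseline: Sequence[float], current: Sequence[float]) -> float:
    """PSI = sum over bins of (current - baseline) * ln(current / baseline).

    Both inputs are per-bin proportions over the same bins (each sums to 1).
    Every proportion must be > 0, otherwise ln() is undefined.
    """
    if len(baseline) != len(current):
        raise ValueError("baseline and current must use the same bins")
    total = 0.0
    for b, c in zip(baseline, current):
        if b <= 0 or c <= 0:
            raise ValueError("empty bin: PSI is undefined without smoothing")
        total += (c - b) * math.log(c / b)
    return total


# Worked example: four bins, baseline spread evenly, current skewed toward the last bins.
baseline = [0.25, 0.25, 0.25, 0.25]
current = [0.10, 0.20, 0.30, 0.40]
print(round(psi_from_proportions(baseline, current), 3))  # 0.228
```

The four terms are roughly 0.137, 0.011, 0.009, and 0.071, so the shift lands in the moderate band.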
Interpretation: PSI < 0.1 indicates negligible drift. PSI 0.1-0.25 indicates moderate drift worth investigating. PSI > 0.25 indicates significant drift requiring action. These thresholds are industry conventions; calibrate for your domain.
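A tiny helper that encodes these bands, with the cut-offs exposed as parameters so they can be recalibrated per domain. The name `psi_band` and the choice to treat exactly 0.1 and 0.25 as "moderate" are assumptions read off the ranges stated above.

```python
def psi_band(psi: float, moderate: float = 0.1, significant: float = 0.25) -> str:
    """Classify a PSI value using the conventional bands; adjust cut-offs per domain."""
    if psi < 0:
        raise ValueError("PSI cannot be negative")
    if psi < moderate:
        return "negligible"      # PSI < 0.1
    if psi <= significant:
        return "moderate"        # 0.1 <= PSI <= 0.25: investigate
    return "significant"         # PSI > 0.25: act


print(psi_band(0.05), psi_band(0.228), psi_band(0.4))  # negligible moderate significant
```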
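Advantages: intuitive, interpretable, works for any distribution. Disadvantages: requires binning decisions, is sensitive to bin boundaries, and does not work well with sparse data: a bin that is empty in either distribution makes ln(current% / baseline%) undefined, so implementations usually floor proportions at a small epsilon, which makes the score depend on that choice.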
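Binning and sparsity choices can be made explicit in code. This sketch assumes quantile bins cut from the baseline and an epsilon floor on empty bins, two common conventions rather than the only ones; `psi`, `bin_proportions`, `n_bins=10`, and `eps=1e-4` are illustrative. Changing `n_bins` or `eps` changes the score, which is the bin-boundary sensitivity listed above.

```python
import bisect
import math
import statistics
from typing import List, Sequence


def bin_proportions(values: Sequence[float], edges: Sequence[float], eps: float) -> List[float]:
    """Share of values falling in each bin defined by sorted cut points, floored at eps."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_right assigns v to bin i where edges[i-1] <= v < edges[i]
        counts[bisect.bisect_right(edges, v)] += 1
    n = len(values)
    # Flooring avoids ln(0) for empty bins; the result then depends on the chosen eps.
    return [max(c / n, eps) for c in counts]


def psi(baseline: Sequence[float], current: Sequence[float],
        n_bins: int = 10, eps: float = 1e-4) -> float:
    # Cut points are the baseline's quantiles, so baseline bins hold roughly equal mass.
    edges = statistics.quantiles(baseline, n=n_bins)
    b = bin_proportions(baseline, edges, eps)
    c = bin_proportions(current, edges, eps)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


baseline = [float(x) for x in range(1000)]
shifted = [x + 500.0 for x in baseline]  # same shape, moved by half the baseline range
print(psi(baseline, baseline))  # 0.0
print(psi(baseline, shifted) > 0.25)  # True
```

With heavily discrete data, `statistics.quantiles` can return repeated cut points; the resulting always-empty bins get `eps` on both sides and contribute zero.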
KOLMOGOROV-SMIRNOV (K-S) TEST
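K-S measures the maximum vertical distance between the empirical cumulative distribution functions (ECDFs) of two samples. It produces a test statistic D (between 0 and 1) and a p-value; a p-value below 0.05 is conventionally read as a statistically significant difference between the two distributions.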
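Advantages: no binning required, suited to continuous distributions. Disadvantages: sensitive to sample size (with large samples, even trivial differences yield very small p-values); the p-value reflects strength of evidence rather than drift magnitude, and D captures only the single largest gap between the CDFs.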
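The sample-size effect can be reproduced without dependencies. This sketch (hypothetical names `ks_two_sample`, `kolmogorov_sf`, `shifted_grid`) computes D from the two empirical CDFs and an asymptotic p-value from the Kolmogorov distribution with a commonly used effective-sample-size correction. `scipy.stats.ks_2samp` is the usual production choice and can compute exact p-values for small samples.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def kolmogorov_sf(lam: float) -> float:
    """P(K > lam) for the asymptotic Kolmogorov distribution."""
    if lam <= 0:
        return 1.0
    if lam < 1.18:
        # Series for the CDF that converges fast when lam is small.
        cdf = math.sqrt(2 * math.pi) / lam * sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam * lam)) for k in range(1, 6))
        return max(0.0, 1.0 - cdf)
    # Alternating tail series that converges fast when lam is large.
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 6))


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample K-S statistic D and an asymptotic p-value."""
    sa, sb = sorted(a), sorted(b)
    na, nb = len(sa), len(sb)
    d = 0.0
    # The supremum of |ECDF_a - ECDF_b| is reached at one of the sample points.
    for x in sa + sb:
        fa = bisect.bisect_right(sa, x) / na
        fb = bisect.bisect_right(sb, x) / nb
        d = max(d, abs(fa - fb))
    ne = na * nb / (na + nb)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    return d, kolmogorov_sf(lam)


def shifted_grid(n: int) -> Tuple[List[float], List[float]]:
    """Evenly spaced baseline plus a copy shifted by 2% of its range (hypothetical data)."""
    base = [float(i) for i in range(n)]
    return base, [x + n // 50 for x in base]


for n in (100, 50_000):
    d, p = ks_two_sample(*shifted_grid(n))
    print(n, round(d, 3), p < 0.05)
# 100 0.02 False
# 50000 0.02 True
```

Both runs shift the data by the same 2% of its range, so D is 0.02 each time, but only the large sample crosses p < 0.05. This is why D (or a PSI-style magnitude) is often tracked alongside the p-value.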
CHI-SQUARED TEST
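For categorical features. Compares the observed category counts in the current data against the counts expected under the baseline proportions (each category's baseline share × the current sample size). Produces a chi-squared statistic and a p-value; a p-value below 0.05 indicates statistically significant drift.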
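Advantages: standard statistical test, well understood. Disadvantages: requires sufficient samples per category (a common rule of thumb is at least 5 expected observations in each), and cannot handle categories absent from the baseline: their expected frequency is zero, so the statistic is undefined unless such categories are merged or handled separately.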
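A sketch of the goodness-of-fit form described here, treating the baseline proportions as fixed; `chi2_drift` and `chi2_sf` are hypothetical names. The p-value uses the textbook closed forms of the chi-squared upper tail for integer degrees of freedom, where `scipy.stats.chi2.sf` would normally be used. If the baseline is itself a sample rather than a reference, a two-sample contingency test such as `scipy.stats.chi2_contingency` on a 2 × k table is an alternative.

```python
import math
import warnings
from typing import Dict, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series (empty when df == 1).
    chi = math.sqrt(x)
    total = math.erfc(chi / math.sqrt(2))
    term = math.sqrt(2 / math.pi) * math.exp(-x / 2) * chi
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def chi2_drift(baseline: Dict[str, int], current: Dict[str, int]) -> Tuple[float, float]:
    """Goodness-of-fit of current category counts against baseline proportions."""
    if any(count <= 0 for count in baseline.values()):
        raise ValueError("baseline counts must be positive")
    unseen = set(current) - set(baseline)
    if unseen:
        # Expected frequency would be zero, so the statistic is undefined.
        raise ValueError(f"categories missing from baseline: {sorted(unseen)}")
    n_base, n_cur = sum(baseline.values()), sum(current.values())
    stat = 0.0
    for cat, base_count in baseline.items():
        expected = base_count / n_base * n_cur  # baseline share scaled to current size
        if expected < 5:
            warnings.warn(f"expected count for {cat!r} is {expected:.1f} (< 5)")
        stat += (current.get(cat, 0) - expected) ** 2 / expected
    return stat, chi2_sf(stat, df=len(baseline) - 1)


stat, p = chi2_drift({"a": 500, "b": 300, "c": 200}, {"a": 40, "b": 35, "c": 25})
print(round(stat, 3), round(p, 3))  # 4.083 0.13
```

Here the expected counts are 50, 30, and 20, so the statistic is 100/50 + 25/30 + 25/20 ≈ 4.083 on 2 degrees of freedom, which is not significant at 0.05.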
JENSEN-SHANNON DIVERGENCE
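Symmetric measure of similarity between distributions: JS(P, Q) = 0.5 × KL(P || M) + 0.5 × KL(Q || M), where M = (P + Q) / 2. Bounded between 0 (identical) and 1 (no overlapping support) when computed with base-2 logarithms; with natural logarithms the upper bound is ln 2. More robust than KL divergence because it handles zero probabilities gracefully: M is positive wherever P or Q is, so no term divides by zero.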
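A direct transcription of the definition, assuming both inputs are histograms over the same bins (raw counts are normalised first); `js_divergence` is a hypothetical name. It shows the zero-probability handling: a bin that is empty in one distribution still has positive mass in M, so no logarithm of zero is taken. If `scipy.spatial.distance.jensenshannon` is used instead, note that it returns the square root of the divergence (the Jensen-Shannon distance) and uses the natural logarithm unless `base=2` is passed.

```python
import math
from typing import List, Sequence


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits (log base 2), so the result lies in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same bins")
    sp, sq = sum(p), sum(q)
    pn: List[float] = [x / sp for x in p]  # normalise so raw counts are accepted too
    qn: List[float] = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(pn, qn)]

    def kl(a: List[float], b: List[float]) -> float:
        # Terms with a_i == 0 contribute 0 by convention; b_i > 0 whenever a_i > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(pn, m) + 0.5 * kl(qn, m)


print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(js_divergence([1, 0], [0, 1]))  # 1.0
```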
Use cases: comparing embedding distributions, multi-modal features. Interpretation thresholds vary by domain, so establish baselines empirically, for example by measuring the divergence between reference windows that are known to be stable.