
Statistical Tests for Drift Detection

Different statistical tests suit different feature types and sensitivity requirements. For continuous features, the Kolmogorov-Smirnov (KS) test measures the maximum difference between cumulative distribution functions and is sensitive to any distributional change. The Wasserstein distance (also called Earth Mover's Distance) measures how much probability mass must be moved to transform one distribution into the other, making it interpretable in feature units. The Population Stability Index (PSI) compares binned histograms and is widely used because it is fast to compute and comes with interpretable thresholds.

For categorical features, the Chi-square test compares observed versus expected frequencies across categories but requires a large sample size in each bin. PSI also works for categoricals by treating each category as a bin. High-cardinality categoricals pose special challenges because long-tail categories violate test assumptions; practical solutions include merging rare categories into an "Other" bucket or tracking only the top N categories plus summary statistics such as HyperLogLog cardinality and entropy for the tail.

Multivariate methods detect joint distribution shifts that univariate tests miss. Maximum Mean Discrepancy (MMD) with a Radial Basis Function (RBF) kernel compares distributions in a high-dimensional feature space. Adversarial validation trains a lightweight classifier to distinguish reference data from current data; if the model achieves an AUC significantly above 0.5 (random guessing), drift is present, and an AUC above 0.7 is a practical red flag. The feature importances of this classifier directly identify which features are driving the drift.

The multiple-testing problem becomes severe at scale. With 300 features monitored across 50 segments in 2 overlapping time windows, you run 30,000 tests per evaluation cycle. Without correction, even a 5% false-positive rate would generate 1,500 false alarms. Production systems therefore apply False Discovery Rate (FDR) control using Benjamini-Hochberg correction at 5% to 10%, require effect-size thresholds alongside statistical significance, and cluster correlated features into families to reduce redundant alerts.
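To make the continuous-feature tests concrete, here is a minimal sketch using SciPy; the psi helper and the synthetic reference/current arrays are illustrative assumptions, not a production implementation.

```python
import numpy as np
from scipy import stats

def psi(reference, current, bins=10):
    """Population Stability Index over quantile bins of the reference sample (illustrative)."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))[1:-1]  # interior bin edges
    ref_frac = np.bincount(np.searchsorted(edges, reference), minlength=bins) / len(reference)
    cur_frac = np.bincount(np.searchsorted(edges, current), minlength=bins) / len(current)
    ref_frac = np.clip(ref_frac, 1e-6, None)  # avoid log(0) on empty bins
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)  # training-time feature values (synthetic)
current = rng.normal(0.3, 1.2, 10_000)    # serving-time values with a mean and scale shift

ks_stat, ks_p = stats.ks_2samp(reference, current)
w_dist = stats.wasserstein_distance(reference, current)  # interpretable in feature units
print(f"KS={ks_stat:.3f} (p={ks_p:.1e})  Wasserstein={w_dist:.3f}  PSI={psi(reference, current):.3f}")
```

Quantile bins taken from the reference sample keep every bin populated, which is why PSI stays stable even when the serving distribution shifts into the tails.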
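For high-cardinality categoricals, here is a sketch of the "Other"-bucket approach described above; the synthetic long-tail data and the top-20 cutoff are assumptions for illustration.

```python
from collections import Counter
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
cats = [f"cat_{i}" for i in range(500)]                # hypothetical high-cardinality feature
weights = rng.pareto(1.5, len(cats)) + 1e-9
weights /= weights.sum()                               # long-tailed category frequencies
reference_values = rng.choice(cats, 20_000, p=weights)
current_values = rng.choice(cats, 20_000, p=np.roll(weights, 5))  # perturbed distribution

# Keep the top 20 reference categories; fold everything else into "Other".
top = [c for c, _ in Counter(reference_values).most_common(20)]
keep = set(top)

def bucket_counts(values):
    c = Counter(v if v in keep else "Other" for v in values)
    return np.array([c.get(k, 0) for k in top + ["Other"]], dtype=float)

ref_counts, cur_counts = bucket_counts(reference_values), bucket_counts(current_values)
# Expected frequencies from the reference mix, scaled to the current sample size.
expected = np.clip(ref_counts, 0.5, None)
expected = expected / expected.sum() * cur_counts.sum()
chi2, pvalue = stats.chisquare(cur_counts, f_exp=expected)
print(f"chi2={chi2:.1f}  p={pvalue:.2e}")
```

Merging the tail keeps every expected count large enough for the Chi-square approximation to hold; the tail itself can still be summarized separately via cardinality and entropy.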
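A minimal MMD sketch using scikit-learn's RBF kernel, with a small permutation test for significance; the sample sizes, the default gamma of 1/n_features, and the 100-permutation budget are illustrative choices.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd2_rbf(X, Y, gamma=None):
    """Biased estimate of squared MMD between samples X and Y under an RBF kernel."""
    return (rbf_kernel(X, X, gamma=gamma).mean()
            + rbf_kernel(Y, Y, gamma=gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma=gamma).mean())

rng = np.random.default_rng(2)
X = rng.normal(0.00, 1.0, (500, 32))  # e.g., PCA-projected reference embeddings (synthetic)
Y = rng.normal(0.05, 1.1, (500, 32))  # current embeddings with a joint shift
observed = mmd2_rbf(X, Y)

# Permutation test: shuffle the pooled rows to estimate the no-drift null distribution.
pooled = np.vstack([X, Y])
null = []
for _ in range(100):
    idx = rng.permutation(len(pooled))
    null.append(mmd2_rbf(pooled[idx[:len(X)]], pooled[idx[len(X):]]))
p_value = (1 + sum(n >= observed for n in null)) / (1 + len(null))  # add-one avoids p = 0
print(f"MMD^2={observed:.5f}  permutation p={p_value:.3f}")
```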
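Adversarial validation is equally short to sketch with scikit-learn; the gradient-boosted classifier, the 5-fold scheme, and the synthetic single-feature shift are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def adversarial_validation(reference, current, feature_names):
    """Train a classifier to separate reference rows from current rows."""
    X = np.vstack([reference, current])
    y = np.concatenate([np.zeros(len(reference)), np.ones(len(current))])
    clf = GradientBoostingClassifier(n_estimators=100, max_depth=3)
    # Out-of-fold probabilities so the AUC is not inflated by memorization.
    scores = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    auc = roc_auc_score(y, scores)
    clf.fit(X, y)  # refit on all rows to read feature importances
    drivers = sorted(zip(feature_names, clf.feature_importances_), key=lambda kv: -kv[1])
    return auc, drivers

rng = np.random.default_rng(3)
names = [f"f{i}" for i in range(10)]
reference = rng.normal(0, 1, (5_000, 10))
current = rng.normal(0, 1, (5_000, 10))
current[:, 2] += 0.5  # drift a single feature so the classifier has a real signal
auc, drivers = adversarial_validation(reference, current, names)
print(f"AUC={auc:.3f}  top drivers={drivers[:3]}")
```

An AUC near 0.5 means the two samples are indistinguishable; past roughly 0.7, the top importances point straight at the drifting features (f2 in this synthetic example).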
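Finally, a from-scratch Benjamini-Hochberg sketch on simulated p-values at the 30,000-test scale described above (statsmodels' multipletests with method="fdr_bh" is an off-the-shelf equivalent); the null/shifted mixture is made up for illustration.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Boolean mask of rejected tests under Benjamini-Hochberg FDR control at level alpha."""
    p = np.asarray(pvalues)
    m = len(p)
    order = np.argsort(p)
    # Largest k with p_(k) <= (k / m) * alpha; reject the k smallest p-values.
    below = p[order] <= (np.arange(1, m + 1) / m) * alpha
    k = int(np.nonzero(below)[0].max()) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# 30,000 simulated tests: mostly true nulls (uniform p-values) plus 300 genuine shifts.
rng = np.random.default_rng(4)
pvals = np.concatenate([rng.uniform(size=29_700), rng.beta(1, 500, size=300)])
flagged = benjamini_hochberg(pvals, alpha=0.05)
print(f"uncorrected p<0.05: {(pvals < 0.05).sum()}  flagged after FDR control: {flagged.sum()}")
```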
💡 Key Takeaways
The Kolmogorov-Smirnov test is sensitive at large sample sizes, the Population Stability Index provides interpretable thresholds (0.1 minor, 0.25 significant), and the Wasserstein distance is meaningful in feature units
High-cardinality categoricals require special handling: merge rare categories into an "Other" bucket, track the top 10-20 categories explicitly, and use HyperLogLog for cardinality and entropy for the tail distribution
Adversarial validation trains a classifier to distinguish reference from current data; AUC > 0.7 indicates meaningful drift, and feature importances identify root causes
With 300 features × 50 segments × 2 windows = 30,000 tests, False Discovery Rate control via Benjamini-Hochberg at 5% is essential to avoid alert fatigue from false positives
Combining statistical significance (p < 0.01) with effect-size thresholds (PSI > 0.25 or Wasserstein > 0.1 × interquartile range) separates real drift from noise
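That last combined decision rule fits in a single predicate; a minimal sketch follows, where the function name is hypothetical and the default thresholds mirror the takeaway above.

```python
def is_actionable_drift(p_value, psi_value, wasserstein, ref_iqr,
                        p_cut=0.01, psi_cut=0.25, w_frac=0.1):
    """Alert only when a shift is both statistically significant and practically large."""
    significant = p_value < p_cut
    large_effect = psi_value > psi_cut or wasserstein > w_frac * ref_iqr
    return significant and large_effect

# At large sample sizes almost everything is "significant"; the effect-size clause
# is what keeps tiny, harmless shifts from paging anyone.
```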
📌 Examples
Meta feed ranking monitors embedding drift using Maximum Mean Discrepancy on 32-dimensional Principal Component Analysis (PCA) projections of 768-dimensional sentence embeddings, avoiding expensive full-dimensional comparisons
Uber fraud detection runs adversarial validation on 30 features; when AUC reaches 0.75, feature importances reveal that the transaction-amount distribution and merchant-category frequencies are the primary drift drivers
Netflix's recommendation system applies Benjamini-Hochberg correction across 600 simultaneous tests (60 features × 10 segments), maintaining a 5% False Discovery Rate while catching real drift in the viewing-time distribution