Data Drift Detection

Statistical Tests for Drift Detection

POPULATION STABILITY INDEX (PSI)

PSI measures drift by binning feature values and comparing bin frequencies between baseline and current distributions. Formula: PSI = Σ (current% - baseline%) × ln(current% / baseline%), summed over all bins.

Interpretation: PSI < 0.1 indicates negligible drift. PSI 0.1-0.25 indicates moderate drift worth investigating. PSI > 0.25 indicates significant drift requiring action. These thresholds are industry conventions; calibrate for your domain.

Advantages: intuitive, interpretable, works for any distribution. Disadvantages: requires binning decisions, sensitive to bin boundaries, does not work well with sparse data.
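The bin-and-compare procedure above can be sketched in a few lines of Python. The `psi` helper, its equal-width binning over the baseline range, and the small floor on empty bins are illustrative choices (quantile binning is also common in practice):

```python
import math

def psi(baseline, current, bins=10):
    """Population Stability Index between two samples of a numeric feature.

    Bins are equal-width over the baseline range; empty bins are floored
    at a small epsilon so the log term stays defined.
    """
    lo, hi = min(baseline), max(baseline)
    # Interior cut points of `bins` equal-width bins over the baseline range.
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bin_fracs(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # index of the bin v falls into
        eps = 1e-4  # floor so empty bins don't produce log(0) or divide-by-zero
        return [max(c / len(values), eps) for c in counts]

    b, c = bin_fracs(baseline), bin_fracs(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))
```

Identical samples give PSI of 0; a sample shifted well outside the baseline range lands mostly in the last bin and produces a PSI far above the 0.25 action threshold.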

KOLMOGOROV-SMIRNOV (K-S) TEST

K-S measures the maximum distance between cumulative distribution functions of two samples. Produces a test statistic D and p-value. P-value < 0.05 suggests statistically significant difference.

Advantages: no binning required, works for continuous distributions. Disadvantages: sensitive to sample size (large samples detect trivial differences), does not quantify drift magnitude well.
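In practice `scipy.stats.ks_2samp` computes both D and the p-value. The statistic itself reduces to one merge pass over the two sorted samples, tracking the largest gap between the empirical CDFs; `ks_statistic` below is a minimal sketch of that idea (statistic only, no p-value):

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: max vertical distance between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Advance past every occurrence of x in both samples, so ties
        # move both empirical CDFs together.
        while i < na and a[i] == x:
            i += 1
        while j < nb and b[j] == x:
            j += 1
        # i/na and j/nb are the two empirical CDFs evaluated just after x.
        d = max(d, abs(i / na - j / nb))
    return d
```

Identical samples yield D = 0; two samples of 100 points offset by half their range overlap on half their support and yield D = 0.5.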

CHI-SQUARED TEST

For categorical features. Compares observed frequencies against expected frequencies. Produces chi-squared statistic and p-value. P-value < 0.05 indicates significant drift.

Advantages: standard statistical test, well understood. Disadvantages: requires sufficient samples per category, does not handle new categories (treats as zero expected frequency).
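The observed-vs-expected comparison can be sketched as follows; the `chi_squared_drift` helper is a hypothetical name, it computes only the statistic (in practice `scipy.stats.chisquare` also returns the p-value), and, mirroring the disadvantage above, it silently skips categories absent from the baseline:

```python
def chi_squared_drift(baseline_counts, current_counts):
    """Chi-squared statistic comparing current category counts against
    expected counts scaled from the baseline distribution."""
    total_base = sum(baseline_counts.values())
    total_cur = sum(current_counts.values())
    stat = 0.0
    for cat, base_n in baseline_counts.items():
        # Expected count if the current sample followed the baseline mix.
        expected = base_n / total_base * total_cur
        observed = current_counts.get(cat, 0)
        stat += (observed - expected) ** 2 / expected
    return stat
```

Compare the statistic against the chi-squared critical value for k − 1 degrees of freedom (e.g. 5.99 for three categories at alpha = 0.05): an unchanged category mix scores 0, while a shifted mix exceeds the critical value.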

JENSEN-SHANNON DIVERGENCE

Symmetric measure of similarity between distributions. Bounded between 0 (identical) and 1 (completely different). More robust than KL divergence because it handles zero probabilities gracefully.

Use cases: comparing embedding distributions, multi-modal features. Interpretation thresholds vary by domain—establish baselines empirically.
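JS divergence is the average KL divergence of each distribution to their midpoint M = (P + Q)/2, which is why zero probabilities are handled gracefully: M is nonzero wherever either input is. A minimal sketch for discrete distributions (note `scipy.spatial.distance.jensenshannon` returns the square root, the JS *distance*):

```python
import math

def js_divergence(p, q, base=2):
    """Jensen-Shannon divergence between two discrete distributions
    given as aligned probability lists. With log base 2 the result
    is bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution

    def kl(a, b):
        # Terms with a_i == 0 contribute 0 by convention.
        return sum(ai * math.log(ai / bi, base) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical distributions score 0; distributions with disjoint support (e.g. [1, 0] vs [0, 1]) score 1, the upper bound.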

When To Use: PSI for quick interpretation with business stakeholders. K-S for continuous features when you need statistical significance. Chi-squared for categorical features. JS divergence for embedding-like features.
💡 Key Takeaways
PSI: <0.1 negligible, 0.1-0.25 moderate, >0.25 significant; requires binning, intuitive interpretation
K-S test: measures max CDF distance, no binning needed; sensitive to sample size, p<0.05 threshold
Chi-squared for categorical, JS divergence for embeddings; each test has domain-specific tradeoffs
📌 Interview Tips
1. Interview Tip: Walk through PSI calculation steps and explain threshold interpretation.
2. Interview Tip: Know when to use each test: PSI for business communication, K-S for continuous, chi-squared for categorical.