Model Monitoring & Observability • Feature Importance Tracking (SHAP Drift) · Medium · ⏱️ ~3 min
What is SHAP Drift and Why Track It?
SHAP (SHapley Additive exPlanations) drift monitoring tracks how feature contributions to model predictions change over time. Unlike traditional data drift detection, which compares raw feature distributions, SHAP drift answers a critical question: which distribution shifts actually matter to your model? A feature distribution might shift dramatically, but if the model ignores that feature, the shift is harmless. Conversely, a subtle shift in a high-importance feature deserves immediate attention.
SHAP values come from cooperative game theory. For every prediction, each feature receives a contribution value (positive or negative), and these contributions sum to the difference between the prediction and a baseline. If your model predicts 0.7 probability and the baseline is 0.3, the SHAP values across all features will sum to 0.4. Aggregate these contributions across thousands of predictions and you get stable measures of global feature importance. Track these aggregates as time series and you surface what your model is actually relying on as data evolves.
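To make the additivity property concrete, here is a minimal sketch using the `shap` package with a scikit-learn tree model; the dataset, model choice, and variable names are illustrative assumptions, not part of the original example. It verifies that each row's SHAP values sum to the prediction minus the explainer's baseline, then computes mean absolute SHAP per feature as a global importance measure.

```python
# Sketch: verifying SHAP additivity and computing global importance.
# Assumes the `shap` package and a scikit-learn regressor; all names illustrative.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=6, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)                    # shape: (n_samples, n_features)
baseline = float(np.ravel(explainer.expected_value)[0])   # model's average prediction

# Each row of SHAP values sums to (prediction - baseline).
preds = model.predict(X)
assert np.allclose(baseline + shap_values.sum(axis=1), preds, atol=1e-6)

# Mean absolute SHAP per feature = global importance for this window of data.
global_importance = np.abs(shap_values).mean(axis=0)
print(global_importance)
```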
In production ML systems, this matters enormously. Netflix might notice their recommendation model suddenly weights "time since last watch" 40% more than the historical baseline while "genre preference" drops 25%. That signals a behavioral shift worth investigating. Uber's ETA model might show "current traffic conditions" SHAP values becoming bimodal, indicating that a new subpopulation (perhaps from a new city launch) has appeared. These signals guide rollback decisions, retraining priorities, and incident triage far better than generic "feature X changed distribution" alerts.
💡 Key Takeaways
• SHAP values attribute each prediction to feature contributions that sum to the prediction minus the baseline, grounded in Shapley values from game theory
• Traditional drift detection flags distribution changes; SHAP drift reveals which changes the model actually uses in decision making
• Mean absolute SHAP per feature, aggregated over windows, creates stable time series for monitoring shifts in model reliance (see the sketch after this list)
• Netflix and Uber use SHAP drift to distinguish actionable distribution shifts from benign noise in high-throughput ranking and prediction services
• Bimodal SHAP distributions for a feature indicate that a new subpopulation has appeared, providing early signal during incidents or launches
• Cost is higher than simple statistical tests like Kolmogorov–Smirnov (2 to 7 milliseconds per sample for TreeSHAP), but it provides model-aware signals worth the investment
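One way to turn these aggregates into a monitoring signal is to compare a reference window against a live window, as sketched below. The relative-change threshold, window handling, and the use of a Kolmogorov–Smirnov test on per-feature SHAP value distributions are illustrative choices, not a prescribed production recipe.

```python
# Sketch: flagging features whose model reliance shifted between two windows.
# Thresholds and window construction are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

def shap_drift_report(ref_shap, live_shap, feature_names, rel_change_threshold=0.3):
    """ref_shap, live_shap: SHAP arrays of shape (n_samples, n_features)."""
    ref_importance = np.abs(ref_shap).mean(axis=0)
    live_importance = np.abs(live_shap).mean(axis=0)

    flagged = []
    for i, name in enumerate(feature_names):
        rel_change = (live_importance[i] - ref_importance[i]) / (ref_importance[i] + 1e-12)
        # KS test on the raw SHAP distributions catches shape changes
        # (e.g. a distribution splitting into two peaks) that the mean misses.
        ks_stat, p_value = ks_2samp(ref_shap[:, i], live_shap[:, i])
        if abs(rel_change) > rel_change_threshold:
            flagged.append((name, rel_change, ks_stat, p_value))

    # Largest reliance shifts first.
    return sorted(flagged, key=lambda r: abs(r[1]), reverse=True)
```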
📌 Examples
Uber ETA prediction model shows "current traffic" SHAP values splitting into two peaks (bimodal), revealing that a new city with different traffic patterns has launched (see the bimodality sketch below)
Airbnb pricing model's mean absolute SHAP for "days until booking" jumps from 0.12 to 0.19 log-odds over 48 hours, triggering an investigation that finds a holiday booking surge
Meta content ranking sees the "recency" feature's SHAP increase 40% while "engagement history" drops 25%, indicating a shift toward trending content consumption
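For the bimodality signal in the Uber example, one simple heuristic is to compare one- and two-component Gaussian mixtures fit to a feature's SHAP values; this BIC-based check and its margin are assumptions for illustration, not a documented production method.

```python
# Sketch: heuristic check for a bimodal SHAP value distribution on one feature.
# The BIC margin is an illustrative assumption.
import numpy as np
from sklearn.mixture import GaussianMixture

def looks_bimodal(shap_column, bic_margin=10.0):
    x = np.asarray(shap_column).reshape(-1, 1)
    bic_1 = GaussianMixture(n_components=1, random_state=0).fit(x).bic(x)
    bic_2 = GaussianMixture(n_components=2, random_state=0).fit(x).bic(x)
    # A clearly lower BIC for two components suggests a second subpopulation.
    return (bic_1 - bic_2) > bic_margin
```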