
Production SHAP Drift Pipeline Architecture and Capacity Planning

PIPELINE ARCHITECTURE

A production SHAP drift pipeline has three components: sampling, computation, and comparison.

Sampling: Select a representative subset of production predictions for analysis. Stratified sampling by segment (e.g., user type, geography) ensures segment-specific drift is not missed. Sample size: 1,000-10,000 predictions per window, depending on compute budget.
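The stratified sampling step above can be sketched with pandas; the `user_type` column and the prediction log below are hypothetical, and capping each stratum at a fixed size keeps small segments represented:

```python
import pandas as pd

def stratified_sample(df: pd.DataFrame, segment_col: str,
                      per_segment: int, seed: int = 0) -> pd.DataFrame:
    """Sample up to `per_segment` rows from each segment so that small
    segments still appear in the SHAP analysis window."""
    return (
        df.groupby(segment_col, group_keys=False)
          .apply(lambda g: g.sample(n=min(per_segment, len(g)), random_state=seed))
    )

# Hypothetical prediction log with three segments of uneven size.
log = pd.DataFrame({
    "user_type": ["free"] * 5000 + ["pro"] * 300 + ["trial"] * 40,
    "score": range(5340),
})
sample = stratified_sample(log, "user_type", per_segment=500)
# All 40 "trial" rows survive; "free" is capped at 500.
```

A plain uniform sample of the same size would contain only a handful of "trial" rows, which is exactly how segment-specific drift gets missed.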

Computation: Run a SHAP explainer on the sampled predictions. For tree models, use TreeExplainer (fast and exact). For other model classes, use KernelSHAP (model-agnostic) or DeepSHAP (for neural networks); both are slower and approximate. Parallelize across multiple workers to scale.

Comparison: Aggregate SHAP values per feature (e.g., mean absolute value). Compare against a baseline using statistical tests or simple thresholds. The baseline is either the training-set SHAP distribution or a recent historical window.
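The comparison step can be sketched with the simple-threshold variant: aggregate each feature's SHAP values to a mean-absolute importance and flag features whose importance moved by more than a relative threshold against the baseline (a KS test on the per-feature SHAP distributions is the common statistical alternative). The 0.5 threshold is illustrative:

```python
import numpy as np

def shap_importance(shap_vals: np.ndarray) -> np.ndarray:
    """Global importance per feature: mean absolute SHAP value."""
    return np.abs(shap_vals).mean(axis=0)

def drifted_features(baseline: np.ndarray, current: np.ndarray,
                     rel_threshold: float = 0.5) -> np.ndarray:
    """Indices of features whose importance changed by more than
    rel_threshold relative to the baseline importance."""
    base_imp = shap_importance(baseline)
    cur_imp = shap_importance(current)
    rel_change = np.abs(cur_imp - base_imp) / np.maximum(base_imp, 1e-12)
    return np.flatnonzero(rel_change > rel_threshold)

rng = np.random.default_rng(1)
baseline = rng.normal(size=(1000, 5))   # baseline window of SHAP values
current = baseline.copy()
current[:, 2] *= 3.0                    # feature 2's attributions tripled
flagged = drifted_features(baseline, current)  # -> array([2])
```

The relative (rather than absolute) change keeps the rule meaningful across features whose importances differ by orders of magnitude.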

CAPACITY PLANNING

SHAP computation dominates pipeline cost. A tree model with 100 features and 1,000 samples takes roughly 10 seconds with TreeExplainer; a neural network explained with KernelSHAP can take around 10 minutes for a comparable sample. Plan compute resources accordingly.
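A back-of-envelope sizing under the ballpark figures above (the fleet size and sample count are made-up inputs; the point is that weekly tree-model SHAP is cheap):

```python
# Illustrative capacity estimate: tree model at ~10 s per 1,000 samples.
SECONDS_PER_1K_TREE = 10
models = 20                # hypothetical monitored fleet
samples_per_window = 5000
windows_per_week = 1       # weekly monitoring cadence

weekly_seconds = (models * (samples_per_window / 1000)
                  * SECONDS_PER_1K_TREE * windows_per_week)
# 20 * 5 * 10 = 1,000 s, i.e. about 17 minutes of compute per week
```

Swap in the KernelSHAP figure (~60x slower here) and the same fleet needs hours, which is why the neural-network case usually forces batching or a dedicated cluster.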

Scaling strategies: Batch processing during low-traffic hours. Dedicated compute cluster for SHAP jobs. Caching SHAP values for frequently-seen input patterns (if applicable).

Cost optimization: start with weekly SHAP drift monitoring. Increase frequency only for critical models. For less critical models, fall back on cheaper data-distribution drift checks on the inputs rather than full SHAP analysis.

STORAGE AND VISUALIZATION

Store aggregated SHAP statistics: mean, standard deviation, and percentiles per feature per time window. Raw SHAP values for individual predictions are expensive to store; retain them only for a small debugging sample.
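The per-window aggregation above can be sketched as follows; the feature names and window identifier are hypothetical, and the resulting record is small enough to keep indefinitely while the raw matrix is discarded:

```python
import numpy as np

def aggregate_window(shap_vals: np.ndarray, feature_names, window_id: str) -> dict:
    """Compact per-feature statistics for one time window: mean, std,
    and a few percentiles, keyed by feature name."""
    return {
        "window": window_id,
        "stats": {
            name: {
                "mean": float(shap_vals[:, j].mean()),
                "std":  float(shap_vals[:, j].std()),
                "p05":  float(np.percentile(shap_vals[:, j], 5)),
                "p50":  float(np.percentile(shap_vals[:, j], 50)),
                "p95":  float(np.percentile(shap_vals[:, j], 95)),
            }
            for j, name in enumerate(feature_names)
        },
    }

rng = np.random.default_rng(2)
vals = rng.normal(size=(1000, 2))  # raw SHAP values for one window
record = aggregate_window(vals, ["age", "income"], "2024-W01")
```

One such record per feature per window is what the dashboard's time-series charts read from.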

Dashboard essentials: feature importance ranking over time (line chart), drift alerts history, comparison of current vs baseline importance distribution.

✅ Best Practice: Start simple: weekly SHAP on 1000 samples. Add complexity (higher frequency, larger samples) only when you have evidence that simple monitoring misses important drift.
💡 Key Takeaways
Pipeline components: stratified sampling (1K-10K), computation (TreeExplainer fast, KernelSHAP slow), baseline comparison
Cost: tree model SHAP ~10 seconds/1K samples; neural network ~10 minutes; batch during low-traffic hours
Start weekly with 1000 samples; increase frequency only for critical models where simple monitoring misses drift
📌 Interview Tips
1. Describe the three-component pipeline: sampling, computation, comparison.
2. Explain capacity planning: TreeExplainer vs KernelSHAP performance, and scaling strategies.