SHAP vs LIME vs Gradient Methods: Choosing the Right Technique
The choice between SHAP, LIME, and gradient-based methods involves trade-offs in computational cost, stability, theoretical guarantees, and model compatibility. SHAP provides the strongest theoretical foundation, with axioms such as local accuracy (attributions sum exactly to the prediction's delta from the baseline) and consistency (a feature that helps more never receives lower attribution). For tree ensembles, optimized algorithms compute attributions in 2 to 5 milliseconds, making synchronous serving feasible. The cost is higher for neural networks or high-dimensional sparse features, where exact Shapley computation can take seconds.
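A minimal sketch of tree SHAP with a local accuracy check, assuming the `shap` and `xgboost` packages are available; the synthetic data and model below are purely illustrative:

```python
# Minimal sketch: tree SHAP attributions with a local accuracy check.
# Assumes shap and xgboost are installed; data and model are illustrative.
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0.8).astype(int)
model = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)

# TreeExplainer uses the polynomial-time tree algorithm, so a single row
# takes milliseconds rather than the exponential cost of exact Shapley values.
explainer = shap.TreeExplainer(model)
contribs = explainer.shap_values(X[:1])  # shape (1, n_features), in log-odds space

# Local accuracy: baseline expectation plus attributions reconstructs the margin output.
reconstructed = explainer.expected_value + contribs.sum()
print(reconstructed, model.predict(X[:1], output_margin=True)[0])
```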
LIME is faster to prototype and intuitive for local debugging. It works well when you need quick iteration to understand a specific misclassification. However, LIME is unstable across runs because it depends on random perturbations, kernel width, and neighborhood size: two consecutive runs on the same instance can produce different top features. This variability makes LIME unsuitable for regulatory contexts that require reproducible explanations or user-facing adverse action notices.
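The instability is easy to observe directly. A minimal sketch, assuming the `lime` and `scikit-learn` packages; the data and model are illustrative:

```python
# Minimal sketch: LIME's run-to-run variability on a single instance.
# Assumes lime and scikit-learn are installed; data and model are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
X = rng.random((500, 4))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X, feature_names=["f0", "f1", "f2", "f3"], mode="classification"
)

# Each call draws a fresh set of random perturbations (thousands of model
# evaluations), so the top features and weights can differ between runs.
for run in range(2):
    exp = explainer.explain_instance(X[0], model.predict_proba, num_features=2)
    print(run, exp.as_list())
```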
Gradient-based methods like Integrated Gradients or DeepLIFT are efficient for differentiable models, computing attributions in 10 to 30 milliseconds on a T4-class GPU. They scale well to deep learning but assume smooth gradients and may not handle categorical preprocessing (one-hot encoding, embeddings) gracefully. Permutation importance offers global insights and is simple to compute offline (shuffle a feature and measure the accuracy drop), but it cannot provide per-instance narratives and can be misleading when features are correlated.
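A minimal sketch of Integrated Gradients via Captum, assuming `torch` and `captum` are installed; the tiny network and all-zero baseline are illustrative choices, not a recommendation:

```python
# Minimal sketch: Integrated Gradients for a differentiable model.
# Assumes torch and captum are installed; network and baseline are illustrative.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
model.eval()

x = torch.randn(1, 8)
baseline = torch.zeros(1, 8)  # the reference input the path integral starts from

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(
    x, baselines=baseline, n_steps=50, return_convergence_delta=True
)

# Completeness: attributions should sum to f(x) - f(baseline), up to the reported delta.
print(attributions)
print(attributions.sum().item(), (model(x) - model(baseline)).item(), delta.item())
```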
In practice, teams use hybrid strategies. Online serving uses fast methods (SHAP for trees, gradient methods for neural networks) to return the top-K features within 15 to 20 milliseconds. Offline analysis uses richer techniques, including LIME for debugging, permutation importance for global rankings, and counterfactual search for actionable insights. Regulatory and product needs drive the final choice: adverse action notices require stability and reproducibility (favor SHAP), model debugging values iteration speed (LIME is acceptable), and user-facing recommendations may prefer counterfactuals ("change income by $5K to improve the score by 10 points").
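A minimal sketch of the online half of that split, reusing a `shap.TreeExplainer` fitted offline (as in the earlier tree example); the function and variable names here are placeholders, not a prescribed interface:

```python
# Minimal sketch: return the top-K signed attributions in the request path.
# Assumes a shap.TreeExplainer fitted offline; names are placeholders.
import numpy as np

def top_k_reasons(explainer, feature_names, row, k=3):
    # Single-row tree SHAP runs in a few milliseconds, so it fits a 15-20 ms budget.
    contribs = explainer.shap_values(row.reshape(1, -1))[0]
    order = np.argsort(-np.abs(contribs))[:k]
    return [(feature_names[i], float(contribs[i])) for i in order]

# Example usage with the earlier illustrative model:
# reasons = top_k_reasons(explainer, ["f0", "f1", "f2", "f3", "f4"], X[0])
```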
💡 Key Takeaways
• SHAP offers the strongest theoretical guarantees (local accuracy, consistency) and 2 to 5 millisecond latency for tree ensembles, enabling synchronous serving for compliance and adverse action use cases.
• LIME takes 0.5 to 2 seconds because it requires hundreds of model evaluations, making it suitable only for offline debugging or asynchronous APIs, and its random perturbations make it unstable across runs.
• Gradient-based methods (Integrated Gradients, DeepLIFT) compute attributions in 10 to 30 milliseconds on GPU for neural networks but require differentiability and may struggle with categorical features.
• Permutation importance provides global feature rankings offline but cannot generate per-instance explanations and can be misleading when features are correlated (credit from one feature shifts to another).
• Hybrid strategies are common: use SHAP or gradient methods online to return the top-K features within 15 to 20 milliseconds, use LIME offline for debugging, and use counterfactuals for user-facing actionable insights.
• Regulatory needs favor SHAP for reproducibility and stability, model debugging favors LIME for fast iteration, and product features may use counterfactuals ("increase income by $5K to improve the score by 10 points").
📌 Examples
Google Cloud uses SHAP for tabular models in AutoML Tables and gradient methods (Integrated Gradients) for neural networks in Vertex AI, with method selection automated based on model type.
Microsoft's Azure ML interpretability dashboard combines SHAP local attributions with permutation-based global importance and fairness metrics, allowing drill-down from the global to the instance level.
A fraud detection team uses SHAP in production for real-time explanations (2 milliseconds per prediction) and runs LIME overnight on flagged cases to debug false positives with richer local analysis.
Meta's open-source Captum library provides Integrated Gradients and layer-wise attribution for PyTorch models, integrated into internal evaluation pipelines with 10 to 20 millisecond latency on V100 GPUs.