Privacy & Fairness in ML › Model Interpretability (SHAP, LIME) · Hard · ⏱️ ~3 min

Failure Modes and Edge Cases in Model Explanations

Explanation Instability

LIME explanations can change with small input perturbations: change one feature by 0.1% and the top features may reorder completely. This undermines trust ("Why did income matter more for my application but credit score for my neighbor's?"). SHAP is more stable but not immune. Mitigation: report confidence intervals. If users see "income importance: 0.3 ± 0.2," they understand the uncertainty. This requires 3-5 explanation rounds per input.
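The multi-round approach can be sketched as follows. This is a minimal illustration, not a real explainer: `explain_once` is a hypothetical stand-in for one stochastic LIME/SHAP call, simulated here with noisy attributions; the feature names and values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def explain_once(x):
    # Placeholder for one LIME/SHAP run; real perturbation-based explainers
    # are stochastic because they sample perturbations around x.
    true_attr = np.array([0.30, 0.15, 0.05])  # income, credit_score, age
    return true_attr + rng.normal(0, 0.05, size=3)

def explain_with_ci(x, rounds=5):
    # Re-run the explainer and report mean ± std per feature.
    runs = np.stack([explain_once(x) for _ in range(rounds)])
    return runs.mean(axis=0), runs.std(axis=0)

mean, std = explain_with_ci(x=None, rounds=5)
for name, m, s in zip(["income", "credit_score", "age"], mean, std):
    print(f"{name} importance: {m:.2f} ± {s:.2f}")
```

The extra cost is linear in the number of rounds, which is where the 3-5x figure comes from.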

Feature Correlation Problems

SHAP and LIME assume feature independence. If income and education are highly correlated (say 0.8), the attribution split between them becomes arbitrary. Perturbing one correlated feature in isolation is also misleading, because in real data its correlated partners would move with it. Detection: flag pairs above 0.7 correlation as unreliable for individual attribution. Mitigation: group correlated features and explain group importance instead.
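A minimal sketch of this detection rule, assuming illustrative synthetic data: flag feature pairs whose absolute Pearson correlation exceeds 0.7, then report their importance jointly rather than individually.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
income = rng.normal(50, 10, n)
education = 0.8 * income + rng.normal(0, 5, n)   # strongly tied to income
age = rng.normal(40, 12, n)                      # independent

X = np.column_stack([income, education, age])
names = ["income", "education", "age"]

# Pairwise Pearson correlations across features (columns).
corr = np.corrcoef(X, rowvar=False)
flagged = [
    (names[i], names[j], round(float(corr[i, j]), 2))
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > 0.7
]
print("unreliable for individual attribution:", flagged)
# For flagged groups, sum member attributions and report the group total.
```

Here `(income, education)` gets flagged while `age` remains safe for individual attribution.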

Adversarial Explanations

Explanations can be manipulated. Attackers can craft inputs that produce misleading explanations while leaving predictions unchanged: the model makes biased decisions, but the explanations hide the bias by attributing it to innocuous features. Detection: compare explanations for protected vs. unprotected groups. If explanations differ dramatically while predictions are similar, investigate.
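The audit described above can be sketched like this. All arrays are illustrative stand-ins for real SHAP attributions and model predictions, and the tolerances are hypothetical, not standard values.

```python
import numpy as np

def audit_explanations(attr_a, attr_b, pred_a, pred_b,
                       attr_tol=0.15, pred_tol=0.05):
    """Flag when mean attributions diverge across groups but predictions agree."""
    attr_gap = np.abs(attr_a.mean(axis=0) - attr_b.mean(axis=0)).max()
    pred_gap = abs(pred_a.mean() - pred_b.mean())
    return bool(attr_gap > attr_tol and pred_gap < pred_tol)

rng = np.random.default_rng(7)
# Similar approval scores for both groups...
pred_a = rng.normal(0.70, 0.01, 200)
pred_b = rng.normal(0.70, 0.01, 200)
# ...but attribution mass shifted onto a different feature per group
# (columns: [income, zip_code]).
attr_a = rng.normal([0.30, 0.05], 0.02, size=(200, 2))
attr_b = rng.normal([0.05, 0.30], 0.02, size=(200, 2))

print("investigate:", audit_explanations(attr_a, attr_b, pred_a, pred_b))
```

A flag here is a trigger for manual review, not proof of manipulation; legitimate distribution differences between groups can also shift attributions.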

Out of Distribution Inputs

Explanations are unreliable outside the training distribution. The model extrapolates unpredictably, and LIME/SHAP outputs become meaningless. A model trained on five-figure incomes produces nonsense for a multi-million-dollar income. Detection: flag inputs far from the training centroid. Warn, or refuse to explain entirely.
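A minimal sketch of the centroid-distance flag, assuming illustrative training data (income and credit score) and a hypothetical threshold: standardize by training statistics, then refuse to explain inputs beyond a fixed distance.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative training data: [income, credit_score].
X_train = rng.normal([50_000, 700], [15_000, 50], size=(1000, 2))

centroid = X_train.mean(axis=0)
scale = X_train.std(axis=0)

def ood_distance(x):
    # Euclidean distance from the training centroid in standardized units.
    return np.linalg.norm((x - centroid) / scale)

def safe_to_explain(x, threshold=4.0):
    # Threshold is a hypothetical cutoff; tune it on held-out data.
    return bool(ood_distance(x) < threshold)

print(safe_to_explain(np.array([55_000, 680])))     # in-distribution
print(safe_to_explain(np.array([5_000_000, 700])))  # far outside training range
```

Distance-to-centroid is a crude check; density estimates or nearest-neighbor distances catch more OOD shapes, at higher cost.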

⚠️ Key Trade-off: All methods have failure modes. Report uncertainty, flag unreliable scenarios, never treat explanations as ground truth.
💡 Key Takeaways
LIME explanations can reorder with tiny changes, undermining user trust
Correlated features (|r| > 0.7) get arbitrary attribution; group them instead
Adversarial explanations can hide model bias by attributing to innocuous features
Out-of-distribution inputs produce meaningless explanations
Report confidence intervals and flag unreliable scenarios
📌 Interview Tips
1. Mention confidence intervals: run 3-5 explanation rounds to estimate uncertainty
2. Correlation flag: features with r > 0.7 should be grouped for attribution