
Failure Modes and Edge Cases in Model Explanations

Interpretability methods can produce misleading or unstable explanations in several predictable scenarios.

Correlated features present a fundamental challenge for Shapley values. When income and loan amount are correlated (high-income applicants request larger loans), SHAP divides credit between them. This can dilute importance or produce counterintuitive results where removing one feature dramatically shifts attribution to the other. A common mitigation is to form feature groups for known correlated sets (income, savings, and assets as a wealth group) and attribute at the group level. LIME also fails under correlation because random perturbations break natural relationships, creating unrealistic inputs the model never saw during training.

Background data bias in SHAP can shift the baseline and distort contributions. If your background dataset contains only high-income customers (mean income $120K), the baseline prediction will be high, and attributions for average customers (income $60K) will show large negative contributions from income. This is mathematically correct but operationally misleading. Stratified sampling by key segments (geography, product tier, risk band) and periodic refresh (monthly, or whenever data drift exceeds 5 percent) mitigate this.

For high-cardinality sparse features, such as one-hot encoded categories with thousands of levels, both SHAP and LIME can be noisy. Aggregating categories into target-encoded bins, or explaining at the raw categorical level with domain-aware logic, helps.

Unstable neighborhoods in LIME arise from poor perturbation design. In text classification, randomly removing tokens can create grammatically invalid or semantically nonsensical inputs that the model never encountered during training, leading to brittle explanations. Constrain perturbations to valid token substitutions or use semantically plausible perturbations (replace words with synonyms rather than removing them at random). Out-of-distribution inputs amplify all of these issues: local linear surrogates around extreme or novel inputs extrapolate poorly, and SHAP values may flag large contributions that simply reflect the model reacting to unseen regions of feature space.

Explanation drift and security risks are operational concerns. After model updates or feature pipeline changes, cached or stored explanations become invalid. Tie every explanation to a model version hash, a feature manifest checksum, and a background sample identifier, and monitor attribution drift (the distribution of top features over time) as a canary for data drift. Access to explanations can also enable model stealing (query many inputs, extract attributions, train a surrogate), so rate limit explanation queries per user and filter sensitive features in user-facing contexts. Finally, if a model exploits a proxy for a protected attribute (ZIP code as a proxy for race), attributions will surface the proxy, creating compliance risk even though the model is technically legal.
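To make group-level attribution concrete, here is a minimal sketch that sums per-feature SHAP values into a single wealth group before reporting. The feature names and model interface are illustrative assumptions, and summing within a group is a pragmatic reporting choice rather than a true group-level Shapley computation over coalitions of groups.

```python
import numpy as np
import shap

# Illustrative correlated group: report income, savings, and assets as one "wealth" feature.
FEATURE_NAMES = ["income", "savings", "assets", "loan_amount", "age"]
WEALTH_GROUP = ["income", "savings", "assets"]

def grouped_attributions(model, X_background, X_explain):
    """Compute per-row SHAP values, then sum them within the wealth group.

    SHAP values are additive per prediction, so summing gives a consistent
    group total for reporting; it is not the same as computing Shapley
    values directly over groups.
    """
    explainer = shap.Explainer(model.predict, X_background)
    sv = explainer(X_explain).values  # shape: (n_rows, n_features) for a single output

    group_idx = [FEATURE_NAMES.index(f) for f in WEALTH_GROUP]
    other_idx = [i for i in range(len(FEATURE_NAMES)) if i not in group_idx]

    wealth = sv[:, group_idx].sum(axis=1, keepdims=True)
    grouped = np.hstack([wealth, sv[:, other_idx]])
    names = ["wealth"] + [FEATURE_NAMES[i] for i in other_idx]
    return grouped, names
```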
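A rough sketch of stratified background sampling, assuming a pandas DataFrame with segment columns named region, product_tier, and risk_band (placeholders for whatever segments matter in your domain):

```python
import pandas as pd

def stratified_background(df: pd.DataFrame,
                          strata=("region", "product_tier", "risk_band"),
                          per_stratum: int = 20,
                          seed: int = 0) -> pd.DataFrame:
    """Sample a background set that preserves key segments instead of
    whatever segment happens to dominate the raw data."""
    return (
        df.groupby(list(strata), group_keys=False)
          .apply(lambda g: g.sample(min(per_stratum, len(g)), random_state=seed))
          .reset_index(drop=True)
    )

# Usage sketch: rebuild on a monthly schedule or when a drift check fires,
# then recreate the SHAP explainer from the refreshed background.
# background = stratified_background(customers_df)
# explainer = shap.Explainer(model.predict, background[feature_columns])
```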
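lime's built-in text explainer perturbs by masking tokens, which is exactly the failure mode described above. The sketch below hand-rolls a small LIME-style local surrogate so perturbations can be restricted to synonym substitutions; the synonym table, the classifier interface, and the 50 percent substitution rate are all placeholder assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder synonym table; in practice this would come from a thesaurus
# or an embedding nearest-neighbour lookup.
SYNONYMS = {"great": ["good", "excellent"], "awful": ["terrible", "dreadful"]}

def perturb(tokens, rng):
    """Swap some tokens for synonyms instead of deleting them, so every
    perturbed sentence stays grammatical."""
    out = list(tokens)
    for i, tok in enumerate(tokens):
        if tok in SYNONYMS and rng.random() < 0.5:
            out[i] = str(rng.choice(SYNONYMS[tok]))
    return out

def local_explanation(text, predict_positive_proba, n_samples=200, seed=0):
    """LIME-style local surrogate over 'was token i changed?' indicator features."""
    tokens = text.split()
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(n_samples):
        perturbed = perturb(tokens, rng)
        X.append([int(p != t) for p, t in zip(perturbed, tokens)])
        y.append(predict_positive_proba(" ".join(perturbed)))
    surrogate = Ridge(alpha=1.0).fit(np.array(X), np.array(y))
    # Coefficients approximate each token's local influence on the prediction.
    return dict(zip(tokens, surrogate.coef_))
```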
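One way to tie explanations to their provenance is to store them alongside hashes that are checked before reuse. The record fields below mirror the items mentioned above (model version hash, feature manifest checksum, background sample ID), but the exact schema is an assumption:

```python
import hashlib
import json
from dataclasses import dataclass

def checksum(obj) -> str:
    """Stable hash of any JSON-serialisable artifact description."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

@dataclass
class ExplanationRecord:
    prediction_id: str
    model_version_hash: str         # hash of the serialized model artifact
    feature_manifest_checksum: str  # hash of feature names + transformations
    background_sample_id: str       # which background set produced the baseline
    attributions: dict              # feature name -> attribution value

def is_stale(rec: ExplanationRecord,
             current_model_hash: str,
             current_manifest_hash: str) -> bool:
    """A cached explanation is only valid for the exact model and feature
    pipeline that produced it."""
    return (rec.model_version_hash != current_model_hash
            or rec.feature_manifest_checksum != current_manifest_hash)
```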
💡 Key Takeaways
Correlated features cause SHAP to divide credit unpredictably (income and loan amount correlation leads to diluted attributions); mitigate by grouping correlated features and attributing at the group level.
Background data bias in SHAP distorts attributions when the reference set is unrepresentative (using high-income customers as the baseline inflates the income effect for average customers); this requires stratified sampling and monthly refresh.
LIME perturbations can create unrealistic inputs (random token removal in text produces nonsense), leading to unstable explanations; constrain perturbations to valid substitutions or semantically plausible changes.
High-cardinality sparse features (one-hot encodings with thousands of categories) produce noisy attributions in both SHAP and LIME; aggregate into target-encoded bins or explain at the raw categorical level.
Explanation drift occurs after model or feature updates, making cached explanations invalid; tie every explanation to a model version hash, feature manifest checksum, and background sample ID, with drift monitoring.
Explanation APIs enable model stealing (query inputs, extract attributions, train a surrogate) and surface protected attribute proxies; rate limit queries and filter sensitive features in user-facing contexts (a minimal per-user rate limiter is sketched below this list).
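As referenced in the last takeaway, a minimal sketch of per-user rate limiting for an explanation endpoint. The sliding-window approach and the limits (50 queries per hour) are illustrative choices, not recommendations from the source:

```python
import time
from collections import defaultdict, deque

class ExplanationRateLimiter:
    """Cap explanation queries per user in a sliding window to raise the
    cost of surrogate-model extraction."""

    def __init__(self, max_queries: int = 50, window_seconds: int = 3600):
        self.max_queries = max_queries
        self.window = window_seconds
        self._calls: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str, now: float | None = None) -> bool:
        """Return True if this user may request another explanation now."""
        now = time.time() if now is None else now
        calls = self._calls[user_id]
        while calls and now - calls[0] > self.window:
            calls.popleft()  # drop timestamps outside the window
        if len(calls) >= self.max_queries:
            return False
        calls.append(now)
        return True
```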
📌 Examples
A credit model trained on data where income and debt-to-income ratio are 0.8 correlated shows income attribution drop 40 percent when the debt ratio is added, even though both are individually important; resolved by grouping them into a financial capacity score.
A fraud detection system using only high-risk transactions (fraud rate 20 percent) as background produces misleading attributions for low-risk segments; switching to a stratified background (fraud rates 1, 5, and 20 percent) stabilizes explanations.
LIME applied to sentiment analysis randomly removes tokens, creating the input "this movie was not", which the model classifies as positive (incomplete negation) and yields an incorrect attribution; replaced with synonym substitution perturbations.
A fintech platform detects explanation drift when the top feature shifts from payment history (40 percent of instances) to account age (60 percent) after a feature pipeline bug, triggering automatic invalidation of cached explanations and an alert to ML engineers; a drift check along these lines is sketched below.
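A sketch of the drift check referenced in the last example: track the share of instances where each feature is the top attribution, compare against a baseline window, and alert when the shift exceeds a threshold. The 0.2 threshold and the dict-of-attributions input format are assumptions for illustration.

```python
from collections import Counter

def top_feature_shares(attributions_batch):
    """attributions_batch: list of dicts mapping feature -> attribution value.
    Returns each feature's share of instances where it had the largest
    absolute attribution."""
    top = Counter(max(a, key=lambda f: abs(a[f])) for a in attributions_batch)
    n = len(attributions_batch)
    return {feat: count / n for feat, count in top.items()}

def attribution_drift_alert(baseline, current, threshold=0.2):
    """Flag features whose top-attribution share moved by more than
    `threshold` between the baseline and current windows."""
    feats = set(baseline) | set(current)
    return {f: (baseline.get(f, 0.0), current.get(f, 0.0))
            for f in feats
            if abs(current.get(f, 0.0) - baseline.get(f, 0.0)) > threshold}
```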