Failure Modes: Proxy Leakage and Feedback Loops
Proxy Variable Leakage
You removed race from features, but zip code predicts race with 85% accuracy. You removed gender, but first name predicts gender with 95% accuracy. Models find proxies.

Detection: Train a classifier to predict the protected attribute from the model's features. If AUC exceeds 0.7, significant proxy information exists. If it exceeds 0.9, the features are nearly as informative as the protected attribute itself.

Mitigation: Remove high-correlation proxies, accepting that this may cost accuracy. Adversarial debiasing actively penalizes models that leak protected-attribute information. Feature importance analysis shows which features drive group differences.
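The detection step can be sketched on synthetic data: fit a small logistic regression to predict the protected attribute from the remaining features and measure AUC. The data-generating process and all variable names here are hypothetical, and the numbers (proxy strength, learning rate) are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Hypothetical setup: a "zip_signal" feature correlates with the
# protected attribute; "income" does not.
protected = rng.integers(0, 2, n)
zip_signal = protected + rng.normal(0, 0.6, n)   # proxy feature
income = rng.normal(50, 10, n)                   # unrelated feature
X = np.column_stack([zip_signal, income])

def auc(scores, labels):
    # Probability a random positive scores above a random negative,
    # counting ties as half.
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

# Logistic regression via gradient descent, predicting the protected
# attribute from the model's features.
Xs = (X - X.mean(0)) / X.std(0)
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(Xs @ w + b)))
    w -= 0.5 * (Xs.T @ (p - protected)) / n
    b -= 0.5 * (p - protected).mean()

score = auc(Xs @ w + b, protected)
print(f"proxy AUC: {score:.2f}")  # well above the 0.7 threshold here
```

The same probing classifier can be rerun after removing or transforming candidate proxies to confirm the AUC actually drops.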
Feedback Loops
Biased predictions create biased outcomes that become biased training data. If a hiring model rejects Group B candidates at higher rates, Group B accumulates fewer success cases in future training data, making the model even more biased against Group B. Over 5 retraining cycles, an initial 5% bias can amplify to 25%.

Detection: Track fairness metrics across model versions. If disparity increases with each retrain, a feedback loop is operating.

Mitigation: Holdout exploration reserves 5-10% of decisions for random assignment regardless of model prediction, injecting unbiased outcomes into future training data. Counterfactual logging estimates what would have happened under different decisions.
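A toy simulation makes the amplification and the holdout mitigation concrete. The multiplicative dynamics below are illustrative assumptions, not an empirical model; the amplification factor is chosen so that 5% grows to roughly 25% over 5 retrains, matching the figures above:

```python
def simulate(cycles=5, explore=0.0, amplification=1.38, bias0=0.05):
    """Toy feedback-loop model: disparity grows multiplicatively each
    retrain; a random-holdout fraction `explore` supplies unbiased
    labels that dampen the loop. Dynamics are assumed, not measured."""
    bias = bias0
    history = [bias]
    for _ in range(cycles):
        bias = (1 - explore) * bias * amplification
        history.append(bias)
    return history

no_explore = simulate()                  # ends near 25% disparity
with_explore = simulate(explore=0.10)    # 10% random holdout dampens it
print(f"after 5 retrains: {no_explore[-1]:.0%} without holdout, "
      f"{with_explore[-1]:.0%} with 10% holdout")
```

Tracking the `history` list per model version is exactly the detection signal described above: a monotonically rising disparity curve across retrains.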
Label Bias
Ground truth labels themselves may be biased. Performance reviews used as labels reflect reviewer bias. Fraud labels may be based on investigations that targeted certain groups. If you optimize for biased labels, you encode that bias into the model.

Detection: Audit the label generation process. If humans assigned labels, check inter-rater agreement broken down by annotator demographics.

Mitigation: Use multiple annotators with diversity requirements. Calibrate against objective outcomes where available. Accept that some bias may be unfixable without changing the labeling process entirely.
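The inter-rater check can be sketched with Cohen's kappa computed separately per group of labeled subjects. The synthetic annotators below are hypothetical: both have the same base accuracy, but one systematically downgrades Group 1, so agreement collapses exactly where the bias operates:

```python
import numpy as np

def cohens_kappa(a, b):
    """Agreement between two label vectors, corrected for chance."""
    a, b = np.asarray(a), np.asarray(b)
    p_observed = (a == b).mean()
    labels = np.unique(np.concatenate([a, b]))
    p_chance = sum((a == k).mean() * (b == k).mean() for k in labels)
    return (p_observed - p_chance) / (1 - p_chance)

rng = np.random.default_rng(1)
n = 4000
group = rng.integers(0, 2, n)      # demographic group of labeled subject
quality = rng.integers(0, 2, n)    # hypothetical true label

# Annotator 1: 90% accurate, no group-dependent behavior.
ann1 = np.where(rng.random(n) < 0.9, quality, 1 - quality)
# Annotator 2: same base accuracy, but downgrades Group 1 members
# to label 0 an extra 30% of the time.
ann2 = np.where(rng.random(n) < 0.9, quality, 1 - quality)
ann2 = np.where((group == 1) & (rng.random(n) < 0.3), 0, ann2)

kappa_g0 = cohens_kappa(ann1[group == 0], ann2[group == 0])
kappa_g1 = cohens_kappa(ann1[group == 1], ann2[group == 1])
print(f"kappa Group 0: {kappa_g0:.2f}, kappa Group 1: {kappa_g1:.2f}")
```

A large per-group gap in kappa flags annotator bias even when overall agreement looks acceptable; the same split can be applied across annotator demographics rather than subject demographics.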