Bias Mitigation: Pre, In, and Post Processing Techniques
Bias mitigation can occur at three stages of the ML pipeline, each with distinct tradeoffs in accuracy, complexity, and reversibility. The choice depends on legal constraints, infrastructure maturity, and acceptable accuracy costs. Production systems often combine multiple techniques: preprocessing for data quality, in-processing for a better accuracy-fairness frontier, and post-processing for rapid adjustments without retraining.
Preprocessing operates on the training data before model training. Techniques include reweighting underrepresented groups by 2x to 5x to balance representation, resampling to achieve statistical parity in labels, or learning fair representations that remove information about protected attributes while preserving predictive signal. Preprocessing is model-agnostic and easy to audit, but can hurt model fit. Overcorrection through aggressive reweighting can inflate variance: individual sample weights above 10x often degrade test performance by 3 to 5 percentage points. Large platforms therefore use capped reweighting (for example, a maximum 3x multiplier) combined with stratified sampling to ensure at least 10,000 examples per cohort.
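The capped reweighting described above can be sketched in a few lines. This is a minimal illustration, assuming inverse-frequency weights normalized so the majority group gets weight 1.0; the function name and the 3x cap are illustrative, not a specific library API.

```python
from collections import Counter

def capped_reweight(groups, cap=3.0):
    """Return one sample weight per example: inverse group frequency,
    normalized to 1.0 for the majority group, clipped at `cap`."""
    counts = Counter(groups)
    majority = max(counts.values())
    # weight = majority_count / group_count, capped to avoid variance blowup
    return [min(majority / counts[g], cap) for g in groups]

# A 10:2 imbalance would suggest a 5x weight for the minority group,
# which the cap clips to 3x.
weights = capped_reweight(["A"] * 10 + ["B"] * 2, cap=3.0)
```

In practice these weights would be passed to the training loss (for example via a `sample_weight` argument), with stratified sampling applied separately to guarantee the per-cohort minimum.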
In-processing adds fairness constraints directly to the training objective. Adversarial debiasing trains a model alongside an adversary that predicts protected attributes from the learned representation; the encoder is updated to maximize task performance while minimizing adversary accuracy, effectively removing protected information. Lagrangian methods add a penalty term for fairness violations, for example penalizing TPR gaps exceeding 2 percentage points, with the multiplier tuned via hyperparameter search. In-processing achieves better Pareto frontiers, reducing accuracy by 1 to 2 percentage points to close a 10 percentage point fairness gap, compared to 3 to 5 percentage points with preprocessing alone. The cost is training complexity and non-convex objectives that require careful tuning.
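The Lagrangian penalty above amounts to adding a hinge on the TPR gap to the task loss. A minimal sketch, assuming hard 0/1 predictions and an illustrative multiplier `lam` and tolerance `tol` (the 2-point tolerance comes from the text; the function names are hypothetical):

```python
def tpr(y_true, y_pred):
    """True positive rate: fraction of actual positives predicted positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(positives) / len(positives)

def penalized_loss(task_loss, y_true_a, y_pred_a, y_true_b, y_pred_b,
                   lam=5.0, tol=0.02):
    """Task loss plus a Lagrangian penalty on the between-group TPR gap.
    Only the portion of the gap above `tol` is penalized (hinge)."""
    gap = abs(tpr(y_true_a, y_pred_a) - tpr(y_true_b, y_pred_b))
    return task_loss + lam * max(0.0, gap - tol)
```

In a real training loop the TPR would be computed from differentiable surrogates per mini-batch, and `lam` would either be tuned by search or updated via dual ascent; this sketch only shows the structure of the objective.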
Post-processing adjusts predictions or thresholds after training. Per-group thresholds can satisfy equal opportunity by setting different score cutoffs for each demographic, for example 0.65 for Group A and 0.58 for Group B. This is fast to deploy and reversible, but can be illegal in jurisdictions prohibiting differential treatment, and it breaks calibration, since the same score means different things across groups. Randomized flipping adjusts a small fraction of predictions to meet constraints. Production systems use post-processing for rapid response, for example adjusting thresholds within 24 hours when monitoring detects a fairness violation, while queueing a retrain with in-processing constraints for the next weekly release.
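Per-group thresholding is the simplest of these mechanisms to show in code. The cutoffs below are the illustrative values from the text; in practice they would be fit on a validation set to equalize TPR across groups:

```python
# Per-group score cutoffs (illustrative values from the text; real
# deployments would fit these to equalize TPR on held-out data).
THRESHOLDS = {"A": 0.65, "B": 0.58}

def decide(score, group, thresholds=THRESHOLDS):
    """Binarize a model score using the group's threshold."""
    return int(score >= thresholds[group])

# The same score yields different decisions across groups; this is
# exactly the calibration break described above.
assert decide(0.60, "A") == 0
assert decide(0.60, "B") == 1
```

Because the thresholds live outside the model, they can be changed (or reverted) in a config push without retraining, which is why this is the rapid-response lever.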
💡 Key Takeaways
• Preprocessing with capped reweighting: a maximum 3x sample weight prevents variance explosion while balancing representation, combined with stratified sampling for 10,000+ samples per group
• Adversarial debiasing: train an adversary to predict protected attributes from the representation; the encoder learns to fool the adversary while maintaining task performance, removing proxy leakage
• Lagrangian constraints: add a penalty for TPR gaps above 2 percentage points and tune the multiplier to the target accuracy-fairness tradeoff; achieves a 1 to 2 percentage point accuracy cost for a 10-point fairness gain
• Per-group thresholds: scores of 0.65 for Group A and 0.58 for Group B achieve equal opportunity but break calibration and may violate non-discrimination laws
• Deployment speed: post-processing adjusts thresholds in under 24 hours for rapid response; an in-processing retrain takes 3 to 7 days in a weekly release cycle
• Pareto frontier: in-processing dominates preprocessing, closing the same fairness gap with 50% less accuracy cost, but requires 2x to 3x longer hyperparameter tuning
📌 Examples
Google uses stratified sampling with 10,000 minimum per cohort, capped 3x reweighting, and Lagrangian fairness penalty, achieving under 2 percentage point accuracy loss for 8 percentage point TPR gap reduction
Meta content moderation combines preprocessing to balance sensitive categories, adversarial debiasing to remove proxy features, and post-processing for emergency threshold adjustments within hours
Microsoft credit model uses in-processing constraints during the weekly retrain, with a post-processing kill switch that reverts to safe per-group thresholds if production metrics violate compliance for two consecutive days