
Post-Processing Threshold Optimization for Fairness

Post-processing threshold optimization adjusts decision thresholds per group after model training to satisfy fairness constraints with minimal utility loss. The approach is fast to deploy, model-agnostic, and poses little risk to the training pipeline.

The Hardt et al. algorithm for equalized odds learns group-specific thresholds, plus optional randomization probabilities, by finding the operating points on each group's Receiver Operating Characteristic (ROC) curve that equalize True Positive Rate (TPR) and False Positive Rate (FPR) while maximizing overall accuracy or business utility. Concretely, it sweeps thresholds for each group, computes TPR and FPR at each threshold, and solves a linear program over the resulting operating points to find the threshold combination that satisfies the fairness constraints. For equalized odds, the optimal solution often lies at the intersection of the group ROC curves. In practice the randomized decision region is narrow, so you can often approximate the solution with deterministic per-group thresholds and avoid the complexity of probabilistic decisions (see the first sketch below). Computation takes under 5 minutes for 100 cohorts and 1 million samples on a single machine.

The tradeoff is that post-processing can break ranking consistency and calibration. If you use model scores for ranking (credit risk, fraud risk), applying different thresholds per group means individuals with identical scores receive different decisions based on group membership; this can be legally prohibited and is difficult to explain. Calibration breaks too: a score of 0.6 no longer means a 60% probability of default across all groups, so downstream systems that rely on calibrated probabilities inherit the inconsistency.

Alternatives include in-processing constraints (adding fairness terms to the training loss) and pre-processing reweighting (adjusting sample weights by group). In-processing can recover more accuracy because the model learns fairness-aware representations, but it requires modifying the training pipeline and extensive tuning. Pre-processing is model-agnostic like post-processing, but it can amplify noise in minority groups. Production teams at Google, Microsoft, and Amazon typically start with post-processing for speed and safety, then invest in in-processing if accuracy loss exceeds 5 percentage points of Area Under the Curve (AUC) or business metrics degrade.
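To make the deterministic approximation concrete, here is a minimal sketch assuming NumPy and scikit-learn. The helper name, the synthetic data, and the target operating point are illustrative, not part of any referenced system: for each group it picks the threshold whose ROC operating point lies closest to a shared (FPR, TPR) target and reports the residual gaps you would audit before deployment.

```python
import numpy as np
from sklearn.metrics import roc_curve

def per_group_thresholds(y_true, y_score, group, fpr_target, tpr_target):
    """Pick, per group, the score threshold whose ROC operating point lies
    closest to a shared (FPR, TPR) target. This is the deterministic
    approximation described above; the exact Hardt et al. solution may
    instead randomize between two thresholds within a group."""
    thresholds, residual_gaps = {}, {}
    for g in np.unique(group):
        mask = group == g
        fpr, tpr, thr = roc_curve(y_true[mask], y_score[mask])
        # Distance from every attainable operating point to the target
        dist = np.hypot(fpr - fpr_target, tpr - tpr_target)
        i = int(np.argmin(dist))
        thresholds[g] = thr[i]
        residual_gaps[g] = (fpr[i] - fpr_target, tpr[i] - tpr_target)
    return thresholds, residual_gaps

# Synthetic demo: scores are shifted slightly between two groups.
rng = np.random.default_rng(0)
n = 100_000
group = rng.integers(0, 2, size=n)
y = rng.binomial(1, 0.3, size=n)
score = np.clip(0.25 * y + rng.normal(0.35 + 0.05 * group, 0.15, size=n), 0, 1)

thr, gaps = per_group_thresholds(y, score, group, fpr_target=0.10, tpr_target=0.60)
print(thr)   # per-group thresholds keyed by group id
print(gaps)  # residual (FPR, TPR) gaps to audit before deploying
```

If the residual gaps are larger than your fairness tolerance, the target point is not reachable by every group's ROC curve and you need the full linear program with randomization.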
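In production you would more likely reach for a maintained implementation of the full linear program, including the randomized region, rather than hand-rolling it. Below is a sketch using Fairlearn's ThresholdOptimizer, assuming an already-fitted classifier with predict_proba; the variable names are placeholders, and the argument names should be checked against your installed fairlearn version.

```python
from fairlearn.postprocessing import ThresholdOptimizer

# Wraps an already-trained model; fits group-specific (possibly randomized)
# thresholds on held-out data, then applies them at prediction time.
postprocessor = ThresholdOptimizer(
    estimator=trained_model,        # placeholder: any fitted classifier
    constraints="equalized_odds",   # or "demographic_parity"
    objective="accuracy_score",
    prefit=True,
    predict_method="predict_proba",
)
postprocessor.fit(X_val, y_val, sensitive_features=group_val)
y_pred = postprocessor.predict(X_test, sensitive_features=group_test)
```

Because the fitted solution can randomize decisions near the ROC intersection, pass a fixed random state at prediction time (fairlearn's predict accepts one) if downstream consumers need reproducible decisions.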
💡 Key Takeaways
Post-processing adjusts decision thresholds per group after training to meet fairness constraints. The Hardt algorithm finds thresholds that equalize TPR and FPR with minimal accuracy loss
Model-agnostic and fast: works with any trained model, computes thresholds in under 5 minutes for 100 cohorts and 1 million samples, and is low-risk to deploy compared to retraining
Breaks calibration and ranking consistency: a score of 0.6 no longer means 60% risk across groups, and individuals with identical scores get different decisions based on group
Alternative approaches: in-processing (a fairness loss term during training) recovers more accuracy but requires pipeline changes; pre-processing (reweighting) is fast but can amplify noise
Production teams start with post-processing for speed and invest in in-processing if accuracy loss exceeds 5 percentage points of AUC or business metrics degrade significantly
Legal and regulatory constraints may prohibit per-group thresholds even when they are technically feasible. Always obtain legal approval before deploying group-specific decision rules
📌 Examples
Microsoft credit model: post-processing finds per-gender thresholds that achieve a demographic parity ratio of 0.90 with a 2-percentage-point AUC drop, deployed in 1 week
Amazon hiring classifier: the Hardt algorithm achieves equal opportunity with a TPR gap under 0.03 across race groups but breaks score calibration; downstream risk models had to be retrained
Google fraud detection: an in-processing approach with a fairness-weighted loss recovers 3 percentage points of AUC versus post-processing but requires 2 months to retrain and validate