Fairness Metrics: Group, Individual, and Calibration Parity
Group Fairness Metrics
Group fairness asks: do different demographic groups receive similar treatment in aggregate?

Demographic parity: positive prediction rates are equal across groups. If 40% of Group A is approved, 40% of Group B should be too.

Equalized odds: both the true positive rate and the false positive rate are equal across groups. If 90% of qualified Group A members are approved, 90% of qualified Group B members should be too.

Equal opportunity: a relaxed version of equalized odds requiring only equal true positive rates.

These metrics treat groups as monolithic, ignoring variation among individuals within each group.
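All three group metrics reduce to comparing per-group rates. A minimal sketch (the function name and toy data are illustrative, not from any particular fairness library):

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Per-group positive rate (demographic parity), true positive rate
    (equal opportunity), and false positive rate (with TPR: equalized odds)."""
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        t, p = y_true[m], y_pred[m]
        rates[g] = {
            "positive_rate": p.mean(),
            "tpr": p[t == 1].mean() if (t == 1).any() else float("nan"),
            "fpr": p[t == 0].mean() if (t == 0).any() else float("nan"),
        }
    return rates

# Toy data: two groups, four applicants each
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 1, 0, 1, 1])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
rates = group_rates(y_true, y_pred, groups)
```

Demographic parity holds when `positive_rate` matches across groups; equalized odds compares `tpr` and `fpr` jointly; equal opportunity compares `tpr` alone.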
Individual Fairness Metrics
Individual fairness asks: are similar individuals treated similarly? Two applicants with identical qualifications should receive identical predictions regardless of group membership.

The challenge is defining similarity. Which features determine it? If zip code is included, and zip codes correlate with race, the bias is embedded in the similarity definition itself.

Counterfactual fairness asks: would the prediction change if only the protected attribute changed? For the same person, changing their gender should not change the prediction. Implementing this requires a causal model to identify which features depend on the protected attribute.
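The simplest probe of this idea is to flip the protected attribute and compare predictions. The models and features below are hypothetical toys, and note the limitation: a flip-only probe catches direct use of the attribute but ignores features that causally depend on it, which is exactly why the full criterion needs a causal model.

```python
def fair_predict(a):
    """Toy score that ignores the protected attribute."""
    return 1 if 0.5 * a["income"] + 0.5 * a["credit"] >= 0.6 else 0

def biased_predict(a):
    """Toy score that uses the protected attribute directly."""
    bonus = 0.1 if a["gender"] == "M" else 0.0
    return 1 if 0.5 * a["income"] + 0.5 * a["credit"] + bonus >= 0.6 else 0

def flip_test(model, applicant, attr, alt_value):
    """True if changing only `attr` leaves the prediction unchanged."""
    flipped = dict(applicant, **{attr: alt_value})
    return model(flipped) == model(applicant)

applicant = {"income": 0.6, "credit": 0.5, "gender": "F"}
fair_ok = flip_test(fair_predict, applicant, "gender", "M")      # passes
biased_ok = flip_test(biased_predict, applicant, "gender", "M")  # fails
```

For the borderline applicant above, the biased model flips its decision when gender flips, while the fair model does not.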
Calibration Parity
Calibration asks: when the model predicts an 80% probability, does the event occur 80% of the time for every group? A well-calibrated model that says "80% chance of loan default" should see an 80% default rate for both men and women.

Miscalibration is common: models often overpredict risk for minority groups, e.g. predicting 80% default when the actual rate is 50%. Recalibration fits a separate probability mapping per group, adjusting predictions post hoc. Check calibration with reliability diagrams, which plot predicted versus observed frequencies per group.
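The per-group reliability check can be sketched by binning predictions and comparing mean predicted probability against observed frequency, which is the tabular form of a reliability diagram. Function names and the toy data (Group B overpredicted at 0.8 vs. a true rate of 0.5, mirroring the example above) are illustrative assumptions:

```python
import numpy as np

def reliability_bins(y_true, y_prob, n_bins=5):
    """(mean predicted, observed frequency) per populated probability bin."""
    bins = np.clip((y_prob * n_bins).astype(int), 0, n_bins - 1)
    return [(y_prob[bins == b].mean(), y_true[bins == b].mean())
            for b in range(n_bins) if (bins == b).any()]

def per_group_reliability(y_true, y_prob, groups, n_bins=5):
    """Reliability-diagram points computed separately for each group."""
    return {g: reliability_bins(y_true[groups == g], y_prob[groups == g], n_bins)
            for g in np.unique(groups)}

# Model predicts 0.8 for everyone; Group A's true rate is 0.8, Group B's is 0.5
y_prob = np.full(200, 0.8)
y_true = np.concatenate([np.repeat([1, 0], [80, 20]),   # Group A: calibrated
                         np.repeat([1, 0], [50, 50])])  # Group B: overpredicted
groups = np.array(["A"] * 100 + ["B"] * 100)
rel = per_group_reliability(y_true, y_prob, groups)
```

A perfectly calibrated group sits on the diagonal (predicted equals observed, as for Group A); Group B's gap between 0.8 predicted and 0.5 observed is the miscalibration that per-group recalibration would correct.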