Privacy & Fairness in ML • Fairness Metrics (Demographic Parity, Equalized Odds) • Easy • ⏱️ ~2 min
What is Demographic Parity?
Demographic parity is a group fairness constraint that requires a model to predict positive outcomes at equal rates across different sensitive groups, regardless of ground truth labels. The metric compares only model outputs: if 60% of all loan applications are approved, then each demographic group (defined by gender, race, age, etc.) should also see approximately 60% approval.
The metric is measured as either a difference or a ratio. Perfect parity difference equals 0, meaning P(Ŷ = 1 | A = a) − P(Ŷ = 1 | A = b) = 0 for any groups a and b. Perfect parity ratio equals 1.0, calculated as the minimum group selection rate divided by the maximum. The 80 percent rule, used by regulators in hiring and lending, sets a threshold of 0.8 for this ratio.
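For concreteness, here is a minimal sketch of how the difference and ratio could be computed from raw predictions. The function name and toy data are illustrative, not from the original; only a NumPy dependency is assumed.

```python
import numpy as np

def demographic_parity(y_pred, groups):
    """Selection rate per group, parity difference, and parity ratio.

    Only predictions and group membership are needed; ground truth
    labels never enter the calculation.
    """
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    rates = {g: y_pred[groups == g].mean() for g in np.unique(groups)}
    values = np.array(list(rates.values()))
    diff = values.max() - values.min()   # 0 = perfect parity
    ratio = values.min() / values.max()  # 1.0 = perfect parity
    return rates, diff, ratio

# Toy data: group "a" is selected 60% of the time, group "b" only 40%.
rates, diff, ratio = demographic_parity(
    y_pred=[1, 1, 0, 1, 0, 1, 0, 0, 0, 1],
    groups=["a"] * 5 + ["b"] * 5,
)
print(rates)        # selection rates: 0.6 for "a", 0.4 for "b"
print(diff, ratio)  # 0.2 and ~0.67 -> ratio fails the 80 percent rule
```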
Demographic parity targets allocation fairness when distributing limited resources like interview slots, content impressions, or loan opportunities. Because it ignores labels entirely, enforcing it can force higher false positive rates in groups with naturally lower qualification rates. In production credit systems processing 5,000 applications per hour, teams can use tools like Amazon SageMaker Clarify to monitor selection rate ratios in real time, triggering alerts if the ratio drops below 0.8 for 15 minutes.
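Real-time monitoring of this kind is straightforward because the ratio depends only on predictions and group membership. The sketch below is a hypothetical sliding-window monitor, not a SageMaker Clarify API; the class name, window size, and threshold simply mirror the figures mentioned in this section.

```python
from collections import deque

class SelectionRateMonitor:
    """Track the selection rate ratio over a sliding window of decisions."""

    def __init__(self, window_size=50_000, alert_threshold=0.8):
        self.window = deque(maxlen=window_size)  # keeps only recent decisions
        self.alert_threshold = alert_threshold

    def record(self, group, prediction):
        """Append one (sensitive group, 0/1 prediction) decision."""
        self.window.append((group, prediction))

    def ratio(self):
        """Min / max selection rate across groups in the current window."""
        totals, positives = {}, {}
        for group, pred in self.window:
            totals[group] = totals.get(group, 0) + 1
            positives[group] = positives.get(group, 0) + pred
        rates = [positives[g] / totals[g] for g in totals]
        if len(rates) < 2:
            return None  # need at least two groups to compare
        return min(rates) / max(rates)

    def should_alert(self):
        r = self.ratio()
        return r is not None and r < self.alert_threshold
```

In use, each scored application would be passed to `record(...)` as it is decided, and `should_alert()` would be polled on whatever cadence the alerting system supports (e.g., once per minute).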
The tradeoff is accuracy versus equal representation. Enforcing strict demographic parity can reduce model accuracy by 2 to 10 percentage points when score distributions differ significantly between groups. The metric works well when you need equal access or exposure but may conflict with merit-based decisions where base rates legitimately differ.
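One common way parity is enforced in practice is post-processing with group-specific score thresholds. The sketch below illustrates that idea under the assumption that calibrated scores are available; it is not drawn from the original text, and it makes the tradeoff visible: groups with lower score distributions receive lower thresholds, which is where the extra false positives come from.

```python
import numpy as np

def parity_thresholds(scores, groups, target_rate):
    """Pick a per-group threshold so each group's selection rate is ~target_rate."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    thresholds = {}
    for g in np.unique(groups):
        group_scores = scores[groups == g]
        # Roughly target_rate of the group's scores fall at or above this quantile.
        thresholds[g] = np.quantile(group_scores, 1.0 - target_rate)
    return thresholds

def predict_with_parity(scores, groups, thresholds):
    """Apply the group-specific thresholds to produce 0/1 decisions."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    return np.array([s >= thresholds[g] for s, g in zip(scores, groups)], dtype=int)
```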
💡 Key Takeaways
•Demographic parity compares selection rates across groups without considering ground truth labels, targeting equal allocation of positive predictions
•Measured as either difference (target 0) or ratio (target 1.0), with the 80 percent rule threshold of 0.8 commonly used in regulated industries like hiring
•Real-time monitoring is feasible because it requires only predictions, not delayed labels. Amazon monitors sliding windows of 50,000 decisions with 1-minute updates
•Enforcing parity can reduce accuracy by 2 to 10 percentage points when groups have different score distributions or base rates
•Works best for allocation fairness scenarios like distributing interview slots, content exposure, or loan opportunities where equal access matters
•Fails when merit-based selection is required and base rates legitimately differ, forcing artificial inflation of false positives in lower-scoring groups
📌 Examples
Google hiring: If 20% of all applicants advance to phone screen, each demographic group should see approximately 20% advancement rate
Meta content distribution: A recommender showing ads to 1 million users should target similar impression rates across age groups, monitored in 5-minute windows
Microsoft credit scoring: Processing 5,000 loan applications per hour, the system alerts if the gender-based selection rate ratio drops below 0.8 for 15 minutes