What is Bias in Machine Learning Systems?
Sources of Bias
Historical bias: the training data reflects past discrimination. If 80% of past hires were male, the model learns that maleness predicts success.
Representation bias: some groups are underrepresented in the training data. A facial recognition system trained on 90% light-skinned faces fails on darker skin tones.
Measurement bias: features act as proxies for protected attributes. Credit scores correlate with race because of historical lending discrimination.
Aggregation bias: a single model fit to diverse populations learns the majority's patterns and fails minority groups.
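Aggregation bias in particular is easy to demonstrate. The sketch below uses hypothetical synthetic data in which group B is underrepresented (10% of samples) and has a different feature-label relationship than group A; a single threshold classifier fit to overall accuracy serves group A well and group B poorly. All names and numbers here are illustrative assumptions, not from any real system.

```python
import random

random.seed(0)

def sample(group, n):
    """Hypothetical data: the feature-label relationship is inverted for group B."""
    rows = []
    for _ in range(n):
        y = random.random() < 0.5
        center = (0.8 if y else 0.2) if group == "A" else (0.2 if y else 0.8)
        rows.append((group, random.gauss(center, 0.1), y))
    return rows

# Group B is only 10% of the training data.
train = sample("A", 900) + sample("B", 100)

# One global threshold, chosen to maximize overall accuracy.
best_thr = max(
    (t / 100 for t in range(101)),
    key=lambda thr: sum((x > thr) == y for _, x, y in train),
)

def group_accuracy(rows, grp, thr):
    sub = [(x, y) for g, x, y in rows if g == grp]
    return sum((x > thr) == y for x, y in sub) / len(sub)

for grp in ("A", "B"):
    print(grp, round(group_accuracy(train, grp, best_thr), 2))
```

Overall accuracy looks respectable (around 90%), which is exactly why the failure on group B stays hidden unless metrics are broken out by group.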
Why Bias Matters Beyond Ethics
Biased models create business and legal risk, not only ethical harm. Loan models with racial bias have drawn regulatory action costing hundreds of millions of dollars, and biased hiring tools have led to multimillion-dollar settlements. Biased recommendations drive minority users away permanently. Bias is also a signal of model weakness: 95% accuracy on Group A but 70% on Group B is an engineering problem masquerading as an ethics problem.
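Surfacing a gap like the 95%/70% split above requires nothing more than breaking accuracy out by group. Here is a minimal sketch of such an audit; the `per_group_accuracy` helper and the simulated error rates are assumptions for illustration, not a standard library API.

```python
import random
from collections import defaultdict

random.seed(1)

def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy broken out by group; assumes three parallel lists."""
    hits, counts = defaultdict(int), defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        counts[g] += 1
        hits[g] += (yt == yp)
    return {g: hits[g] / counts[g] for g in counts}

# Simulated labels and predictions: the hypothetical model flips
# ~5% of group A's labels but ~30% of group B's.
y_true = [random.randint(0, 1) for _ in range(2000)]
groups = ["A"] * 1800 + ["B"] * 200
y_pred = [y if random.random() > (0.05 if g == "A" else 0.30) else 1 - y
          for y, g in zip(y_true, groups)]

acc = per_group_accuracy(y_true, y_pred, groups)
print({g: round(a, 2) for g, a in acc.items()})
```

Because group B is small, its poor performance barely moves the aggregate number; a per-group report is the only way the gap shows up.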
The Accuracy-Fairness Trade-off
Optimizing for raw accuracy often amplifies bias: if Group A contributes more training data, the model fits Group A better, raising overall accuracy while Group B's error rate grows. Imposing fairness constraints typically costs 2-5% overall accuracy, and that trade-off is not always acceptable: in medical diagnosis, a 2% accuracy loss can mean missed cancers.
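One simple way to see the trade-off in action is sample reweighting, in the spirit of reweighing approaches to fairness: weight the minority group up so each group contributes equally to the training objective. The sketch below, on hypothetical synthetic data where the two groups' optimal decision thresholds differ, shows the group gap shrinking while overall accuracy dips slightly. The data, weights, and threshold search are all illustrative assumptions.

```python
import random

random.seed(2)

def sample(group, n):
    """Hypothetical data: the classes separate around different
    feature values for each group (~0.45 for A, ~0.60 for B)."""
    rows = []
    for _ in range(n):
        y = random.random() < 0.5
        lo, hi = (0.3, 0.6) if group == "A" else (0.45, 0.75)
        rows.append((group, random.gauss(hi if y else lo, 0.08), y))
    return rows

# Group B is 10% of the data.
data = sample("A", 900) + sample("B", 100)

def fit_threshold(rows, weights):
    """Pick the threshold maximizing weighted accuracy."""
    return max(
        (t / 200 for t in range(201)),
        key=lambda thr: sum(w * ((x > thr) == y)
                            for (g, x, y), w in zip(rows, weights)),
    )

def group_acc(rows, thr, grp):
    sub = [(x, y) for g, x, y in rows if g == grp]
    return sum((x > thr) == y for x, y in sub) / len(sub)

uniform = [1.0] * len(data)
# Inverse group frequency: each group contributes equally to the fit.
balanced = [1 / 900 if g == "A" else 1 / 100 for g, _, _ in data]

for name, w in (("uniform", uniform), ("balanced", balanced)):
    thr = fit_threshold(data, w)
    print(name,
          "A:", round(group_acc(data, thr, "A"), 2),
          "B:", round(group_acc(data, thr, "B"), 2))
```

The balanced fit sacrifices a few points of accuracy on the majority group to close most of the gap, which is the trade-off in miniature; whether that exchange is acceptable depends on the domain.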