Failure Modes: False Negatives and Label Noise
FALSE NEGATIVES
The biggest risk in hard negative mining is treating actual positives as negatives. If item B is truly relevant to query A but the pair is unlabeled, mining will surface B as a hard negative (it scores high precisely because it is relevant), and the model learns to push B away. This directly harms recall.
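A toy sketch of the mechanism, using a standard triplet loss (the embeddings and margin here are illustrative, not from any real model): when an unlabeled-but-relevant item B sits closer to the anchor than the labeled positive P, the loss is large, and minimizing it pushes B away.

```python
import numpy as np

# Toy embeddings: query anchor A, its labeled positive P, and item B,
# which is truly relevant to A but unlabeled (a false negative when mined).
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
b = np.array([0.95, 0.05])  # very close to the anchor, because it IS relevant

def triplet_loss(anchor, pos, neg, margin=0.2):
    """Standard triplet loss; minimizing it pushes `neg` away from `anchor`."""
    d_pos = np.linalg.norm(anchor - pos)
    d_neg = np.linalg.norm(anchor - neg)
    return max(0.0, d_pos - d_neg + margin)

# B is closer to A than P is, so the loss is well above zero: training
# actively pushes the relevant item B away from the query, hurting recall.
loss = triplet_loss(a, p, b)
print(round(loss, 3))  # 0.271
```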
Symptoms: recall drops after adding hard negative mining. Model confidently ranks some relevant items at the bottom. Users complain about obvious matches not appearing.
Causes: incomplete labeling (most relevant pairs are not explicitly labeled), label noise (human annotators disagree or make errors), distribution shift (new items have no labels).
Detection: sample hard negatives and manually review. If >5% are actually relevant, your false negative rate is too high. Also monitor recall on a clean held-out test set—if it drops after mining, investigate.
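The audit step can be scripted. A minimal sketch (function names and the 200-pair sample size are illustrative): draw a random sample of mined pairs, have reviewers mark which are actually relevant, and compare the estimated false negative rate to the 5% threshold.

```python
import random

def sample_for_review(mined_negatives, k=200, seed=0):
    """Draw a reproducible random audit sample of mined (query, item) pairs."""
    rng = random.Random(seed)
    return rng.sample(mined_negatives, min(k, len(mined_negatives)))

def false_negative_rate(reviewed):
    """reviewed: list of (pair, is_actually_relevant) from manual review."""
    flagged = sum(1 for _, relevant in reviewed if relevant)
    return flagged / len(reviewed)

# Hypothetical outcome: reviewers mark 14 of 200 sampled pairs as relevant.
mined = [("q%d" % i, "doc%d" % i) for i in range(1000)]
audit = sample_for_review(mined, k=200)
reviewed = [(pair, i < 14) for i, pair in enumerate(audit)]
print(false_negative_rate(reviewed))  # 0.07 -> above 5%, investigate
```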
MITIGATION STRATEGIES
Confidence filtering: Only use negatives the model is confident about (sufficiently distant from the anchor). Skip the very hardest negatives, which are the most likely to be false negatives.
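A minimal sketch of this filter (the distance floor `d_min` and `top_k` are illustrative hyperparameters, not recommended values): rank candidates by distance to the anchor, then take the hardest ones that still clear the floor.

```python
import numpy as np

def mine_with_floor(anchor, candidates, d_min=2.0, top_k=20):
    """Select hard negatives (closest to the anchor) but drop anything
    closer than d_min: the very hardest candidates are the most likely
    to be unlabeled positives, i.e. false negatives."""
    dists = np.linalg.norm(candidates - anchor, axis=1)
    order = np.argsort(dists)                      # hardest (closest) first
    kept = [int(i) for i in order if dists[i] >= d_min]
    return kept[:top_k]

# Toy pool: 500 random 8-d candidate embeddings around a zero anchor.
anchor = np.zeros(8)
candidates = np.random.default_rng(1).normal(size=(500, 8))
idx = mine_with_floor(anchor, candidates, d_min=2.0, top_k=20)
print(len(idx))  # 20 moderately hard negatives, none closer than 2.0
```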
Cross-validation: Train multiple models and only use negatives that all of them agree are negative. Ensemble agreement reduces single-model errors.
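A sketch of the agreement filter, assuming a hypothetical interface where each model is a distance function `model(anchor, candidate) -> float` (the threshold and toy distance tables below are illustrative):

```python
def agreed_negatives(anchor, candidate_ids, models, d_min=0.3):
    """Keep a mined negative only if EVERY model in the ensemble places it
    at least d_min from the anchor; any disagreement suggests a possible
    false negative, so the candidate is dropped."""
    return [c for c in candidate_ids
            if all(model(anchor, c) >= d_min for model in models)]

# Toy ensemble: three models with slightly different distance estimates.
tables = [
    {"b1": 0.9, "b2": 0.1, "b3": 0.8},
    {"b1": 0.8, "b2": 0.7, "b3": 0.9},  # disagrees with the others on b2
    {"b1": 0.7, "b2": 0.2, "b3": 0.6},
]
models = [lambda a, c, t=t: t[c] for t in tables]

# b2 is dropped: two models place it very close to the anchor.
print(agreed_negatives("query", ["b1", "b2", "b3"], models))  # ['b1', 'b3']
```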
Soft labels: Instead of a binary negative label, use a continuous label based on distance. Very close items get a weak negative signal; distant items get a strong one.
LABEL NOISE AMPLIFICATION
Hard negative mining amplifies label noise. If 5% of negatives are mislabeled, random sampling sees 5% noise. But mining specifically selects the hardest examples, which are disproportionately mislabeled: they look hard precisely because they are actually positives. The noise rate in the mined set can reach 20-30%.
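The amplification effect can be demonstrated with a small simulation (the pool size and distance ranges are illustrative assumptions): false negatives sit close to the anchor because they are genuinely relevant, so distance-based mining over-selects them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# 5% of the candidate pool are unlabeled positives (false negatives).
# Being genuinely relevant, they sit close to the anchor; true negatives
# are spread farther out.
is_false_neg = np.zeros(n, dtype=bool)
is_false_neg[:50] = True
dist = np.where(is_false_neg,
                rng.uniform(0.0, 0.3, n),   # unlabeled positives: close
                rng.uniform(0.1, 1.0, n))   # true negatives: mostly farther

# Random negative sampling reflects the base noise rate (around 5%)...
random_idx = rng.choice(n, 100, replace=False)
print("random sample noise:", is_false_neg[random_idx].mean())

# ...but mining the 100 hardest (closest) candidates concentrates the
# mislabels, inflating the noise rate several-fold.
mined_idx = np.argsort(dist)[:100]
print("mined sample noise:", is_false_neg[mined_idx].mean())
```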