Production Implementation: Metrics, Monitoring, and Serving Impact
TRAINING PIPELINE
Production hard negative mining fits into the training pipeline as a data preprocessing or augmentation step. The typical flow: train an initial model with random negatives, use it to mine hard negatives, retrain with the mined negatives, and iterate.
Initial training: Train baseline model with random negatives for 3-5 epochs. This gives a reasonable embedding space for mining.
Mining phase: Embed the corpus with the current model. For each anchor, retrieve its top-K nearest neighbors (K = 100-1000) and keep only the items known to be negatives for that anchor. Store these as hard negative pairs.
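A brute-force version of this mining step can be sketched as follows. The function name, arguments, and the use of cosine similarity are illustrative assumptions, not a specific library's API:

```python
import numpy as np

def mine_hard_negatives(anchor_embs, corpus_embs, known_negatives, k=100):
    """For each anchor, return the known negatives among its top-k neighbors.

    anchor_embs: (A, d) array; corpus_embs: (N, d) array.
    known_negatives: list of sets; known_negatives[i] holds the corpus ids
    that are confirmed negatives for anchor i.
    """
    # Normalize rows so the dot product equals cosine similarity.
    a = anchor_embs / np.linalg.norm(anchor_embs, axis=1, keepdims=True)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = a @ c.T                              # (A, N) similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]     # top-k neighbor ids per anchor
    pairs = []
    for i, neighbors in enumerate(topk):
        for j in neighbors:
            if j in known_negatives[i]:
                pairs.append((i, int(j)))       # (anchor id, hard negative id)
    return pairs
```

At production corpus sizes the exhaustive `a @ c.T` would be replaced by an ANN index; the filtering logic stays the same.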
Retraining: Mix mined hard negatives with random negatives (ratio 1:1 to 1:3). Pure hard negatives can destabilize training. Retrain for 2-3 epochs.
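The mixing step can be sketched as below; the `mix_negatives` helper and its 1:2 hard-to-random default are assumptions for illustration:

```python
import random

def mix_negatives(hard_negs, random_negs, hard_ratio=1/3, seed=0):
    """Build a negative pool where roughly `hard_ratio` of the samples are
    mined hard negatives (default 1:2 hard:random) and the rest are random.

    Pure hard negatives can destabilize training, so the pool is diluted
    and shuffled before batching.
    """
    rng = random.Random(seed)
    n_random = int(len(hard_negs) * (1 - hard_ratio) / hard_ratio)
    pool = hard_negs + rng.sample(random_negs, min(n_random, len(random_negs)))
    rng.shuffle(pool)
    return pool
```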
MONITORING METRICS
Hard negative hit rate: What fraction of mined negatives are actually used in training (i.e., not filtered out by confidence thresholds)? A very low hit rate (<10%) suggests the mining thresholds are too aggressive.
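This metric is easy to compute if both the mined pairs and the pairs that survive filtering into training are logged; the function name here is hypothetical:

```python
def hard_negative_hit_rate(mined, used):
    """Fraction of mined (anchor, item) pairs that survived confidence
    filtering into training. A rate below ~0.10 suggests the mining
    thresholds are too aggressive.
    """
    if not mined:
        return 0.0
    used = set(used)
    return sum(pair in used for pair in mined) / len(mined)
```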
Loss dynamics: Hard negatives should produce higher loss than random negatives initially, then converge. If loss stays high, negatives may be too hard (including false negatives).
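One possible heuristic check on per-epoch mean losses; the tolerance value is an arbitrary illustration and would be tuned per model:

```python
def loss_converging(hard_losses, random_losses, tol=0.5):
    """Check the expected loss dynamics across epochs.

    Hard negatives should start with higher loss than random negatives and
    close the gap over training. Returns False if the final gap is still
    large, which can signal too-hard negatives (often false negatives).
    """
    initial_gap = hard_losses[0] - random_losses[0]
    final_gap = hard_losses[-1] - random_losses[-1]
    return initial_gap > 0 and final_gap < tol
```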
Recall trend: Recall should improve or stay flat after mining. If recall drops, investigate false negative rate immediately.
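A brute-force recall@k evaluation for comparing checkpoints before and after a mining round might look like the following (illustrative; a production system would use its ANN index instead of exhaustive search):

```python
import numpy as np

def recall_at_k(query_embs, corpus_embs, relevant, k=10):
    """Mean recall@k over a batch of queries.

    relevant[i] is the set of relevant corpus ids for query i. Run this on
    a held-out set before and after retraining: a drop after mining is a
    red flag for false negatives in the mined pool.
    """
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    topk = np.argsort(-(q @ c.T), axis=1)[:, :k]
    hits = [len(relevant[i] & set(map(int, row))) / max(len(relevant[i]), 1)
            for i, row in enumerate(topk)]
    return float(np.mean(hits))
```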
SERVING IMPACT
Hard negative training changes the embedding space—items that looked similar now look different. This affects:
Index freshness: After retraining, old embeddings are stale. Re-embed corpus before serving. Plan for reindexing latency.
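Re-embedding can be batched over the corpus before the index swap; `embed_fn` stands in for the retrained model's encoder and is an assumed interface:

```python
import numpy as np

def reembed_corpus(items, embed_fn, batch_size=256):
    """Re-embed the whole corpus with the retrained model.

    Mixing old and new embeddings in one index is invalid because the two
    embedding spaces differ, so the full corpus is re-embedded and the
    serving index rebuilt before the new model goes live.
    `embed_fn` maps a list of items to a (len(batch), d) array.
    """
    batches = [embed_fn(items[i:i + batch_size])
               for i in range(0, len(items), batch_size)]
    return np.vstack(batches)
```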
User experience: If hard negatives were false negatives, previously good results may disappear. A/B test before full rollout. Monitor user engagement signals.