Hard Negative Mining (Triplet Loss, Contrastive Learning)

Triplet Loss and Contrastive Loss Formulations

TRIPLET LOSS

Triplet loss trains embeddings using three items: an anchor, a positive (similar to anchor), and a negative (different from anchor). The loss pushes anchor closer to positive and farther from negative.

Formula: loss = max(0, d(anchor, positive) - d(anchor, negative) + margin). The margin (typically 0.2-1.0) specifies how much farther from the anchor the negative must be than the positive. If the constraint is already satisfied, the loss is zero.
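The formula above can be sketched directly in NumPy. This is a minimal single-triplet version with Euclidean distance; the margin of 0.5 and the example vectors are illustrative, not from the source:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge on the gap between anchor-positive and anchor-negative distance."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])                         # close to the anchor
print(triplet_loss(a, p, np.array([-1.0, 0.0]))) # easy negative -> 0.0
print(triplet_loss(a, p, np.array([0.8, -0.1]))) # hard negative -> positive loss
```

Note that the easy negative already satisfies the margin constraint and contributes exactly zero loss (and zero gradient), which is the dynamic the next paragraph describes.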

Training dynamic: as the model improves, easy triplets contribute zero loss. Only hard triplets (where negative is close to anchor) provide gradient signal. This motivates hard negative mining—without it, most triplets become uninformative.
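One simple way to operationalize this is to keep only candidates that violate the margin constraint, since those are the only triplets with nonzero loss. A sketch, with a hypothetical helper name and illustrative margin:

```python
import numpy as np

def mine_informative_negatives(anchor, candidates, d_ap, margin=0.5):
    """Return indices of candidates with d_an < d_ap + margin --
    the only negatives that still produce gradient signal."""
    d_an = np.linalg.norm(candidates - anchor, axis=1)
    return np.where(d_an < d_ap + margin)[0]

anchor = np.array([0.0, 0.0])
candidates = np.array([[0.1, 0.0],   # hard: closer than the positive
                       [1.0, 0.0],   # easy: outside the margin band
                       [0.5, 0.0]])  # semi-hard: inside the margin band
print(mine_informative_negatives(anchor, candidates, d_ap=0.2))  # → [0 2]
```

In practice this filter runs over distances computed within a batch (or against a negative pool), and the easy negatives it discards are exactly the triplets that would waste compute.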

CONTRASTIVE LOSS (INFONCE)

Contrastive loss (InfoNCE) compares one positive against many negatives simultaneously. For each anchor, compute similarity to the positive and to all negatives in the batch. The loss is softmax cross-entropy: the positive should have the highest similarity.

Formula: loss = -log(exp(sim(a,p)/τ) / Σₖ exp(sim(a,k)/τ)), where the sum in the denominator runs over the positive and all negatives, and τ is the temperature. Lower temperature (0.05-0.1) makes the distribution sharper, pushing harder on near-negatives.
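A minimal single-anchor sketch of this loss, assuming cosine similarity and a default τ of 0.07 (both common choices, not specified by the source):

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.07):
    """InfoNCE for one anchor: softmax cross-entropy over cosine similarities,
    with the positive in slot 0."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    logits = np.array([cos(anchor, positive)]
                      + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                      # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

a = np.array([1.0, 0.0])
p = np.array([0.95, 0.05])
negs = [np.array([0.0, 1.0]), np.array([-1.0, 0.2])]
print(info_nce(a, p, negs))  # near zero: the positive dominates the softmax
```

In real training the negatives are simply the other items in the batch, which is what makes the next point about batch size matter.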

Advantage over triplet loss: uses all negatives in batch, not just one. More negative comparisons per forward pass. Batch size of 512 means 511 negatives per anchor.

CHOOSING BETWEEN THEM

Triplet loss: Simple, works with any batch size. Better when you have curated hard negatives. Slower convergence (only 1 negative per triplet).

Contrastive loss: Requires large batches (512+) for sufficient negatives. Better when you can afford large batch training. Faster convergence via multi-negative comparison.

💡 Key Insight: Temperature τ in contrastive loss controls hardness. Lower temperature focuses learning on hardest negatives. Start at 0.07-0.1 and tune based on convergence.
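The temperature effect is easy to see numerically. With illustrative similarities for a positive, a hard negative, and an easy negative, lowering τ concentrates nearly all of the non-positive softmax mass (and hence gradient) on the hard negative:

```python
import numpy as np

# Anchor similarities to: positive, hard negative, easy negative (illustrative).
sims = np.array([0.9, 0.8, 0.1])
for tau in (1.0, 0.1, 0.05):
    probs = np.exp(sims / tau - sims.max() / tau)  # shift for stability
    probs /= probs.sum()
    print(f"tau={tau}: {probs.round(3)}")
```

At τ=1.0 the easy negative still receives a sizable share of probability; by τ=0.05 its share is effectively zero while the hard negative keeps all remaining mass, which is the "sharper focus on hard negatives" described above.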
💡 Key Takeaways
Triplet loss: anchor-positive-negative with margin; uses 1 negative per sample
Contrastive loss: 1 positive vs many negatives via softmax; uses batch_size-1 negatives
Temperature in contrastive loss controls hardness—lower = sharper focus on hard negatives
📌 Interview Tips
1. Interview Tip: Compare the two losses—triplet uses curated negatives, contrastive leverages batch for multi-negative learning.
2. Interview Tip: Explain temperature tuning—lower temperature focuses on hardest negatives but can destabilize training.