Triplet Loss and Contrastive Loss Formulations
TRIPLET LOSS
Triplet loss trains embeddings on triples of items: an anchor, a positive (an example similar to the anchor), and a negative (an example dissimilar to the anchor). The loss pulls the anchor closer to the positive and pushes it farther from the negative.
Formula: loss = max(0, d(anchor, positive) - d(anchor, negative) + margin). The margin (typically 0.2-1.0) specifies how much farther the negative should be than the positive. If the constraint is already satisfied, loss is zero.
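A minimal sketch of this formula (NumPy, with squared Euclidean distance — one common choice of d, not the only one; the vectors and margin are illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss: max(0, d(a,p) - d(a,n) + margin)."""
    d_ap = np.sum((anchor - positive) ** 2)   # squared Euclidean distance
    d_an = np.sum((anchor - negative) ** 2)
    return max(0.0, d_ap - d_an + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])        # close to the anchor
n_far = np.array([2.0, 0.0])    # far negative: constraint already satisfied
n_near = np.array([0.2, 0.0])   # near negative: still violates the margin

easy = triplet_loss(a, p, n_far)    # 0.0: constraint met, no gradient
hard = triplet_loss(a, p, n_near)   # 0.47: violation, contributes gradient
```

The two calls illustrate the zero-loss regime: once d(a,n) exceeds d(a,p) by more than the margin, the triplet stops contributing.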
Training dynamic: as the model improves, easy triplets contribute zero loss. Only hard triplets (where negative is close to anchor) provide gradient signal. This motivates hard negative mining—without it, most triplets become uninformative.
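The batch-hard variant of mining described above can be sketched as follows (assuming class labels identify positives; the function name and toy data are hypothetical):

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.5):
    """For each anchor, pick the hardest positive (farthest same-label item)
    and hardest negative (closest other-label item), then apply the hinge."""
    n = len(labels)
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sum(diff ** 2, axis=-1)            # pairwise squared distances
    same = labels[:, None] == labels[None, :]
    losses = []
    for i in range(n):
        pos_mask = same[i] & (np.arange(n) != i)  # same label, excluding self
        if not pos_mask.any() or same[i].all():
            continue                              # need >= 1 positive and negative
        hardest_pos = dists[i][pos_mask].max()
        hardest_neg = dists[i][~same[i]].min()
        losses.append(max(0.0, hardest_pos - hardest_neg + margin))
    return float(np.mean(losses))
```

On well-separated clusters every anchor's hardest negative is still far away, so the loss is zero; once clusters overlap, the mined hard triplets produce nonzero gradient.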
CONTRASTIVE LOSS (INFONCE)
InfoNCE, the contrastive loss used by most modern contrastive methods, compares one positive against many negatives simultaneously. For each anchor, compute its similarity to the positive and to every negative in the batch; the loss is softmax cross-entropy with the positive as the correct class, so the positive should receive the highest similarity.
Formula: loss = -log(exp(sim(a,p)/τ) / (exp(sim(a,p)/τ) + Σ_n exp(sim(a,n)/τ))), where τ is the temperature and the denominator sums over the positive and all negatives. Lower temperature (0.05-0.1) sharpens the softmax, concentrating gradient on the hardest near-negatives.
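A small sketch of InfoNCE (NumPy, with cosine similarity — a common but not required choice; the example vectors and temperatures are illustrative). Note the positive term appears in the denominator as well:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.07):
    """InfoNCE: softmax cross-entropy where the positive is the true class."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    # Logits: positive first, then all negatives, scaled by temperature.
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                      # -log P(positive)

a = np.array([1.0, 0.0])
p = np.array([1.0, 0.1])                          # high similarity to anchor
negs = [np.array([-1.0, 0.0]), np.array([0.0, 1.0])]
loss_sharp = info_nce(a, p, negs, tau=0.07)       # near zero: positive dominates
loss_soft = info_nce(a, p, negs, tau=1.0)         # larger: softer distribution
```

Running the same similarities through both temperatures shows the sharpening effect: at τ=0.07 the positive's logit dominates and the loss is nearly zero, while τ=1.0 spreads probability over the negatives and yields a visibly larger loss.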
Advantage over triplet loss: each anchor is compared against every negative in the batch rather than a single one, so each forward pass yields far more comparisons. With in-batch negatives, a batch size of 512 gives 511 negatives per anchor.
CHOOSING BETWEEN THEM
Triplet loss: Simple, works with any batch size. Better when you have curated hard negatives. Slower convergence (only 1 negative per triplet).
Contrastive loss: Requires large batches (512+) for sufficient negatives. Better when you can afford large batch training. Faster convergence via multi-negative comparison.