Online vs Offline Hard Negative Mining Architecture
OFFLINE HARD NEGATIVE MINING
Offline mining selects hard negatives before training begins. Process: embed all items with the current model, retrieve the nearest neighbors of each anchor, and mark the neighbors that are known negatives as hard negatives. Store these anchor-negative pairs for training.
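The mining pass described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function name, the `positives` label format, and the brute-force cosine search are all assumptions, and it treats every item not labeled positive as a negative. With explicit negative labels, the filter would test membership in that set instead.

```python
import numpy as np

def mine_offline_hard_negatives(query_emb, corpus_emb, positives, k=5):
    """Return, for each query, up to k nearest corpus items that are not known positives.

    query_emb:  (Q, D) array of query embeddings from the current model.
    corpus_emb: (N, D) array of corpus embeddings from the same model.
    positives:  list of sets; positives[i] holds the corpus indices labeled
                positive for query i (hypothetical label format).
    """
    # L2-normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)

    # Brute-force (Q, N) similarity matrix; a large corpus would typically
    # use an approximate nearest-neighbor index here instead.
    sims = q @ c.T

    hard_negatives = []
    for i in range(sims.shape[0]):
        ranked = np.argsort(-sims[i])  # most similar first
        # Assumption: anything not labeled positive counts as a negative.
        negs = [int(j) for j in ranked if int(j) not in positives[i]][:k]
        hard_negatives.append(negs)
    return hard_negatives
```

The returned index lists can be stored alongside the anchors and read back by the training data loader.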
Advantage: can mine exhaustively across the entire corpus. Find the globally hardest negatives, not just those in a batch. Disadvantage: hard negatives become stale as model improves. What was hard at epoch 1 may be trivial at epoch 10.
Refresh strategy: re-mine hard negatives every N epochs (typically 1-5). Re-embed corpus with updated model, regenerate hard negative pairs. Adds computational overhead but keeps negatives challenging throughout training.
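One way to wire the refresh schedule into a training loop is sketched below. Both callables are hypothetical placeholders: `mine_fn` stands in for "re-embed the corpus and regenerate hard negatives", and `train_epoch_fn` for one epoch of training on the current set.

```python
def train_with_refresh(num_epochs, refresh_every, mine_fn, train_epoch_fn):
    """Run training, re-mining hard negatives every `refresh_every` epochs.

    mine_fn():                 re-embeds the corpus with the current model and
                               returns fresh hard negatives (hypothetical).
    train_epoch_fn(negatives): trains one epoch on the given negatives
                               (hypothetical).
    Returns the list of epochs at which mining ran.
    """
    mined_at = []
    negatives = None
    for epoch in range(num_epochs):
        # Epoch 0 always mines, so training never starts without negatives.
        if epoch % refresh_every == 0:
            negatives = mine_fn()
            mined_at.append(epoch)
        train_epoch_fn(negatives)
    return mined_at
```

A smaller `refresh_every` keeps negatives fresher at the cost of more re-embedding passes over the corpus.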
ONLINE HARD NEGATIVE MINING
Online mining selects hard negatives during training using the current batch. Compute embeddings for batch items, find hardest negatives within the batch for each anchor. No pre-computation needed.
In-batch negatives: The positives of other anchors in the batch serve as negatives. Fast, with no extra embedding computation. Batch size limits negative diversity: a batch of 512 gives each anchor 511 candidates.
Hardest-in-batch: Select the negative with the smallest distance to the anchor. Can be too aggressive: it may repeatedly select false negatives, i.e. items labeled negative that are actually positives.
Semi-hard negatives: Select negatives that are farther from the anchor than the positive but still within the margin: d(anchor, positive) < d(anchor, negative) < d(anchor, positive) + margin. Balances difficulty and stability.
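The hardest-in-batch and semi-hard rules can be compared in a short sketch. It assumes the batch arrives as paired arrays where `positives[i]` belongs to `anchors[i]` and every other row of `positives` acts as a negative for that anchor. The function name, the Euclidean distance, and the choice to take the closest candidate inside the semi-hard band are illustrative assumptions.

```python
import numpy as np

def select_in_batch_negatives(anchors, positives, margin=0.2, mode="semi-hard"):
    """Pick one in-batch negative index per anchor, or -1 if none qualifies.

    anchors, positives: (B, D) arrays; row i of positives pairs with row i of
    anchors, and all other rows are treated as negatives for anchor i.
    """
    # d[i, j] = Euclidean distance from anchor i to positive j.
    diff = anchors[:, None, :] - positives[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))

    batch = d.shape[0]
    d_pos = np.diag(d)                      # distance to each anchor's own positive
    is_neg = ~np.eye(batch, dtype=bool)     # exclude the anchor's own positive

    if mode == "hardest":
        candidates = is_neg
    elif mode == "semi-hard":
        # d(a, p) < d(a, n) < d(a, p) + margin
        candidates = is_neg & (d > d_pos[:, None]) & (d < d_pos[:, None] + margin)
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Among the candidates, take the one closest to the anchor.
    masked = np.where(candidates, d, np.inf)
    idx = masked.argmin(axis=1)
    return np.where(candidates.any(axis=1), idx, -1)
```

Anchors that return -1 in semi-hard mode have no negative inside the band; a training loop might skip them or fall back to another rule, depending on the setup.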
CHOOSING BETWEEN THEM
Use offline mining when: you have a large corpus, can afford the re-mining overhead, and need globally hard negatives that a single batch may miss.
Use online mining when: corpus changes frequently, computational budget is limited, batch size is large enough for diversity (512+).