Training Two-Tower Models
The Training Signal
For each positive pair (user U clicked item I), you need negatives: items U did not click. With 10 million items, there are 10 million potential negatives per positive. You cannot use all of them; typical setups sample 100-1000 negatives per positive.
The loss function says: score(U, I_positive) should be higher than score(U, I_negative) for every sampled negative. Softmax cross-entropy is common: compute a softmax over the positive score and all negative scores, then minimize the negative log-likelihood of the positive. This pushes the positive score up and the negative scores down simultaneously.
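As a rough sketch of that loss (assuming PyTorch, dot-product scoring, and illustrative tensor names of my own choosing, not from any particular library):

```python
import torch
import torch.nn.functional as F

def sampled_softmax_loss(user_emb, pos_item_emb, neg_item_embs):
    """Softmax cross-entropy over one positive and K sampled negatives.

    user_emb:      [batch, dim]    user-tower output
    pos_item_emb:  [batch, dim]    item-tower output for the clicked item
    neg_item_embs: [batch, K, dim] item-tower outputs for K sampled negatives
    """
    # Dot-product scores: positive is [batch, 1], negatives are [batch, K].
    pos_score = (user_emb * pos_item_emb).sum(dim=-1, keepdim=True)
    neg_scores = torch.einsum('bd,bkd->bk', user_emb, neg_item_embs)

    # Column 0 holds the positive, so the target class is index 0 for every row.
    logits = torch.cat([pos_score, neg_scores], dim=1)       # [batch, 1 + K]
    labels = torch.zeros(logits.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```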
In-Batch Negatives
The simplest approach: within a batch of 512 user-item pairs, use the 511 other items as negatives for each user. Their embeddings are already computed, so this adds essentially no extra work. For user U with positive item I, the 511 items from the other pairs in the batch become negatives.
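A minimal sketch of the in-batch version, under the same assumptions as above (PyTorch, dot-product scoring): the whole score matrix comes from one matmul, and the diagonal entries are the positives.

```python
import torch
import torch.nn.functional as F

def in_batch_softmax_loss(user_emb, item_emb):
    """In-batch negatives: item j is the positive for user j and a negative
    for every other user in the batch.

    user_emb: [batch, dim] user-tower output
    item_emb: [batch, dim] item-tower output for each user's clicked item
    """
    # Row i holds user i's scores against every item in the batch;
    # the correct class for row i is column i (the diagonal).
    logits = user_emb @ item_emb.T                            # [batch, batch]
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```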
The problem: in-batch negatives are random draws from the interaction distribution, so they skew toward popular items and tend to be easy. If the model only has to learn "user U does not want item I because I is in a completely different category", it never learns fine-grained preferences. You need harder negatives that force the model to make fine distinctions.
Hard Negative Mining
Hard negatives are items similar to the positive that the user did not interact with. If user U clicked a blue Nike running shoe, a hard negative is a blue Adidas running shoe they saw but did not click. The model must learn why U preferred Nike over Adidas, not just "U likes shoes over laptops".
To find hard negatives: after initial training, run the model to find items with high scores that lack positive interactions. These are items the model thinks the user would like but that the user did not engage with. Mine these as negatives and retrain. This iterative process produces increasingly discriminative models. Two to three rounds of hard negative mining typically improve retrieval recall by 5-15%.
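One way that mining step could look, as a hedged sketch: brute-force scoring of the catalog with the partially trained towers, then dropping items the user already interacted with. The function name, the extra candidate margin of 50, and the brute-force matmul (rather than an ANN index) are all assumptions for illustration.

```python
import torch

def mine_hard_negatives(user_emb, item_emb, positives, k=10):
    """Mine hard negatives after an initial training round.

    user_emb:  [num_users, dim]  user-tower embeddings
    item_emb:  [num_items, dim]  item-tower embeddings for the catalog
    positives: list of sets; positives[u] holds item ids user u interacted with
    k:         number of hard negatives to keep per user

    Returns, per user, up to k item ids the current model scores highly
    but the user never engaged with.
    """
    # Brute-force scoring; a production system would use an ANN index.
    scores = user_emb @ item_emb.T                        # [num_users, num_items]
    # Retrieve extra candidates so the user's own positives can be dropped
    # (assumes the catalog has more than k + 50 items).
    top_items = scores.topk(k + 50, dim=1).indices
    hard_negatives = []
    for u, candidates in enumerate(top_items.tolist()):
        negs = [i for i in candidates if i not in positives[u]][:k]
        hard_negatives.append(negs)
    return hard_negatives
```

These mined ids are then fed back as explicit negatives in the next training round, alongside the in-batch negatives.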