
Pairwise Ranking: Learning Relative Order From Item Comparisons

Pairwise learning to rank reduces the ranking problem to learning preferences between pairs of items within the same query. Instead of predicting absolute relevance, the model learns which of two items should be ranked higher. The core idea is to minimize ranking inversions: cases where a less relevant item is scored above a more relevant one. This aligns training more closely with the goal of producing a well-ordered list.

During training, the system constructs pairs of items for each query in which one item is more relevant than the other, based on labels such as clicks versus no clicks, or higher versus lower relevance grades. The loss function operates on the difference of scores. A common choice is the logistic loss on score differences: loss = log(1 + exp(s_lower − s_higher)), where s_higher is the score of the more relevant item and s_lower the score of the less relevant one. If the model correctly orders the pair with a sufficient margin, the loss is near zero. If the order is wrong, the loss is high and gradients push the model to widen the score gap.

The challenge is pair explosion. A query with 100 candidates has 4,950 possible pairs, and most production queries have hundreds of candidates. Naive enumeration wastes computation on pairs deep in the list that do not affect user experience. The solution is intelligent pair sampling. One effective strategy is to pair each clicked item with several unclicked items and weight each pair by the change in NDCG if its order were swapped. This focuses training effort on the pairs that matter for top-k quality: a pair comparing positions 2 and 5 receives much higher weight than one comparing positions 80 and 85. LambdaRank extends this idea by scaling pair gradients by metric deltas, effectively prioritizing the pairs that impact NDCG most.

In practice, pairwise methods offer a strong balance of ranking quality and training complexity. LinkedIn feed ranking uses a pairwise approach with sampled pairs to rank posts, training on implicit feedback where dwell time above 3 seconds indicates preference over skipped posts. The model uses gradient boosted trees with 400 features and trains on 10 pairs per query per epoch, limiting training cost while achieving 0.76 NDCG@10, a 3 percent improvement over a pointwise baseline. Serving latency is similar to pointwise at 55 microseconds per item because inference still scores items independently; the pairwise structure only affects training. The main trade-off is training cost and complexity: pairwise requires careful pair sampling and often takes 2 to 4 times longer to train than pointwise, but it consistently delivers better top-k ranking quality when relative preferences are more reliable than absolute labels.
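To make the training mechanics concrete, here is a minimal sketch (not drawn from any of the systems named above) of the logistic loss on score differences and the clicked-versus-unclicked pair sampling described earlier. The function names, the toy labels and scores, and the choice of 5 negatives per positive are illustrative assumptions, not a production recipe.

```python
import numpy as np

def pairwise_logistic_loss(score_higher, score_lower):
    """Logistic loss on the score difference of a (more relevant, less relevant) pair.

    Near zero when the model orders the pair correctly with a margin;
    large when the pair is inverted, pushing gradients to widen the score gap.
    """
    return np.log1p(np.exp(score_lower - score_higher))

def sample_pairs(labels, negatives_per_positive=5, rng=None):
    """Pair each clicked item with a few unclicked items from the same query.

    `labels` is a binary array for one query's candidates (1 = clicked, 0 = not).
    Returns a list of (positive_index, negative_index) pairs.
    """
    rng = rng or np.random.default_rng(0)
    positives = np.flatnonzero(labels == 1)
    negatives = np.flatnonzero(labels == 0)
    pairs = []
    for pos in positives:
        chosen = rng.choice(negatives,
                            size=min(negatives_per_positive, len(negatives)),
                            replace=False)
        pairs.extend((pos, neg) for neg in chosen)
    return pairs

# Toy query with 8 candidates; items 1 and 4 were clicked (hypothetical data).
labels = np.array([0, 1, 0, 0, 1, 0, 0, 0])
scores = np.array([0.2, 1.3, 0.1, -0.4, 0.9, 1.1, -0.2, 0.0])  # model scores

pairs = sample_pairs(labels)
losses = [pairwise_logistic_loss(scores[p], scores[n]) for p, n in pairs]
print(f"{len(pairs)} sampled pairs, mean loss {np.mean(losses):.3f}")
```

A gradient boosted tree or neural ranker would be trained to reduce the summed loss over these sampled pairs, while inference still scores each item independently, which is why serving latency matches the pointwise setup.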
💡 Key Takeaways
Pairwise methods learn relative preferences by minimizing ranking inversions between item pairs, using loss functions like logistic loss on score differences to enforce correct ordering.
Pair explosion is a major challenge: a query with 100 candidates yields 4,950 pairs, requiring intelligent sampling such as pairing each clicked item with 5 to 10 unclicked items and weighting by NDCG delta.
LambdaRank weights pair gradients by the change in NDCG if the pair's order were swapped, focusing training on pairs in top positions where user attention concentrates (see the ΔNDCG sketch after this list).
Pairwise training typically takes 2 to 4 times longer than pointwise due to pair construction and loss computation, but serving latency is identical at 50 to 60 microseconds per item since inference scores independently.
LinkedIn feed ranking uses pairwise learning with 10 sampled pairs per query, achieving 0.76 NDCG@10 with gradient boosted trees, a 3 percent improvement over pointwise while keeping 55 microsecond serving latency.
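As a rough illustration of the LambdaRank-style weighting mentioned above, the sketch below computes the absolute change in NDCG from swapping two items in a ranked list. The graded relevance values and the chosen rank positions are made-up inputs for demonstration.

```python
import numpy as np

def delta_ndcg(relevance_in_rank_order, rank_i, rank_j):
    """|Change in NDCG| if the items currently at ranks i and j were swapped.

    `relevance_in_rank_order` holds graded labels in the current ranked order
    (rank 0 = top). The value serves as a pair weight: pairs near the top of
    the list, or with a large relevance gap, receive more training weight.
    """
    gains = 2.0 ** relevance_in_rank_order - 1.0
    discounts = 1.0 / np.log2(np.arange(len(relevance_in_rank_order)) + 2.0)
    ideal_dcg = np.sum(np.sort(gains)[::-1] * discounts)  # IDCG normalizer
    if ideal_dcg == 0:
        return 0.0
    swap = (gains[rank_i] - gains[rank_j]) * (discounts[rank_i] - discounts[rank_j])
    return abs(swap) / ideal_dcg

# Hypothetical graded relevance of one query's current ranking (rank 0 first).
rels = np.array([3, 1, 2, 0, 2, 0, 0, 1, 0, 0])

# A swap near the top of the list moves NDCG far more than one deep in the list.
print(f"swap ranks 1 and 4: delta NDCG = {delta_ndcg(rels, 1, 4):.4f}")
print(f"swap ranks 7 and 9: delta NDCG = {delta_ndcg(rels, 7, 9):.4f}")
```

Weights like these are what make a pair at positions 2 versus 5 contribute far more to training than a pair at positions 80 versus 85.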
📌 Examples
Pinterest uses pairwise ranking for home feed, constructing pairs from pins with dwell time above 3 seconds versus skipped pins, training a neural network with 600 features and achieving 0.81 NDCG@10.
Airbnb listing search samples pairs by comparing booked listings with listings that appeared higher but were not booked, weighting pairs by position difference to emphasize top 10 quality, improving booking conversion by 1.8 percent.
Microsoft Bing uses pairwise features in LambdaMART, sampling up to 20 pairs per query with NDCG delta weights, training on billions of query click pairs weekly to optimize web search ranking.