Listwise Ranking: Optimizing the Entire List With Metric-Aware Losses
The Optimization Challenge
NDCG and similar metrics involve sorting: compute scores, sort items by score, then measure quality. The problem is that sorting isn't differentiable: an infinitesimally small change to a score usually leaves the sorted order, and hence the metric, unchanged, so the gradient is zero almost everywhere. Without useful gradients, standard gradient-based training doesn't work. Workarounds approximate the gradient: estimate how small score changes would affect the final metric, then adjust scores in the direction that improves it.
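To make the sort step concrete, here is a minimal NDCG sketch (the function names `dcg` and `ndcg` are ours, and it uses linear gain rather than the common 2^rel − 1 variant, for simplicity). The `sorted` call is exactly the non-differentiable operation described above.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: relevance discounted by log2 of position.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(scores, relevances):
    # Sort items by model score (descending) -- this sort is the
    # non-differentiable step: tiny score changes leave it unchanged.
    ranked = [rel for _, rel in sorted(zip(scores, relevances), key=lambda p: -p[0])]
    ideal = sorted(relevances, reverse=True)
    return dcg(ranked) / dcg(ideal)

rels = [3, 2, 1, 0]
print(ndcg([0.9, 0.7, 0.4, 0.1], rels))  # perfect order -> 1.0
print(ndcg([0.4, 0.7, 0.9, 0.1], rels))  # top items mis-ranked -> below 1.0
```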
How Metric-Aware Training Works
For each pair of items, compute: if we swapped their positions, how much would ranking quality change? Items where swaps cause large quality drops get stronger training signals. A swap between positions 1 and 2 might drop NDCG by 0.1; a swap between 50 and 51 might drop it by 0.001. Training naturally focuses on getting top positions right because that's where the metric is most sensitive.
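The swap cost has a closed form, because exchanging two items only changes their two discount terms. A sketch under the same simplifying assumptions as above (linear gain, 0-based ranks; `delta_ndcg` is a hypothetical helper name):

```python
import math

def delta_ndcg(relevances, i, j):
    # |Change in NDCG| if the items currently at ranks i and j swapped
    # positions (0-based ranks; the list is already sorted by model score).
    ideal = sorted(relevances, reverse=True)
    idcg = sum(r / math.log2(p + 2) for p, r in enumerate(ideal))
    disc_i, disc_j = 1 / math.log2(i + 2), 1 / math.log2(j + 2)
    return abs((relevances[i] - relevances[j]) * (disc_i - disc_j)) / idcg

# 52 items: a highly relevant item on top, a mildly relevant one at rank 51.
rels = [3] + [0] * 49 + [1, 0]
top = delta_ndcg(rels, 0, 1)     # swap positions 1 and 2
deep = delta_ndcg(rels, 50, 51)  # swap positions 51 and 52
print(top, deep)  # the top swap costs orders of magnitude more NDCG
```

The steep log discount near the top of the list is what concentrates the training signal on the first few positions.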
LambdaMART: The Production Standard
LambdaMART combines two ideas: "lambda" gradients that estimate metric changes from position swaps, and "MART" (Multiple Additive Regression Trees, i.e., gradient-boosted decision trees), which builds predictions by combining many simple decision trees, each one correcting the errors of the trees before it. LambdaMART remains the industry workhorse: it's accurate, interpretable (you can inspect which features drive rankings), and doesn't require GPUs. Neural approaches exist but rarely outperform LambdaMART by enough to justify the added complexity.
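The lambda idea can be sketched directly: weight a RankNet-style pairwise gradient by the |ΔNDCG| a swap would cause. This is a simplified illustration, not a production implementation (real libraries such as LightGBM add normalization and use exponential gain), and the function name `lambdas` is ours:

```python
import math

def lambdas(scores, relevances, sigma=1.0):
    # One lambda gradient per item: for each pair (i, j) with
    # rel[i] > rel[j], push score[i] up and score[j] down, weighted by
    # the |NDCG change| a swap of their current ranks would cause.
    n = len(scores)
    order = sorted(range(n), key=lambda k: -scores[k])
    rank = {item: pos for pos, item in enumerate(order)}
    ideal = sorted(relevances, reverse=True)
    idcg = sum(r / math.log2(p + 2) for p, r in enumerate(ideal))
    grads = [0.0] * n
    for i in range(n):
        for j in range(n):
            if relevances[i] <= relevances[j]:
                continue
            disc_i = 1 / math.log2(rank[i] + 2)
            disc_j = 1 / math.log2(rank[j] + 2)
            dndcg = abs((relevances[i] - relevances[j]) * (disc_i - disc_j)) / idcg
            # RankNet-style pairwise gradient, scaled by the metric change.
            rho = 1 / (1 + math.exp(sigma * (scores[i] - scores[j])))
            grads[i] += sigma * rho * dndcg
            grads[j] -= sigma * rho * dndcg
    return grads

g = lambdas([0.1, 0.9, 0.5], [3, 0, 1])  # most relevant item ranked last
print(g)  # item 0 gets a large positive gradient: push its score up
```

In the MART half of the algorithm, these lambdas serve as the pseudo-residuals that each new regression tree is fit to.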