
ML-Powered Search & Ranking • Evaluation (NDCG, MRR, CTR, Dwell Time) • Medium • ⏱️ ~3 min

NDCG: Measuring Ranking Quality With Position Discounting

Definition
NDCG (Normalized Discounted Cumulative Gain) measures ranking quality when items have graded relevance (e.g., 0=irrelevant, 1=somewhat relevant, 2=relevant, 3=highly relevant). It rewards placing highly relevant items at the top, with diminishing returns for lower positions.
How NDCG Works Step by Step
Start with Cumulative Gain (CG): the sum of the relevance scores in your ranked list. If the top-5 items have relevance [3, 2, 3, 0, 1], CG = 9. The problem is that CG ignores position: [3, 2, 3, 0, 1] and [0, 1, 2, 3, 3] both score 9, but the first ranking is clearly better because its most relevant items appear at the top.
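To make the position-blindness of CG concrete, here is a minimal sketch. The function name `cumulative_gain` and the two example lists are illustrative choices, not part of any library.

```python
def cumulative_gain(relevances):
    """Sum graded relevance labels, ignoring where each item is ranked."""
    return sum(relevances)


good_order = [3, 2, 3, 0, 1]  # highly relevant items near the top
bad_order = [0, 1, 2, 3, 3]   # same labels, worst items first

print(cumulative_gain(good_order))  # 9
print(cumulative_gain(bad_order))   # 9
```

Both orderings produce the same score, which is exactly the weakness that position discounting addresses.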
Discounted Cumulative Gain (DCG) fixes this by dividing each relevance score by log2(position + 1). Position 1 divides by log2(2) = 1, position 2 by log2(3) ≈ 1.58, position 3 by log2(4) = 2, position 4 by log2(5) ≈ 2.32, and position 5 by log2(6) ≈ 2.58. Positions nearer the top are discounted less, so relevant items there contribute more. For [3, 2, 3, 0, 1]: DCG = 3/1 + 2/1.58 + 3/2 + 0/2.32 + 1/2.58 ≈ 3 + 1.26 + 1.5 + 0 + 0.39 = 6.15.
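The DCG arithmetic above can be sketched in a few lines of Python. This assumes the linear-gain form used in this section (relevance divided by a base-2 log of position + 1). Some systems use an exponential gain of 2^relevance - 1 instead, which is not shown here. The name `dcg` is illustrative.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: rel_i / log2(i + 1), with positions starting at 1."""
    return sum(
        rel / math.log2(pos + 1)
        for pos, rel in enumerate(relevances, start=1)
    )


print(round(dcg([3, 2, 3, 0, 1]), 2))  # 6.15
print(round(dcg([0, 1, 2, 3, 3]), 2))  # 4.08
```

Unlike CG, the two orderings now score differently: the list with relevant items at the top gets the higher DCG.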
Normalized DCG divides DCG by the ideal DCG (IDCG), the DCG of the same items in the best possible order. If the perfect order is [3, 3, 2, 1, 0], ideal DCG = 3/1 + 3/1.58 + 2/2 + 1/2.32 + 0/2.58 ≈ 3 + 1.89 + 1 + 0.43 + 0 = 6.32. NDCG = 6.15/6.32 ≈ 0.97. A perfect ranking scores 1.0.
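A minimal sketch of the normalization step follows. It assumes the ideal ordering is obtained by sorting the same relevance labels in descending order, and it returns 0.0 when no item is relevant; that zero-division convention is an assumption made here, and other implementations may choose differently. The names `dcg` and `ndcg` are illustrative.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with a base-2 log discount."""
    return sum(
        rel / math.log2(pos + 1)
        for pos, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG of the given order divided by DCG of the ideal (descending) order."""
    ideal = dcg(sorted(relevances, reverse=True))
    if ideal == 0:
        return 0.0  # assumed convention when every label is 0
    return dcg(relevances) / ideal


print(round(ndcg([3, 2, 3, 0, 1]), 2))  # 0.97
print(ndcg([3, 3, 2, 1, 0]))            # 1.0
```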
When to Use NDCG
NDCG shines when you have graded relevance labels and care about the full top-K ranking, not just the first result. Search results, product recommendations, and content feeds typically use NDCG@10 or NDCG@20. It is less useful for navigational queries where users want exactly one result (use MRR instead) or when all relevant items are equally relevant (with binary labels there are no grades to exploit, so NDCG reduces to a position-discounted hit count).
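Cutoff variants such as NDCG@10 truncate both the actual and the ideal ranking to the first K positions. The sketch below assumes the ideal ranking is built from all judged labels for the query, sorted in descending order and then truncated to K. The function names and the six-item example are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """DCG computed over only the first k positions."""
    return sum(
        rel / math.log2(pos + 1)
        for pos, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_relevances, k):
    """NDCG@k: truncated DCG over the truncated ideal DCG."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # assumed convention when no judged item is relevant
    return dcg_at_k(ranked_relevances, k) / ideal


# Six judged results; a highly relevant item sits at position 6.
labels = [3, 2, 3, 0, 1, 3]
print(round(ndcg_at_k(labels, 3), 2))  # 0.9
print(round(ndcg_at_k(labels, 6), 2))
```

In this example the system is penalized at K = 3 because a highly relevant item was left outside the cutoff, even though the items it did show are in a sensible order.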
💡 Key Insight: NDCG values range from 0 to 1. In practice, production systems target NDCG@10 of 0.4-0.6 for broad queries and 0.7-0.9 for navigational queries. A 0.02 improvement (e.g., 0.52 to 0.54) is often significant enough to ship.

💡 Key Takeaways

✓ CG sums relevance scores but ignores position. DCG discounts each score by log2(position + 1) so top positions contribute more.

✓ NDCG normalizes DCG by dividing by the ideal DCG (perfect ranking). A score of 1.0 means a perfect ranking; a score of 0 means none of the ranked items is relevant.

✓ Use NDCG when you have graded relevance (not binary) and care about full top-K ranking quality.

✓ Production targets: NDCG@10 of 0.4-0.6 for broad queries, 0.7-0.9 for navigational. A 0.02 lift is often significant.

✓ NDCG is less useful for single-answer queries (use MRR) or binary relevance (grading adds nothing).

📌 Interview Tips

1. Walk through the DCG computation: relevance [3, 2, 3, 0, 1] with log2 discounting gives 6.15, and normalizing against the ideal DCG of 6.32 gives 0.97.

2. Explain when NDCG is appropriate: graded relevance, full ranking matters. When not: single-answer queries, binary labels.

3. Cite production NDCG targets (0.4-0.6 broad, 0.7-0.9 navigational) and significance threshold (0.02 lift is meaningful).
