Recommendation Systems › Evaluation Metrics (Precision@K, NDCG, Coverage)

NDCG@K: Position-Aware Ranking Quality

Core Concept
Precision@K measures what fraction of the top K recommendations were relevant. If you show 10 items and 7 were clicked, Precision@10 = 0.7. Simple, interpretable, but ignores ranking within the K items.

Computing Precision@K

Precision@K = (relevant items in top K) / K

For user U, show K items. Count how many were interacted with (clicked, purchased, rated). Divide by K. Average across all users to get system-wide Precision@K. Common values: K = 5, 10, 20.
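The computation above can be sketched in a few lines of Python (helper name and example data are illustrative, assuming binary relevance labels):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# 7 of the top 10 shown items were clicked -> Precision@10 = 0.7
recs = list(range(10))          # item IDs shown, in ranked order
clicked = {0, 1, 2, 4, 5, 7, 9}  # items the user interacted with
print(precision_at_k(recs, clicked, 10))  # 0.7
```

Averaging this value over all users gives the system-wide Precision@K.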

Recall@K

Recall@K = (relevant items in top K) / (total relevant items)

Of all items the user would find relevant, what fraction appeared in the top K? Harder to compute because you need to know all relevant items, not just those shown. In practice, use items the user eventually interacted with as ground truth.
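A matching sketch for Recall@K, treating the user's eventual interactions as the ground-truth relevant set (names and data are illustrative):

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

# The user eventually interacted with 8 items; 4 appeared in the top 10.
recs = [1, 20, 2, 30, 3, 40, 4, 50, 60, 70]
ground_truth = {1, 2, 3, 4, 5, 6, 7, 8}
print(recall_at_k(recs, ground_truth, 10))  # 0.5
```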

Precision-Recall Trade-off

Increasing K typically increases Recall (more chances to include relevant items) but decreases Precision (the denominator K grows faster than the count of relevant hits, so each extra irrelevant item dilutes the list). Choose K based on your UI: if you show 10 items, measure Precision@10. If users scroll through 50, measure Recall@50.
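A tiny synthetic demo of the trade-off (made-up item IDs; the relevant set is what the user eventually interacted with):

```python
def topk_stats(recommended, relevant, k):
    """Return (Precision@K, Recall@K) for one ranked list."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k, hits / len(relevant)

recommended = [3, 11, 7, 25, 1, 40, 8, 52, 2, 60,
               61, 62, 63, 64, 65, 66, 67, 68, 69, 70]
relevant = {1, 2, 3, 7, 8}

for k in (5, 10, 20):
    p, r = topk_stats(recommended, relevant, k)
    print(f"K={k:2d}  Precision@K={p:.2f}  Recall@K={r:.2f}")
# K= 5  Precision@K=0.60  Recall@K=0.60
# K=10  Precision@K=0.50  Recall@K=1.00
# K=20  Precision@K=0.25  Recall@K=1.00
```

As K grows, Recall climbs toward 1.0 while Precision falls, which is exactly why K should match how many items your UI actually surfaces.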

⚠️ Interview Pattern: When asked about recommendation metrics, define Precision@K and Recall@K with formulas. Explain why both matter: Precision ensures quality, Recall ensures coverage. Show you understand the trade-off between showing few high-confidence items versus many lower-confidence items.
💡 Key Takeaways
DCG = sum of gains discounted by log2(position+1); NDCG normalizes by ideal DCG to get 0-1 scale where 1.0 is perfect ranking.
Graded relevance captures nuance: map engagement levels to 0-4 scale (skip=0, view=1, partial=2, complete=3, save/share=4).
Standard practice: NDCG@1, NDCG@3, NDCG@10 with multi-level judgments, offline evaluation over tens of millions of query-result pairs.
NDCG@1 measures top-pick quality; NDCG@10 balances precision and ranking depth; NDCG@100 focuses on overall list quality for long-scroll UIs.
Position weighting via log discount reflects user attention decay: position 1 has full weight, position 10 has about 30% weight.
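The DCG/NDCG recipe in the takeaways can be sketched directly (linear gains as described above; note that a common variant instead uses the exponential gain 2^rel − 1 to weight high grades more heavily):

```python
import math

def dcg_at_k(gains, k):
    """DCG: graded gains discounted by log2(position + 1), positions from 1."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    """Normalize by the ideal (descending-sorted) DCG; 1.0 = perfect ranking."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

# Graded relevance 0-4 (skip=0 ... save/share=4) in the order items were shown:
print(ndcg_at_k([4, 3, 2, 1, 0], 5))  # 1.0 -- ideal ordering
print(ndcg_at_k([4, 0, 3, 2, 1], 5))  # < 1.0 -- a skip at position 2 costs us
# Attention decay: position 10 keeps 1/log2(11) ~ 29% of position 1's weight.
```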
📌 Interview Tips
1. When explaining NDCG: describe DCG as the sum of gains discounted by log2(position+1), normalized by ideal DCG to get a 0-1 scale; a perfect ranking scores 1.0.
2. For graded relevance: mention mapping engagement to levels on a 0-4 scale (skip=0, view=1, partial=2, complete=3, save=4); this captures nuance that binary labels miss.
3. When asked about K selection: explain that NDCG@1 measures top-pick quality, NDCG@10 balances precision and depth, and NDCG@100 focuses on overall list quality.