Core Concept
Precision@K measures what fraction of the top K recommendations were relevant. If you show 10 items and 7 were clicked, Precision@10 = 0.7. Simple, interpretable, but ignores ranking within the K items.
Computing Precision@K
Precision@K = (relevant items in top K) / K
For user U, show K items. Count how many were interacted with (clicked, purchased, rated). Divide by K. Average across all users to get system-wide Precision@K. Common values: K = 5, 10, 20.
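The per-user computation above can be sketched in a few lines. This is a minimal illustration, not a production implementation; the function name `precision_at_k` and the toy click data are assumptions for the example.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-K recommendations that are relevant (e.g. clicked)."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

# Hypothetical data: 10 recommendations shown, 7 of them clicked
recs = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
clicked = {"a", "b", "c", "d", "e", "f", "g"}
print(precision_at_k(recs, clicked, 10))  # 0.7
```

System-wide Precision@K is then the mean of this value across all users.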
Recall@K
Recall@K = (relevant items in top K) / (total relevant items)
Of all items the user would find relevant, what fraction appeared in the top K? Harder to compute because you need to know all relevant items, not just those shown. In practice, use items the user eventually interacted with as ground truth.
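Recall@K only differs in the denominator: the size of the full relevant set rather than K. A minimal sketch, assuming a held-out set of the user's eventual interactions serves as ground truth (names and data are illustrative):

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-K."""
    if not relevant:
        return 0.0  # no ground-truth interactions for this user
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

# Hypothetical ground truth: 4 held-out interactions; 3 surface in the top 5
relevant = {"a", "b", "c", "d"}
recs = ["a", "x", "b", "y", "c"]
print(recall_at_k(recs, relevant, 5))  # 0.75
```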
Precision-Recall Trade-off
Increasing K typically increases Recall (more chances to include relevant items) but decreases Precision (the denominator K grows while relevant hits grow more slowly, so extra irrelevant items dilute the top-K list). Choose K based on your UI: if you show 10 items, measure Precision@10. If users scroll through 50, measure Recall@50.
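The trade-off is easy to see on a toy ranking. This sweep uses made-up data (a 6-item relevant set against a 15-item ranked list) purely to show the directional effect:

```python
# Hypothetical ground truth and ranked list for one user
relevant = {"a", "b", "c", "d", "e", "f"}
ranked = ["a", "x", "b", "y", "c", "z", "d", "p", "q", "e", "r", "s", "t", "u", "f"]

for k in (5, 10, 15):
    hits = sum(1 for item in ranked[:k] if item in relevant)
    # Precision falls as K grows; Recall rises toward 1.0
    print(f"K={k:2d}  Precision@K={hits / k:.2f}  Recall@K={hits / len(relevant):.2f}")
```

Running this prints Precision@K of 0.60, 0.50, 0.40 against Recall@K of 0.50, 0.83, 1.00: quality per slot drops exactly as coverage climbs.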
⚠️ Interview Pattern: When asked about recommendation metrics, define Precision@K and Recall@K with formulas. Explain why both matter: Precision ensures quality, Recall ensures coverage. Show you understand the trade-off between showing few high-confidence items versus many lower-confidence items.
✓DCG = sum of gains discounted by log2(position+1); NDCG normalizes by ideal DCG to get 0-1 scale where 1.0 is perfect ranking.
✓Graded relevance captures nuance: map engagement levels to 0-4 scale (skip=0, view=1, partial=2, complete=3, save/share=4).
✓Standard practice: NDCG@1, NDCG@3, NDCG@10 with multi-level judgments, offline evaluation over tens of millions of query-result pairs.
✓NDCG@1 measures top-pick quality; NDCG@10 balances precision and ranking depth; NDCG@100 focuses on overall list quality for long-scroll UIs.
✓Position weighting via log discount reflects user attention decay: position 1 has full weight, position 10 has about 30% weight.
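The DCG/NDCG points above can be sketched directly from the formula (gain / log2(position + 1), normalized by the ideal ordering). The function names and the graded-gain example are assumptions for illustration; the 0-4 gains follow the engagement mapping listed above.

```python
import math

def dcg_at_k(gains, k):
    """DCG@K: sum of gain / log2(position + 1) over 1-indexed positions."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains[:k], start=1))

def ndcg_at_k(gains, k):
    """Normalize by the ideal DCG (gains re-sorted best-first) for a 0-1 score."""
    idcg = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0

# Hypothetical graded gains (0-4 scale) in the order the system ranked items
print(ndcg_at_k([4, 2, 3, 0, 1], 3))       # below 1.0: gain-3 item ranked under gain-2
print(ndcg_at_k([4, 3, 2, 1, 0], 5))       # 1.0: perfect ordering

# Log discount at position 10: 1 / log2(11) ≈ 0.29, i.e. roughly 30% of position 1's weight
print(1 / math.log2(11))
```

Note that the ideal ordering sorts the full gain list before truncating to K, so a list that hides a high-gain item below position K is penalized.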