
MRR and Precision@K: When You Care About the First Correct Result

Core Concept
MRR (Mean Reciprocal Rank) measures where the first relevant result appears. Precision@K measures what fraction of top-K results are relevant. Both use binary relevance.

MRR: When Users Want One Answer

Reciprocal Rank (RR) is 1 divided by the position of the first relevant result. First relevant at position 1: RR = 1.0. Position 3: RR ≈ 0.33. Position 10: RR = 0.1. No relevant result in the top-K: RR = 0. MRR averages RR across queries; an MRR of 0.5 roughly corresponds to the first relevant result appearing at position 2 on average.
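A minimal sketch of the computation in Python (function names are my own, not from any particular library):

```python
def reciprocal_rank(relevances):
    """RR for one query: 1 / position of the first relevant result.

    relevances: binary list where relevances[i] = 1 means the result
    at rank i+1 is relevant. Returns 0.0 if nothing relevant appears.
    """
    for i, rel in enumerate(relevances):
        if rel:
            return 1.0 / (i + 1)
    return 0.0

def mean_reciprocal_rank(queries):
    """MRR: average RR across a list of per-query relevance lists."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)

# First relevant at positions 1, 3, and 10, matching the text:
print(mean_reciprocal_rank([[1, 0, 0], [0, 0, 1], [0] * 9 + [1]]))
# (1.0 + 0.333 + 0.1) / 3 ≈ 0.478
```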

Use MRR for navigational queries ("facebook login") where users want exactly one answer. Position 1 versus 2 matters enormously (RR drops from 1.0 to 0.5); position 5 versus 6 barely matters (0.20 versus roughly 0.17). The 1/position formula captures exactly this steep-then-flat weighting.

Precision@K: What Fraction of Results Are Good

Precision@K = relevant items in the top-K divided by K. If the top 10 contains 6 relevant items, Precision@10 = 0.6. Position within the top-K does not matter: [relevant, relevant, irrelevant] and [irrelevant, relevant, relevant] both score Precision@3 ≈ 0.67.
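A quick sketch, again with an invented function name, demonstrating that only the count matters:

```python
def precision_at_k(relevances, k):
    """Precision@K: fraction of the top-K results that are relevant.

    relevances: binary list ordered by rank, as in reciprocal_rank above.
    """
    return sum(relevances[:k]) / k

# Order within the top-K is ignored; both orderings score the same:
print(precision_at_k([1, 1, 0], 3))  # 0.667 (2/3)
print(precision_at_k([0, 1, 1], 3))  # 0.667 (2/3)
```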

Use Precision@K when users scan multiple results: image search, product listings. High Precision@10 means mostly relevant items without scrolling past garbage.

Choosing Between MRR, Precision, and NDCG

MRR: a single answer matters (navigational search, question answering). Precision@K: multiple results matter equally (product grids). NDCG: multiple results with different quality levels. In practice, teams track several metrics at once: NDCG for overall quality, MRR for navigational queries, Precision@K for coverage. The sketch below shows how the two binary metrics can disagree on the same ranking.
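Reusing the two functions sketched above on an invented ranking:

```python
# One query: the first relevant result is buried at position 4,
# but the rest of the top-10 is packed with relevant items.
ranking = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

print(reciprocal_rank(ranking))     # 0.25 -- poor for a navigational query
print(precision_at_k(ranking, 10))  # 0.7  -- fine for a browsing grid
```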

⚠️ Trade-off: MRR and Precision use binary relevance, ignoring quality gradations. If distinguishing "somewhat" from "highly" relevant matters, use NDCG.
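As a toy illustration (grades and lists invented here), binarizing graded judgments erases exactly the distinction NDCG is built to reward:

```python
def binarize(grades):
    """Collapse graded judgments to binary relevance (grade > 0 -> relevant)."""
    return [1 if g > 0 else 0 for g in grades]

# Graded judgments (0 = irrelevant, 1 = somewhat, 2 = highly relevant):
a = [2, 1, 0]   # highly relevant item ranked first
b = [1, 2, 0]   # somewhat relevant item ranked first

# Both collapse to [1, 1, 0], so binary metrics cannot tell them apart:
print(precision_at_k(binarize(a), 3) == precision_at_k(binarize(b), 3))  # True
```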
💡 Key Takeaways
- MRR = 1/position of first relevant result, averaged across queries. MRR 0.5 means first relevant at position 2 on average.
- Use MRR for navigational queries where users want one answer. Position 1 vs 2 matters greatly.
- Precision@K = relevant in top-K / K. Position within the top-K does not matter, only the count.
- Use Precision@K for multiple equally relevant results: product grids, image galleries.
- Binary limitation: MRR and Precision@K cannot distinguish somewhat relevant from highly relevant.
📌 Interview Tips
1. Walk through MRR: position 1 = 1.0, position 3 ≈ 0.33, position 10 = 0.1. Average across queries.
2. Choose the metric by use case: MRR for single-answer queries, Precision@K for equally relevant results, NDCG for graded relevance.
3. Note the binary limitation: these metrics treat all relevant items the same regardless of quality.