Precision@K: Top K Accuracy for Ranked Recommendations

Precision@K answers a simple question: of the top K items you show a user, what fraction are actually relevant? It treats all relevant items equally and ignores their positions within those K items. If you recommend 10 movies and the user watches 3 of them, your Precision@10 is 3/10 = 0.3, or 30%.

This metric shines when users only look at a small prefix of results, like the first row on Netflix's homepage or the first screen of Google search results. Relevance is typically binary: clicked or not clicked, purchased or not purchased, watched beyond 30 seconds or not. Production systems typically use small K values (5 to 20) because that matches actual screen real estate and user attention spans.

The simplicity is both a strength and a weakness. Precision@K is robust, easy to explain to stakeholders, and works well with binary labels. But it is completely blind to ordering within the top K: showing your best item at position 10 looks identical to showing it at position 1. It also ignores graded relevance: a 3 minute watch counts the same as a 30 minute binge.

In practice, an improvement of 0.5 to 1.0 percentage points in Precision@10 (say, from 0.215 to 0.225) is often meaningful at scale. YouTube might track Precision@5 for the homepage hero row, while Spotify might measure Precision@20 for personalized playlist sections. Always match K to your actual UI: optimizing Precision@20 when users see only 8 items can hide regressions in the visible region.
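The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than any particular library's implementation; the function name `precision_at_k` and the movie identifiers are made up for the example, and dividing by K even when fewer than K items are returned is one common convention, assumed here.

```python
from typing import Hashable, Sequence, Set


def precision_at_k(ranked_items: Sequence[Hashable],
                   relevant: Set[Hashable],
                   k: int) -> float:
    """Fraction of the top-k ranked items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Assumption: divide by k, not len(top_k), so a list shorter than k
    # is penalized for its empty slots.
    return hits / k


# Ten recommended movies, three of which the user watched.
recommended = [f"movie_{i}" for i in range(1, 11)]
watched = {"movie_2", "movie_5", "movie_9"}
print(precision_at_k(recommended, watched, k=10))  # 0.3

# Position blindness: moving the three hits to the front of the list
# leaves Precision@10 unchanged.
reordered = ["movie_2", "movie_5", "movie_9"] + [
    m for m in recommended if m not in watched
]
print(precision_at_k(reordered, watched, k=10))  # 0.3
```

The two printed values are identical even though the second ranking is clearly better for the user, which is the ordering blindness described above.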
💡 Key Takeaways
Computed as (number of relevant items in the top K) divided by K; values range from 0.0 to 1.0
Production K values: 5 to 10 for above-the-fold tiles, 10 to 20 for a search first page or homepage rows, 30 for playlist-style surfaces
Example binary relevance definitions: clicked, purchased, watched more than 30 seconds, completion rate above 50%
Position blind: placing the best item at rank 1 versus rank 10 produces an identical Precision@K score
Typical meaningful deltas: improvements of 0.5 to 1.0 percentage points (for example, 0.215 to 0.225) can drive significant business impact at billions of impressions
Always align K to the actual UI surface: optimizing Precision@20 when users see only 8 items hides regressions in the visible region
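To make the last takeaway concrete, here is a small sketch with invented data: two hypothetical rankings for the same user, each given as a list of 0/1 relevance labels in ranked order. The names `baseline` and `candidate` and the label values are assumptions chosen purely for illustration.

```python
def precision_at_k(relevance_flags, k):
    """relevance_flags: 0/1 labels in ranked order; returns hits in top k / k."""
    return sum(relevance_flags[:k]) / k


# Hypothetical labels for 20 ranked items under two models.
# baseline: 4 hits, all inside the first 8 positions.
baseline = [1, 1, 0, 1, 0, 0, 1, 0] + [0] * 12
# candidate: 6 hits overall, but only 2 of them in the first 8 positions.
candidate = [1, 0, 0, 1, 0, 0, 0, 0] + [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

for name, flags in [("baseline", baseline), ("candidate", candidate)]:
    print(name,
          "P@20 =", precision_at_k(flags, 20),
          "P@8 =", precision_at_k(flags, 8))
```

Judged by Precision@20, the candidate looks better (0.3 versus 0.2). Judged by Precision@8, which is what a user with 8 visible slots would experience, it is a regression (0.25 versus 0.5).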
📌 Examples
Netflix homepage: Precision@10 for top row recommendations, relevance defined as watch time exceeding 2 minutes within 7 days
YouTube: Precision@5 on the mobile homepage hero section, binary label is a click followed by more than 30 seconds of watch time
Spotify Discover Weekly: Precision@30 for the full playlist, relevance is playing a track for more than 30 seconds or adding it to the library
Pinterest: Precision@20 for the home feed, relevance defined as a click, a save, or a long click (hover longer than 2 seconds)
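Each of these examples first turns raw interaction logs into a binary label and only then computes Precision@K. A rough sketch of that step follows, using the "click and watch more than 30 seconds" rule. The `Interaction` record, its field names, and the sample events are hypothetical and not taken from any real logging schema.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """Hypothetical log record for one item shown to a user."""
    item_id: str
    clicked: bool
    watch_seconds: float


def is_relevant(event: Interaction, min_watch_seconds: float = 30.0) -> bool:
    # Assumed rule: the item counts as relevant only if it was clicked
    # AND watched for longer than the threshold.
    return event.clicked and event.watch_seconds > min_watch_seconds


# Five items in the order they were shown (rank 1 first).
shown = [
    Interaction("a", True, 45.0),
    Interaction("b", True, 12.0),    # clicked, but abandoned too early
    Interaction("c", False, 0.0),
    Interaction("d", True, 310.0),
    Interaction("e", False, 0.0),
]

labels = [int(is_relevant(e)) for e in shown]
precision_at_5 = sum(labels[:5]) / 5
print(labels, precision_at_5)  # [1, 0, 0, 1, 0] 0.4
```

Changing the threshold changes the metric: with `min_watch_seconds=10.0`, item "b" would also count as relevant, so the label definition should be fixed before comparing models.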