Coverage Metrics: Ecosystem Health Beyond Accuracy
Coverage metrics answer two questions: who gets value, and how broadly the system uses the catalog. Precision and NDCG tell you whether recommendations are accurate, but a model that only recommends the top 100 popular items can have high accuracy while destroying ecosystem health. Coverage ensures you're not leaving users, creators, or inventory behind.
Item coverage (also called catalog coverage) measures the fraction of your catalog that actually gets recommended. Compute it as the number of unique items recommended across all users over a time window (say, 7 days) divided by total catalog size. Netflix might have 10,000 titles but only recommend 3,000 of them, yielding 30% item coverage. User coverage asks what fraction of users receive at least one relevant recommendation; it catches cold-start problems where new or niche users get poor results. Long-tail coverage tracks what share of impressions goes to less popular items, often measured by the exposure Gini coefficient or by bucketing items into head (top 10%), torso (next 40%), and tail (bottom 50%) by historical popularity.
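A minimal sketch of how these three coverage numbers might be computed from a window of recommendation logs. The data structures (recs, relevant, item_popularity) are hypothetical stand-ins for whatever your logging pipeline produces, not a specific system's API.

```python
from collections import Counter

def coverage_metrics(recs, relevant, item_popularity, catalog_size):
    """Compute item, user, and long-tail coverage over one time window.

    recs: dict of user_id -> list of recommended item_ids (top-K in the window)
    relevant: dict of user_id -> set of item_ids the user actually engaged with
    item_popularity: dict of item_id -> historical popularity rank (1 = most popular)
    catalog_size: total number of items in the catalog
    """
    # Item (catalog) coverage: unique items shown anywhere / catalog size.
    shown = {i for items in recs.values() for i in items}
    item_coverage = len(shown) / catalog_size

    # User coverage: fraction of users with at least one relevant item in their top-K.
    users_served = sum(
        1 for user, items in recs.items() if relevant.get(user, set()) & set(items)
    )
    user_coverage = users_served / len(recs)

    # Long-tail coverage: share of impressions going to the bottom 50% of items
    # by historical popularity (rank worse than catalog_size / 2).
    impressions = Counter(i for items in recs.values() for i in items)
    tail_impressions = sum(
        count for item, count in impressions.items()
        if item_popularity.get(item, catalog_size) > catalog_size / 2
    )
    tail_coverage = tail_impressions / sum(impressions.values())

    return item_coverage, user_coverage, tail_coverage
```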
Spotify and Netflix actively monitor creator and artist coverage: how many distinct artists or creators receive meaningful impressions each week. Small absolute changes (1 to 3 percentage-point shifts in tail exposure) can materially impact creator ecosystems and long-term content supply. LinkedIn tracks coverage by content type and creator segment to avoid over-concentrating distribution.
The fundamental tradeoff: pushing high-propensity popular items maximizes short-term Precision@K and NDCG@K but collapses coverage. Introducing diversity constraints or minimum exposure floors typically reduces headline accuracy by a small amount (1 to 3% relative) but improves retention, discovery, and creator satisfaction. Production systems use multi-objective optimization or re-ranking with diversity constraints, and track coverage as a guardrail alongside accuracy metrics. If accuracy improves but tail coverage drops 5 percentage points, that is often a failed experiment.
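A minimal illustration of the re-ranking idea with a minimum exposure floor, assuming the scorer's candidate list arrives sorted by relevance; rerank_with_exposure_floor, is_tail, and min_tail_slots are hypothetical names, not any particular production system's implementation. It reserves a couple of top-K slots for the best-scoring tail items and fills the rest purely by relevance.

```python
def rerank_with_exposure_floor(candidates, is_tail, k=10, min_tail_slots=2):
    """Greedy re-ranking sketch with a simple minimum-exposure floor.

    candidates: list of (item_id, relevance_score), sorted descending by score
    is_tail: callable item_id -> bool, True if the item belongs to the long tail
    k: slate size; min_tail_slots: slots guaranteed to tail items
    """
    head_picks, tail_picks = [], []
    for item, score in candidates:
        (tail_picks if is_tail(item) else head_picks).append((item, score))

    # Reserve the best-scoring tail items to satisfy the floor...
    reserved = tail_picks[:min_tail_slots]
    # ...then fill the remaining slots purely by relevance score.
    remaining = sorted(head_picks + tail_picks[min_tail_slots:],
                       key=lambda x: x[1], reverse=True)
    slate = (reserved + remaining)[:k]

    # Re-sort the final slate by score so reserved items sit where their
    # relevance places them rather than always at the top.
    return sorted(slate, key=lambda x: x[1], reverse=True)
```

The floor here is per-request; production systems more often enforce exposure floors in aggregate across traffic, but the per-slate version is the easiest way to see the accuracy-versus-coverage lever.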
💡 Key Takeaways
• Item (catalog) coverage: unique items recommended divided by catalog size, computed over 7- to 28-day windows; typical values 20% to 60% depending on catalog size and diversity policy
• User coverage: fraction of users receiving at least one relevant item in the top K; critical for detecting cold-start failures in new-user or niche-interest segments
• Long-tail coverage: share of impressions to the bottom 50% of items by popularity, or exposure Gini coefficient (0 = perfect equality, 1 = one item gets everything); typical Gini in the 0.6 to 0.9 range (see the sketch after this list)
• Creator/artist coverage: number or percentage of distinct creators receiving impressions, monitored weekly; 1 to 3 percentage-point shifts in tail exposure materially impact creator ecosystems
• Accuracy versus coverage tradeoff: maximizing Precision@K collapses coverage; diversity constraints reduce accuracy 1% to 3% relative but improve long-term retention and supply health
• Popularity-collapse symptoms: rising exposure Gini, declining tail impressions, stagnant discovery metrics; fixed by re-ranking with diversity constraints or minimum exposure floors
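For concreteness, a sketch of the exposure Gini referenced above, using the standard sorted-index formula over per-item impression counts (numpy assumed; the function name is illustrative). Include zero counts for items that were never shown, or the concentration will be understated.

```python
import numpy as np

def exposure_gini(impressions):
    """Gini coefficient over per-item impression counts.

    impressions: array-like of impression counts, one entry per catalog item.
    Returns 0.0 when exposure is perfectly equal, values near 1.0 when
    exposure is concentrated on a handful of items.
    """
    x = np.sort(np.asarray(impressions, dtype=float))  # ascending order
    n = len(x)
    if n == 0 or x.sum() == 0:
        return 0.0
    # Standard formula for sorted data with 1-based index i:
    # Gini = (2 * sum(i * x_i)) / (n * sum(x)) - (n + 1) / n
    index = np.arange(1, n + 1)
    return (2.0 * np.sum(index * x)) / (n * x.sum()) - (n + 1.0) / n
```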
📌 Examples
Netflix: tracks catalog coverage over 28 days, monitors that at least 40% of the catalog receives impressions, and balances this against per-row Precision@10 targets of 0.25 to 0.35
Spotify artist coverage: ensures 70% of artists in the catalog receive at least 100 impressions per week, with tail exposure (bottom 50% of artists) maintained above 15% of total plays
Pinterest: long-tail coverage measured as impressions to pins outside the top 10% by historical engagement, with a target of 25% to 30% of impressions going to the tail; prevents winner-take-all dynamics
YouTube: creator coverage guardrail ensures new creators (fewer than 1,000 subscribers) receive at least 5% of total video impressions; cold-start user coverage target is that 60% of new users receive a relevant video in their first session