
Coverage Metrics: Ecosystem Health Beyond Accuracy

Core Concept
NDCG (Normalized Discounted Cumulative Gain) measures ranking quality by rewarding relevant items positioned higher in the list. Unlike Precision@K, NDCG cares about order: a relevant item at position 1 is worth more than position 10.

How NDCG Works

DCG (Discounted Cumulative Gain): Sum of relevance scores, discounted by position. Formula: DCG = Σ (relevance_i / log2(position_i + 1)). Position 1 gets full credit (divisor = 1). Position 10 gets credit divided by log2(11) ≈ 3.5. This heavily penalizes placing relevant items far down the list.

Normalization: Divide DCG by the ideal DCG (the DCG of a perfect ranking). NDCG = DCG / IDCG, which ranges from 0 to 1. NDCG = 1.0 means a perfect ranking. NDCG = 0.7 is decent but suboptimal, 0.5 is mediocre, and below 0.3 indicates serious ranking problems.
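The two formulas above can be sketched in a few lines of Python (function names here are illustrative, not from any particular library):

```python
import math

def dcg_at_k(relevances, k):
    # DCG = sum of relevance_i / log2(position_i + 1), positions starting at 1
    return sum(rel / math.log2(pos + 1)
               for pos, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(ranked_relevances, k):
    # Normalize by the ideal DCG: the same relevances sorted best-first
    ideal = sorted(ranked_relevances, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked_relevances, k) / idcg if idcg > 0 else 0.0

# Binary relevance labels for a ranked list: 1 = relevant, 0 = not
ranked = [0, 1, 1, 0, 1]
print(round(ndcg_at_k(ranked, k=5), 3))  # 0.712: relevant items sit too low
```

Note how moving the relevant items to positions 1-3 would push the score to exactly 1.0, since DCG would then equal IDCG.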

When to Use NDCG vs Precision

Use Precision@K when you show exactly K items and users see them equally (email subject lines, single-row carousels). Use NDCG when position affects attention: search results, infinite scroll, multi-row grids. Most recommendation interfaces have position bias, making NDCG the more appropriate metric.

NDCG@10 is the most common variant. It focuses on the first 10 positions where user attention is highest. Target NDCG@10 varies by domain: 0.3-0.4 for broad content recommendations, 0.6-0.8 for focused search or personalization tasks.

💡 Key Insight: NDCG is the most commonly used offline metric for ranking systems because it combines what matters: relevance (what you recommend) and ranking quality (where you place it). When an interviewer asks about recommendation metrics, NDCG should be in your first sentence.
💡 Key Takeaways
- Item (catalog) coverage: unique items recommended divided by catalog size, computed over 7- to 28-day windows; typical values are 20% to 60% depending on catalog size and diversity policy
- User coverage: fraction of users receiving at least one relevant item in the top K; critical for detecting cold-start failures in new-user or niche-interest segments
- Long-tail coverage: share of impressions going to the bottom 50% of items by popularity, or the exposure Gini coefficient (0 = perfect equality, 1 = one item gets everything); typical Gini values fall in the 0.6 to 0.9 range
- Creator/artist coverage: number or percentage of distinct creators receiving impressions, monitored weekly; shifts of 1 to 3 percentage points in tail exposure materially impact creator ecosystems
- Accuracy versus coverage tradeoff: maximizing Precision@K collapses coverage; diversity constraints reduce accuracy 1% to 3% relative but improve long-term retention and supply health
- Popularity-collapse symptoms: rising exposure Gini, declining tail impressions, stagnant discovery metrics; fixed by re-ranking with diversity constraints or minimum exposure floors
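Catalog coverage and the exposure Gini coefficient described above are both one-liners over impression logs. A minimal sketch (hypothetical function names; assumes per-item impression counts are already aggregated):

```python
def catalog_coverage(recommended_items, catalog_size):
    # Fraction of the catalog that appeared in at least one recommendation
    return len(set(recommended_items)) / catalog_size

def exposure_gini(impressions):
    # Gini over per-item impression counts:
    # 0 = all items exposed equally, 1 = one item gets everything.
    # Uses the standard sorted-rank formula:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,  i = 1..n
    xs = sorted(impressions)
    n, total = len(xs), sum(impressions)
    if total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n

print(catalog_coverage([1, 2, 2, 3], catalog_size=10))  # 0.3
print(exposure_gini([10, 10, 10, 10]))  # 0.0: perfectly equal exposure
print(exposure_gini([0, 0, 0, 40]))     # 0.75: one item gets everything
```

A rising Gini across weekly snapshots is the concrete signal of the popularity collapse described in the last bullet.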
📌 Interview Tips
1. When asked about coverage: explain catalog coverage (% of items receiving impressions), user coverage (% of users getting personalized recs), and diversity coverage (category/creator distribution).
2. For business impact: mention that low coverage indicates winner-take-all effects where popular items dominate; creators and sellers leave platforms where new items can't surface.
3. When discussing targets: explain that 40-60% catalog coverage over 30 days is typical; tail exposure (bottom 50% of items) often targets 15-30% of impressions.