Core Concept
Measuring diversity and exploration impact requires metrics beyond click-through rate (CTR). Standard engagement metrics reward items that are already known to perform well, so they favor exploitation. Track exploration health with separate metrics.
Coverage Metrics
Catalog coverage: What percentage of items received at least one impression this week? Low coverage indicates the system favors a small subset. Target: 80%+ of active items should get some exposure.
Category coverage per user: How many distinct categories appear in recommendations per user session? Higher is better for diversity but may reduce relevance.
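As a rough illustration of the two coverage metrics above, the sketch below computes them from an impression log. The function names and the log shapes (a flat list of item ids, and (session_id, item_id) pairs with an item-to-category lookup) are assumptions made for this example, not a prescribed schema.

```python
from collections import defaultdict


def catalog_coverage(impressions, active_items):
    """Fraction of active items that received at least one impression."""
    active = set(active_items)
    if not active:
        return 0.0
    shown = {item for item in impressions if item in active}
    return len(shown) / len(active)


def categories_per_session(session_impressions, item_category):
    """Map each session id to the number of distinct categories shown in it."""
    seen = defaultdict(set)
    for session_id, item_id in session_impressions:
        seen[session_id].add(item_category[item_id])
    return {session_id: len(cats) for session_id, cats in seen.items()}


# Hypothetical one-week log: 3 of the 5 active items were shown.
weekly_impressions = ["a", "b", "a", "c"]
active_items = ["a", "b", "c", "d", "e"]
coverage = catalog_coverage(weekly_impressions, active_items)
print(f"catalog coverage: {coverage:.0%}")  # 60%, below the 80% target

# Hypothetical session log with an item -> category lookup.
session_log = [("s1", "a"), ("s1", "b"), ("s2", "a")]
item_category = {"a": "shoes", "b": "hats"}
print(categories_per_session(session_log, item_category))
```

In practice the same counts would usually come from a warehouse query; the point here is only that both metrics reduce to distinct counts over the impression log.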
Exploration Effectiveness
Cold item conversion rate: For items with fewer than 100 prior impressions, what is the engagement rate when shown? It should be roughly 50-80% of the warm item rate. If it is much lower, exploration is showing irrelevant items.
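One way to check the cold item rate against the warm item rate is sketched below. The per-item tuple layout and the function name are hypothetical; only the 100-impression threshold and the 50-80% band come from the text above.

```python
def cold_to_warm_ratio(item_stats, cold_threshold=100):
    """Return cold engagement rate divided by warm engagement rate.

    item_stats: iterable of (prior_impressions, impressions, engagements)
    per item for the measurement window (an assumed layout).
    Returns None when either group has no data to compare.
    """
    cold_imp = cold_eng = warm_imp = warm_eng = 0
    for prior_impressions, impressions, engagements in item_stats:
        if prior_impressions < cold_threshold:
            cold_imp += impressions
            cold_eng += engagements
        else:
            warm_imp += impressions
            warm_eng += engagements
    if cold_imp == 0 or warm_imp == 0 or warm_eng == 0:
        return None
    return (cold_eng / cold_imp) / (warm_eng / warm_imp)


# Hypothetical window: two cold items and one warm item.
stats = [(20, 200, 6), (50, 300, 9), (5000, 1000, 50)]
ratio = cold_to_warm_ratio(stats)
print(f"cold/warm ratio: {ratio:.2f}")
if ratio is not None and ratio < 0.5:
    print("cold items underperform: exploration may be showing irrelevant items")
```

Here the cold rate is 15/500 = 3% and the warm rate is 50/1000 = 5%, so the ratio lands inside the 50-80% band.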
Information gain: How much does showing an item reduce uncertainty about its true reward? High information gain means exploration is learning efficiently.
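The text does not fix a formula for information gain. As one possible proxy, assuming each item's engagement rate is modeled with a Beta posterior (as in Beta-Bernoulli Thompson Sampling), the sketch below computes the expected drop in posterior variance from one more impression. Variance reduction stands in for an entropy-based measure here, and all names are illustrative.

```python
def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return (alpha * beta) / (total ** 2 * (total + 1))


def expected_variance_reduction(alpha, beta):
    """Expected drop in posterior variance after one more Bernoulli outcome.

    The outcome is unknown in advance, so the two possible posteriors are
    weighted by the current posterior-mean probability of a success.
    """
    p_success = alpha / (alpha + beta)
    after_success = beta_variance(alpha + 1, beta)
    after_failure = beta_variance(alpha, beta + 1)
    expected_after = p_success * after_success + (1 - p_success) * after_failure
    return beta_variance(alpha, beta) - expected_after


# A brand-new item with a uniform prior vs. a well-measured item.
cold_gain = expected_variance_reduction(1, 1)
warm_gain = expected_variance_reduction(50, 950)
print(f"cold item: {cold_gain:.6f}, warm item: {warm_gain:.10f}")
```

Under this proxy a cold item gains far more from a single impression than a warm one, which matches the intuition that exploration traffic is best spent where uncertainty is highest.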
Long-Term Impact
Run long-term holdout experiments. Compare groups with different exploration rates over 4-8 weeks. Measure not just immediate engagement but also user retention, long-tail item sales, and model prediction accuracy over time.
✅ Best Practice: Track diversity metrics alongside engagement metrics in dashboards. If CTR improves but catalog coverage drops, investigate. Healthy systems show stable coverage with improving engagement.
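The dashboard check described above could be automated along these lines. This is a minimal sketch: the metric keys and the 5-point coverage tolerance are assumptions chosen for illustration, not recommended values.

```python
def ctr_up_coverage_down(previous, current, max_coverage_drop=0.05):
    """Flag periods where CTR improved while catalog coverage fell.

    previous/current: dicts with "ctr" and "catalog_coverage" as fractions.
    max_coverage_drop: tolerated absolute drop in coverage (hypothetical).
    """
    ctr_improved = current["ctr"] > previous["ctr"]
    coverage_drop = previous["catalog_coverage"] - current["catalog_coverage"]
    return ctr_improved and coverage_drop > max_coverage_drop


last_week = {"ctr": 0.041, "catalog_coverage": 0.83}
this_week = {"ctr": 0.046, "catalog_coverage": 0.71}
if ctr_up_coverage_down(last_week, this_week):
    print("investigate: CTR rose while catalog coverage dropped")
```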
✓ Pure CTR optimization creates filter bubbles and homogeneous recommendations. Clickbait images may increase clicks but decrease downstream conversion and satisfaction.
✓ Candidate curation limits the action space while enforcing diversity. Use up to N images per category with enforced category coverage constraints.
✓ Convergence monitoring via top-k stability tracks what percentage of the slate changes between time windows. High churn indicates ongoing exploration; low churn (a stable slate) indicates convergence.
✓ Guardrails prevent exploration from harming business metrics. Require statistical significance and metric improvement thresholds before declaring a winner.
✓ Multi-phase campaigns balance learning and validation: initial exploration phase (Thompson Sampling with uniform priors), then validation phase with winner holdout testing.
✓ Long-tail entities (low-traffic items, niche content) never converge because they lack sufficient samples. Pool them into umbrella groups or use hierarchical bandits.
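The top-k stability check from the takeaways above can be sketched as a churn measure between two ranked slates. The function name and the choice to measure churn as the share of new entrants in the current top-k are assumptions for this example.

```python
def top_k_churn(previous_ranking, current_ranking, k):
    """Fraction of the current top-k that was absent from the previous top-k."""
    previous_top = set(previous_ranking[:k])
    current_top = set(current_ranking[:k])
    if not current_top:
        return 0.0
    return len(current_top - previous_top) / len(current_top)


# Hypothetical slates from two consecutive time windows.
window_1 = ["a", "b", "c", "d", "e"]
window_2 = ["a", "c", "f", "b", "g"]
churn = top_k_churn(window_1, window_2, k=4)
print(f"top-4 churn: {churn:.0%}")  # 25%: only "f" is new in the top 4
```

Tracking this value per time window gives a simple convergence signal: churn that stays near zero across several windows suggests the slate has stabilized.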