
Diversity Constraints and Convergence Monitoring in Production Bandits

Core Concept
Measuring diversity and exploration impact requires metrics beyond click-through rate. Standard engagement metrics favor exploitation. You need separate metrics to track exploration health.

Coverage Metrics

Catalog coverage: What percentage of items received at least one impression this week? Low coverage indicates the system favors a small subset. Target: 80%+ of active items should get some exposure.

Category coverage per user: How many distinct categories appear in recommendations per user session? Higher is better for diversity but may reduce relevance.
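As a rough illustration, both coverage metrics can be computed from an impression log. The sketch below assumes a hypothetical log format of (session_id, item_id, category) tuples; the function names are illustrative, not taken from any particular library.

```python
from collections import defaultdict

def catalog_coverage(impressions, active_items):
    """Fraction of active items that received at least one impression."""
    active = set(active_items)
    if not active:
        return 0.0
    shown = {item_id for _, item_id, _ in impressions}
    return len(shown & active) / len(active)

def categories_per_session(impressions):
    """Number of distinct categories shown in each session."""
    seen = defaultdict(set)
    for session_id, _, category in impressions:
        seen[session_id].add(category)
    return {session_id: len(cats) for session_id, cats in seen.items()}

# Hypothetical one-week log: (session_id, item_id, category)
log = [
    ("s1", "a", "shoes"),
    ("s1", "b", "hats"),
    ("s2", "a", "shoes"),
]
coverage = catalog_coverage(log, active_items=["a", "b", "c", "d"])
per_session = categories_per_session(log)
```

Here two of the four active items were shown, so coverage is 0.5, well under the 80% target mentioned above.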

Exploration Effectiveness

Cold item conversion rate: For items with fewer than 100 prior impressions, what is the engagement rate when shown? It should be roughly 50-80% of the warm item rate. If it is much lower, exploration is showing irrelevant items.
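One way to track this is as a ratio of pooled engagement rates. This is a minimal sketch that assumes per-item rows of (prior_impressions, impressions, engagements) for the measurement window; the row layout and function name are hypothetical.

```python
def cold_to_warm_ratio(item_stats, cold_threshold=100):
    """Ratio of pooled cold-item engagement rate to pooled warm-item rate.

    item_stats: iterable of (prior_impressions, impressions, engagements).
    Returns None when either group has no impressions or the warm rate is zero.
    """
    cold_imp = cold_eng = warm_imp = warm_eng = 0
    for prior, impressions, engagements in item_stats:
        if prior < cold_threshold:
            cold_imp += impressions
            cold_eng += engagements
        else:
            warm_imp += impressions
            warm_eng += engagements
    if cold_imp == 0 or warm_imp == 0 or warm_eng == 0:
        return None
    return (cold_eng / cold_imp) / (warm_eng / warm_imp)

# Hypothetical window: two cold items and one warm item
stats = [(10, 200, 6), (50, 100, 3), (5000, 1000, 50)]
ratio = cold_to_warm_ratio(stats)  # cold 3% vs. warm 5%
```

A ratio inside the 0.5 to 0.8 band suggests exploration is surfacing reasonably relevant cold items.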

Information gain: How much does showing an item reduce uncertainty about its true reward? High information gain means exploration is learning efficiently.
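Information gain can be approximated in several ways. The sketch below assumes a Beta-Bernoulli model of each item's click probability and uses the expected drop in posterior variance after one more impression as a simple proxy; both the model and the variance-based proxy are assumptions made for illustration, not something the text prescribes.

```python
def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return (alpha * beta) / (total * total * (total + 1))

def expected_variance_reduction(alpha, beta):
    """Expected drop in posterior variance from one more Bernoulli observation."""
    p_click = alpha / (alpha + beta)  # posterior predictive click probability
    expected_after = (
        p_click * beta_variance(alpha + 1, beta)
        + (1 - p_click) * beta_variance(alpha, beta + 1)
    )
    return beta_variance(alpha, beta) - expected_after

# A brand-new item under a uniform Beta(1, 1) prior
cold_gain = expected_variance_reduction(1, 1)
# An item with 100 clicks and 100 skips already observed
warm_gain = expected_variance_reduction(101, 101)
```

Under these assumptions the new item yields far more uncertainty reduction per impression than the well-observed one, which is the intuition behind spending exploration traffic on cold items.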

Long-Term Impact

Run long-term holdout experiments. Compare groups with different exploration rates over 4-8 weeks. Measure not just immediate engagement but also user retention, long-tail item sales, and model prediction accuracy over time.

✅ Best Practice: Track diversity metrics alongside engagement metrics in dashboards. If CTR improves but catalog coverage drops, investigate. Healthy systems show stable coverage with improving engagement.
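A dashboard alert for this pattern could look roughly like the following. The metric keys and the 5-point coverage tolerance are hypothetical choices for the sketch.

```python
def ctr_up_coverage_down(prev, curr, max_coverage_drop=0.05):
    """Flag windows where CTR improved while catalog coverage fell noticeably.

    prev, curr: dicts with 'ctr' and 'catalog_coverage' as fractions in [0, 1].
    """
    ctr_improved = curr["ctr"] > prev["ctr"]
    coverage_drop = prev["catalog_coverage"] - curr["catalog_coverage"]
    return ctr_improved and coverage_drop > max_coverage_drop

last_week = {"ctr": 0.040, "catalog_coverage": 0.82}
this_week = {"ctr": 0.045, "catalog_coverage": 0.70}
needs_review = ctr_up_coverage_down(last_week, this_week)
```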
💡 Key Takeaways
Pure CTR optimization creates filter bubbles and homogeneous recommendations. Clickbait images may increase clicks but decrease downstream conversion and satisfaction.
Candidate curation limits the action space while enforcing diversity. Use up to N images per category with enforced category coverage constraints.
Convergence monitoring via top-k stability measures what percentage of the slate changes between time windows. High churn indicates ongoing exploration; stability indicates convergence.
Guardrails prevent exploration from harming business metrics. Require statistical significance and metric improvement thresholds before declaring a winner.
Multi-phase campaigns balance learning and validation: initial exploration phase (Thompson Sampling with uniform priors), then validation phase with winner holdout testing.
Long-tail entities (low-traffic items, niche content) may never converge because they lack sufficient samples. Pool them into umbrella groups or use hierarchical bandits.
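The top-k stability check can be sketched as the share of the current top-k that was absent from the previous window's top-k. The 40% and 5% cutoffs below are the rule-of-thumb figures used in this article; the function names and the "settling" label for the middle band are illustrative assumptions.

```python
def top_k_churn(prev_ranking, curr_ranking, k=10):
    """Fraction of the current top-k that was not in the previous top-k."""
    prev_top = set(prev_ranking[:k])
    curr_top = set(curr_ranking[:k])
    if not curr_top:
        return 0.0
    return len(curr_top - prev_top) / len(curr_top)

def convergence_state(churn, high=0.40, low=0.05):
    """Map a churn value to a coarse phase label."""
    if churn >= high:
        return "exploring"
    if churn < low:
        return "converged"
    return "settling"

last_week = ["a", "b", "c", "d", "e"]
this_week = ["a", "b", "c", "x", "y"]
churn = top_k_churn(last_week, this_week, k=5)  # 2 of 5 slots changed
state = convergence_state(churn)
```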
📌 Interview Tips
1. When asked about monitoring: explain tracking week-over-week churn in top selections. High churn (40%+) indicates the exploration phase; low churn (<5%) indicates convergence.
2. For cold start handling: describe an initial uniform exploration period (first week or first 1000 impressions) to seed all arms before allowing Thompson Sampling to specialize.
3. When discussing diversity constraints: mention enforcing category coverage (at least one item from each category in top positions) alongside bandit optimization.
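To make the last tip concrete, here is a minimal sketch of Thompson Sampling with uniform Beta(1, 1) priors combined with a category coverage constraint. The arm layout (item_id mapped to category, successes, failures) and the "best sampled item per category first" rule are assumptions for illustration, not a prescribed design.

```python
import random

def thompson_slate(arms, slate_size, rng=random):
    """Pick a slate by Thompson Sampling, seeding one item per category first.

    arms: dict of item_id -> (category, successes, failures).
    Coverage of every category is only possible when slate_size is at least
    the number of categories.
    """
    # One posterior draw per arm; Beta(1, 1) is the uniform prior.
    samples = {
        item: rng.betavariate(1 + wins, 1 + losses)
        for item, (_, wins, losses) in arms.items()
    }
    # Best sampled item within each category.
    best_in_category = {}
    for item, (category, _, _) in arms.items():
        current = best_in_category.get(category)
        if current is None or samples[item] > samples[current]:
            best_in_category[category] = item
    slate = sorted(best_in_category.values(), key=samples.get, reverse=True)
    slate = slate[:slate_size]
    # Fill the remaining slots with the highest remaining draws.
    chosen = set(slate)
    rest = sorted(
        (item for item in arms if item not in chosen),
        key=samples.get,
        reverse=True,
    )
    slate += rest[: slate_size - len(slate)]
    return sorted(slate, key=samples.get, reverse=True)

# Hypothetical arms: item_id -> (category, successes, failures)
arms = {
    "a": ("shoes", 30, 70),
    "b": ("shoes", 5, 5),
    "c": ("hats", 2, 18),
    "d": ("bags", 0, 0),
    "e": ("bags", 12, 40),
}
slate = thompson_slate(arms, slate_size=4, rng=random.Random(0))
```

Because each category contributes its best draw before the slate is filled, a category with weak historical performance still gets one slot, which keeps it from being starved of data.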