
Slate and Ranked Bandits: Handling Multiple Positions and Positional Bias

When you show multiple recommendations simultaneously (a slate or ranked list), naive per-item bandits fail because position matters and items interact. A recommendation in position 1 gets roughly 10x more clicks than one in position 10 purely due to visibility, creating positional bias. If you naively credit position-10 items with their observed low CTR, you'll never learn their true quality. Additionally, showing the same content type in every position creates a poor user experience and ignores diversity.

Two architectural patterns address this. Per-position bandits (used by Scribd) run one independent bandit per position, treating each slot as a separate decision problem. Scribd deployed 10 position bandits × 3 user segments = 30 total bandits, each with 42 arms (row types). This controls for positional bias because each bandit learns the best content for its specific position, and you can account for context from the rows above. The tradeoff is that you need sufficient traffic per position to converge; Scribd needed about one week at their traffic volume.

Slate bandits (used by Udemy) optimize the unordered top-k set rather than a ranked list. You pick k=3 recommendation units to show, observe feedback from all k items in the slate, and update all k arms simultaneously. This provides k reward signals per impression instead of one, accelerating learning by 3x. Udemy chose k=3 based on viewport visibility (users see the top 3 without scrolling). The algorithm typically uses Thompson Sampling: sample scores for all candidate arms, pick the top k samples, and show them in an arbitrary order or one set by a secondary ranker.

Both patterns require careful reward attribution. At Scribd, rows above influence rows below (if position 1 satisfies the user, they may never scroll to position 10). Slate bandits must handle the fact that multiple items were shown simultaneously, so a click on one doesn't mean the others were bad. Contextual features (user segment, time of day, device) can be incorporated to further improve relevance, though this requires more sophisticated algorithms such as contextual Thompson Sampling or LinUCB.
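To make the slate pattern concrete, here is a minimal sketch of slate Thompson Sampling with Beta-Bernoulli arms, following the loop described above: sample a score for every candidate arm, show the top k, and update all k shown arms from the same impression. The arm names, binary click rewards, and uniform Beta(1, 1) priors are illustrative assumptions, not a reconstruction of Udemy's implementation.

```python
import random

class SlateThompsonBandit:
    """Slate bandit sketch: Beta-Bernoulli Thompson Sampling over an
    unordered top-k set, updating all k shown arms on every impression."""

    def __init__(self, arms, k=3):
        self.k = k
        # Beta(1, 1) uniform prior per arm, stored as [alpha, beta].
        self.params = {arm: [1.0, 1.0] for arm in arms}

    def select_slate(self):
        # Sample one score per arm from its posterior; keep the top k.
        sampled = {arm: random.betavariate(a, b)
                   for arm, (a, b) in self.params.items()}
        return sorted(sampled, key=sampled.get, reverse=True)[:self.k]

    def update(self, slate, clicks):
        # Every shown arm gets feedback from the same impression --
        # the k-signals-per-impression property described above.
        for arm, clicked in zip(slate, clicks):
            self.params[arm][0] += clicked      # alpha counts successes
            self.params[arm][1] += 1 - clicked  # beta counts failures

# One impression cycle with hypothetical recommendation units.
bandit = SlateThompsonBandit([f"unit_{i}" for i in range(20)], k=3)
slate = bandit.select_slate()
bandit.update(slate, clicks=[1, 0, 0])  # user clicked the first shown unit
```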
💡 Key Takeaways
Positional bias means position 1 gets 10x more clicks than position 10 purely due to visibility. Naive bandits would incorrectly learn that items shown in lower positions are low quality, preventing discovery of genuinely good content that happens to land in bad positions.
Per-position bandits (Scribd approach) run one independent bandit per slot, treating each position as a separate decision. This controls for positional bias and cross-position interactions but requires sufficient traffic per position. Scribd used 30 bandits (10 positions × 3 segments) with 42 arms each, converging in one week (see the sketch after this list).
Slate bandits (Udemy approach) optimize unordered top-k sets and observe feedback from all k items per impression. This accelerates learning by k× (Udemy chose k=3 for 3× faster convergence based on viewport visibility) but provides less control over exact ordering and cross-position effects.
Reward attribution becomes complex with slates. If a user clicks position 2, positions 1 and 3 were also shown and contributed to the context. At Scribd, rows influence each other (satisfaction at position 1 may prevent scrolling to position 10). Solutions include per-position bandits or counterfactual correction methods.
Traffic requirements scale with granularity. Scribd needed enough traffic to converge 30 simultaneous bandits of 42 arms each (1,260 arm posteriors across positions and segments). Low-traffic entities (such as Expedia's small properties) never converge and need hierarchical priors or traffic gating that runs bandits only in high-volume contexts.
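To illustrate the per-position pattern from the takeaways above, here is a sketch of 30 independent Beta-Bernoulli bandits keyed by (position, segment), with rows already shown above a slot excluded as a crude form of cross-position context. The dimensions mirror the Scribd numbers in the text (10 positions × 3 segments × 42 row types); the segment labels and the use of Thompson Sampling per slot are assumptions, not Scribd's actual design.

```python
import random

POSITIONS = range(10)
SEGMENTS = ["new", "casual", "power"]            # hypothetical segment labels
ROW_TYPES = [f"row_type_{i}" for i in range(42)]

# 10 positions x 3 segments = 30 independent bandits, 42 arms each.
bandits = {(pos, seg): {row: [1.0, 1.0] for row in ROW_TYPES}  # Beta(1, 1)
           for pos in POSITIONS for seg in SEGMENTS}

def pick_row(position, segment, already_shown):
    """Thompson-sample one row type for a slot, skipping rows shown
    above it -- a simple way to account for context from earlier rows."""
    arms = bandits[(position, segment)]
    sampled = {row: random.betavariate(a, b)
               for row, (a, b) in arms.items() if row not in already_shown}
    return max(sampled, key=sampled.get)

def build_page(segment):
    # Fill positions top to bottom so each slot sees the rows above it.
    page = []
    for pos in POSITIONS:
        page.append(pick_row(pos, segment, set(page)))
    return page

def record_click(position, segment, row, clicked):
    # Only the bandit owning this slot is updated, which is what controls
    # for positional bias: each slot learns on its own CTR scale.
    arms = bandits[(position, segment)]
    arms[row][0] += clicked
    arms[row][1] += 1 - clicked

page = build_page("casual")  # one homepage render for a hypothetical segment
```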
📌 Examples
Scribd homepage optimization: 10 position bandits × 3 user activity segments = 30 total bandits, each selecting from 42 row types. Initial randomization was followed by one week of exploitation. The system achieved +10% reads overall, and the best row saw a 4× activity uplift. The per-position approach controlled for positional bias and for rows above influencing rows below.
Udemy slate bandit: optimizes the top k=3 recommendation units visible in the viewport. Thompson Sampling samples all candidate arms, picks the top 3 samples, and shows them. A composite reward of clicks plus enrollments within 15 minutes provides 3 feedback signals per impression, accelerating convergence 3× versus single-arm feedback (a reward-attribution sketch follows these examples).
A Netflix homepage could use per-position bandits: position 1 (hero) optimized separately from position 2 ("Because You Watched"), each bandit selecting a content row type. Alternatively, a slate bandit could optimize the top 5 visible rows as an unordered set, reranked by a secondary heuristic or personalized scores.
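Here is a sketch of the composite reward attribution from the Udemy example above: clicks and enrollments within a 15-minute window are folded into one reward per shown unit, and every unit in the slate gets an update, with 0.0 for units that were shown but ignored. The relative weights and the event schema are assumptions for illustration, not Udemy's actual scoring.

```python
from dataclasses import dataclass

CLICK_WEIGHT = 1.0
ENROLL_WEIGHT = 5.0             # assumption: an enrollment outweighs a click
ATTRIBUTION_WINDOW_S = 15 * 60  # the 15-minute window from the example

@dataclass
class Event:
    unit: str             # which slate unit the event is attributed to
    kind: str             # "click" or "enroll"
    seconds_after: float  # time since the impression was served

def slate_rewards(slate, events):
    """One composite reward per shown unit. Units with no events still
    yield 0.0 -- being shown without engagement is a signal too."""
    rewards = {unit: 0.0 for unit in slate}
    for e in events:
        if e.unit in rewards and e.seconds_after <= ATTRIBUTION_WINDOW_S:
            rewards[e.unit] += CLICK_WEIGHT if e.kind == "click" else ENROLL_WEIGHT
    return rewards

# Usage: one click inside the window; the other two shown units get zeros.
print(slate_rewards(["unit_a", "unit_b", "unit_c"],
                    [Event("unit_a", "click", 42.0)]))
```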