
How Do You Implement Production Exploration to Estimate Propensities?

Core Concept
Production exploration deliberately randomizes a small portion of rankings to observe how items perform at different positions. This generates the data needed to estimate propensities accurately.

Why You Need Exploration For Propensity Estimation

To estimate the examination probability at each position, you need clicks on the same items at different positions. But a production system always shows the best items at the top, so you never see how a top item performs at position 8. Exploration breaks this pattern by showing items at positions where they would not normally appear, sacrificing short-term relevance for long-term learning.

Implementation: Random Swapping

For 1-5% of requests, randomly swap two items in the result list. This creates pairs where the same items appear at swapped positions. Because the item's relevance does not change when it moves, comparing its click rates at the two positions isolates the position effect. If item A at position 2 gets swapped to position 6 and its click rate drops from 15% to 4%, position 6 has roughly 27% of position 2's examination probability (4/15 ≈ 0.27). Aggregate across thousands of swaps to build propensity curves.
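The ratio calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a production estimator: the log schema of (original_position, shown_position, clicked) tuples and the function name relative_propensity are assumptions made for the example. It restricts the comparison to items originally ranked at one position, so the item mix is the same at both positions.

```python
from collections import defaultdict


def relative_propensity(records, src, dst):
    """Estimate examination probability at `dst` relative to `src`.

    records: iterable of (original_position, shown_position, clicked)
    tuples. This schema is hypothetical, chosen for illustration.
    Only items originally ranked at `src` are counted, shown either
    at `src` (no swap) or at `dst` (swapped).
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for original_position, shown_position, clicked in records:
        if original_position != src or shown_position not in (src, dst):
            continue
        impressions[shown_position] += 1
        clicks[shown_position] += int(clicked)

    if impressions[src] == 0 or impressions[dst] == 0 or clicks[src] == 0:
        raise ValueError("not enough data to compare these positions")

    ctr_src = clicks[src] / impressions[src]
    ctr_dst = clicks[dst] / impressions[dst]
    return ctr_dst / ctr_src


# Synthetic data matching the example in the text: 15% click rate at
# position 2 and 4% after the item is swapped to position 6.
records = (
    [(2, 2, True)] * 15 + [(2, 2, False)] * 85
    + [(2, 6, True)] * 4 + [(2, 6, False)] * 96
)
ratio = relative_propensity(records, src=2, dst=6)
print(round(ratio, 2))
```

Repeating this for each swapped pair against a common reference position gives one point per position on the propensity curve. With few impressions the ratio is noisy, which is why the text recommends aggregating over thousands of swaps.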

Implementation: Epsilon Greedy Ranking

Instead of always showing the optimal ranking, show a random ranking with probability epsilon (1-5%). During exploration traffic, every candidate item is equally likely to appear at any position. This gives clean data on all positions. With 5% exploration, only 5% of requests receive a degraded ranking, so you sacrifice at most roughly 5% of optimal clicks (less in practice, because random rankings still earn some clicks). Most systems find 2-3% sufficient for stable estimates without noticeable user impact.
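A minimal sketch of the epsilon greedy decision, assuming the ranker has already produced an ordered list of items. The function name epsilon_greedy_ranking, the default epsilon of 0.02, and the returned explored flag are illustrative choices, not part of any specific system.

```python
import random


def epsilon_greedy_ranking(ranked_items, epsilon=0.02, rng=random):
    """Return (ranking, explored) for one request.

    With probability `epsilon` the ranking is a uniform random
    permutation of the items; otherwise the ranker's order is kept.
    The `explored` flag is meant to be logged with the request so that
    propensity estimation can use exploration traffic only.
    """
    ranking = list(ranked_items)  # copy so the caller's list is untouched
    if rng.random() < epsilon:
        rng.shuffle(ranking)
        return ranking, True
    return ranking, False


# Usage: a seeded generator makes the behavior reproducible in tests.
items = ["a", "b", "c", "d"]
ranking, explored = epsilon_greedy_ranking(items, epsilon=0.02, rng=random.Random(0))
```

Logging the flag alongside each impression matters: clicks from the randomized requests are the ones in which position is independent of relevance, so they are the ones to feed into the propensity estimate.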

Warning: Exploration trades engagement for data quality. Start at 1-2% and increase only if propensity estimates are unstable.
💡 Key Takeaways
Without exploration, you cannot estimate propensities because items always appear at the same optimized positions
Random swapping (1-5% of traffic) creates natural experiments showing items at swapped positions
Epsilon greedy dedicates 2-3% of traffic to random rankings for clean propensity data
The cost is reduced engagement on exploration traffic, bounded by the exploration rate (at most roughly 1-5% of optimal clicks)
📌 Interview Tips
1. Explain the chicken-and-egg problem: you need propensities to debias, but you need position variation to estimate them. Exploration solves this.
2. Describe random swapping: swap positions 2 and 6; if the click rate drops from 15% to 4%, position 6 has about 27% of position 2's examination probability.
3. A standard exploration rate is 2-3% with epsilon greedy. A higher rate gives better estimates but hurts engagement.