How Do You Implement Production Exploration to Estimate Propensities?
Why You Need Exploration For Propensity Estimation
To estimate the examination probability at each position, you need clicks on the same item at different positions. But a production system always shows the best items at the top, so you never see how a top item performs at position 8. Exploration breaks this coupling by showing items at positions where they would not normally appear, sacrificing short-term relevance for long-term learning.
Implementation: Random Swapping
For 1-5% of requests, randomly swap two items in the result list. Each swap puts two items at each other's positions, so comparing the click rate of the same item at the two positions isolates the position effect. If item A is moved from position 2 to position 6 and its click rate drops from 15% to 4%, position 6 has roughly 27% of position 2's examination probability (4/15 ≈ 0.27). This reading assumes the click rate factors into examination probability times item relevance, so relevance cancels in the ratio. Aggregate across thousands of swaps to build propensity curves.
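The swap-and-compare procedure can be sketched as follows. This is a minimal illustration, not a production implementation: the function names maybe_swap and relative_propensity, the 2% default swap rate, and the (position, clicked) log format are all assumptions made for the example, and the logs are assumed to come only from swapped requests so that the same items are observed at both positions.

```python
import random
from collections import defaultdict


def maybe_swap(ranking, swap_rate=0.02, rng=random):
    """With probability swap_rate, swap two randomly chosen positions.

    Returns (new_ranking, swapped_pair), where swapped_pair is a tuple of
    0-based indices or None when the request was left untouched.
    """
    ranking = list(ranking)  # copy so the caller's list is not mutated
    if len(ranking) < 2 or rng.random() >= swap_rate:
        return ranking, None
    i, j = rng.sample(range(len(ranking)), 2)
    ranking[i], ranking[j] = ranking[j], ranking[i]
    return ranking, (i, j)


def relative_propensity(swap_logs, base_pos, other_pos):
    """Estimate examination probability at other_pos relative to base_pos.

    swap_logs is an iterable of (position, clicked) records (hypothetical
    format) taken from swapped requests. Returns the ratio of click rates,
    or None if there is not enough data to form it.
    """
    views = defaultdict(int)
    clicks = defaultdict(int)
    for position, clicked in swap_logs:
        views[position] += 1
        clicks[position] += int(clicked)
    if views[base_pos] == 0 or views[other_pos] == 0 or clicks[base_pos] == 0:
        return None
    ctr_base = clicks[base_pos] / views[base_pos]
    ctr_other = clicks[other_pos] / views[other_pos]
    return ctr_other / ctr_base
```

Calling relative_propensity for every position against a fixed base position (for example, the top slot) gives one point of the propensity curve per position. Logging the swapped pair returned by maybe_swap is what lets you separate swapped traffic from normal traffic later.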
Implementation: Epsilon Greedy Ranking
Instead of always showing the optimal ranking, show a random ranking with probability epsilon (1-5%). On exploration traffic, every item is equally likely to appear at every position, which gives clean data on all positions. With 5% exploration, you sacrifice at most roughly 5% of optimal clicks, because only the explored requests are affected. Most systems find 2-3% sufficient for stable estimates without noticeable user impact.
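A minimal sketch of epsilon-greedy ranking, assuming each candidate item already has a relevance score from the production ranker. The function name epsilon_greedy_ranking, the 3% default, and the returned explored flag are illustrative choices, not part of any particular system.

```python
import random


def epsilon_greedy_ranking(items, scores, epsilon=0.03, rng=random):
    """Return (ranking, explored).

    With probability epsilon the items are shuffled uniformly at random, so
    each item is equally likely to land at each position. Otherwise the
    items are sorted by score, highest first.
    """
    if rng.random() < epsilon:
        ranking = list(items)
        rng.shuffle(ranking)
        return ranking, True
    ranked_pairs = sorted(zip(items, scores), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked_pairs], False
```

Storing the explored flag with each impression lets the propensity estimate use only the randomized traffic, where position is independent of relevance.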