Debiasing Techniques: IPS, Position Features, and Trade-offs
INVERSE PROPENSITY SCORING
Weight each training example by the inverse of its propensity (the probability that the item was shown). If item A was shown with 80% probability and got a click, weight it by 1/0.8 = 1.25. If item B was shown with 10% probability and got a click, weight it by 1/0.1 = 10. This up-weights feedback from items that were unlikely to be shown, so the weighted data approximates what you would have observed if every item had been equally likely to appear, which corrects for selection bias.
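The weighting above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names ips_weights and ips_weighted_log_loss are hypothetical, and it assumes the propensities were logged at serving time and are strictly positive.

```python
import numpy as np

def ips_weights(propensities):
    """Return 1 / propensity for each logged example."""
    p = np.asarray(propensities, dtype=float)
    if np.any(p <= 0.0) or np.any(p > 1.0):
        raise ValueError("propensities must lie in (0, 1]")
    return 1.0 / p

def ips_weighted_log_loss(clicks, predicted, propensities, eps=1e-12):
    """Mean log loss where each example is scaled by its IPS weight."""
    y = np.asarray(clicks, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    q = np.clip(np.asarray(predicted, dtype=float), eps, 1.0 - eps)
    per_example = -(y * np.log(q) + (1.0 - y) * np.log(1.0 - q))
    return float(np.mean(ips_weights(propensities) * per_example))

# Items A and B from the text: shown with probability 0.8 and 0.1.
weights = ips_weights([0.8, 0.1])
```

With the two items from the text, the weights come out as 1.25 and 10, so a click on the rarely shown item B counts eight times as much in the loss as a click on item A.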
POSITION-AWARE MODELS
Train the model with position as an explicit feature. During training, the model learns the position effect, for example that position 1 gets 5x more clicks than position 5. At serving time, set position to the same constant for every item (such as position 3), so that scores reflect relevance rather than where an item happened to be displayed. This separates the position effect from the relevance signal.
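A minimal sketch of this train-with-position, serve-with-constant pattern follows. Everything in it is an assumption for illustration: the click log is synthetic, the single relevance feature and the position_bias offsets are made up, and a small hand-written logistic regression stands in for whatever model is actually used.

```python
import numpy as np

N_POSITIONS = 5  # hypothetical number of display slots

def build_features(relevance, positions):
    """Stack a relevance feature with a one-hot encoding of display position."""
    relevance = np.asarray(relevance, dtype=float).reshape(-1, 1)
    one_hot = np.eye(N_POSITIONS)[np.asarray(positions)]
    return np.hstack([relevance, one_hot])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.5, steps=2000):
    """Plain batch gradient descent on log loss (no regularization)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def score_at_fixed_position(w, relevance, fixed_position=2):
    """Score items with position pinned to one slot (index 2 is position 3)."""
    positions = np.full(len(relevance), fixed_position)
    return sigmoid(build_features(relevance, positions) @ w)

# Synthetic click log: click odds depend on both relevance and position.
rng = np.random.default_rng(0)
n = 20000
relevance = rng.uniform(0.0, 1.0, n)
positions = rng.integers(0, N_POSITIONS, n)
position_bias = np.array([1.5, 0.8, 0.0, -0.7, -1.5])  # made-up log-odds offsets
click_prob = sigmoid(3.0 * relevance - 2.0 + position_bias[positions])
clicks = (rng.uniform(size=n) < click_prob).astype(float)

# Train with the logged position, then serve with position held constant.
w = train_logistic(build_features(relevance, positions), clicks)
scores = score_at_fixed_position(w, np.array([0.2, 0.9]))
```

Because every item is scored at the same position, the learned position weights add the same offset to each score and the ranking is driven by the relevance feature alone.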
EXPLORATION RATE TRADEOFF
More exploration (5-10% random traffic) provides unbiased data but hurts short-term metrics, because users in the exploration slice see suboptimal recommendations. Less exploration (1%) protects metrics but leaves you blind to new items and changing preferences. Start with 5% and reduce it as your propensity estimates improve.
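One common way to implement an exploration slice is epsilon-greedy selection, sketched below as an assumption rather than as the scheme this section prescribes. The function name choose_with_exploration is hypothetical. The point of the sketch is that the propensity of the chosen item is known exactly at serving time and can be logged alongside the impression.

```python
import random

def choose_with_exploration(scores, epsilon=0.05, rng=random):
    """Pick one item: uniformly at random with probability epsilon, else the top-scored item.

    Returns (chosen_index, propensity), where propensity is the probability
    that this policy would have chosen that index.
    """
    n = len(scores)
    greedy = max(range(n), key=lambda i: scores[i])
    if rng.random() < epsilon:
        chosen = rng.randrange(n)  # exploration: every item is equally likely
    else:
        chosen = greedy            # exploitation: show the model's best guess
    # Every item can be picked by exploration (epsilon / n); the greedy item
    # is additionally picked whenever we do not explore (1 - epsilon).
    propensity = epsilon / n + ((1.0 - epsilon) if chosen == greedy else 0.0)
    return chosen, propensity

# Example: log the propensity next to the impression for later IPS weighting.
chosen, propensity = choose_with_exploration([0.1, 0.7, 0.3, 0.2], epsilon=0.05)
```

With four candidates and epsilon set to 0.05, a non-greedy item has propensity 0.0125, which already implies an IPS weight of 80. Lowering epsilon makes those weights larger still.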
VARIANCE VS BIAS TRADEOFF
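IPS reduces bias but increases variance. Items with very small propensities get huge weights (a propensity of 0.001 gives a weight of 1000), so a handful of examples can dominate the loss and make training unstable. Clipping weights (capping at 10x or 100x) reduces variance but reintroduces some bias. Doubly robust estimators combine IPS with model predictions: the model supplies a baseline estimate and the IPS weights apply only to its residual errors, which reduces variance while maintaining low bias.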
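The sketch below illustrates weight clipping and one standard form of the doubly robust estimator, applied to estimating an average reward from partially observed feedback. The function names and the tiny arrays are hypothetical, and model_predictions is assumed to come from a separately trained reward model.

```python
import numpy as np

def clipped_ips_weights(propensities, max_weight=10.0):
    """Inverse propensity weights capped at max_weight."""
    p = np.asarray(propensities, dtype=float)
    return np.minimum(1.0 / p, max_weight)

def ips_estimate(shown, rewards, propensities, max_weight=np.inf):
    """IPS estimate of the mean reward; only shown items contribute."""
    w = clipped_ips_weights(propensities, max_weight)
    return float(np.mean(np.asarray(shown) * w * np.asarray(rewards)))

def doubly_robust_estimate(shown, rewards, propensities, model_predictions,
                           max_weight=np.inf):
    """Model baseline plus an IPS-weighted correction on the model's residuals."""
    shown = np.asarray(shown, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    m = np.asarray(model_predictions, dtype=float)
    w = clipped_ips_weights(propensities, max_weight)
    # Where an item was not shown, the correction term is zero and the
    # estimate falls back to the model prediction.
    return float(np.mean(m + shown * w * (rewards - m)))

# Toy log: rewards for unshown items are 0.0 placeholders (never NaN,
# since 0 * NaN would poison the mean).
shown = np.array([1, 0, 1])
rewards = np.array([1.0, 0.0, 0.0])
propensities = np.array([0.5, 0.5, 0.5])
model_predictions = np.array([1.0, 0.3, 0.0])

ips_value = ips_estimate(shown, rewards, propensities)
dr_value = doubly_robust_estimate(shown, rewards, propensities, model_predictions)
```

When the model's predictions are accurate on the shown items, the residuals are small and the large weights multiply small numbers, which is where the variance reduction comes from.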