What is Inverse Propensity Scoring and When Does It Fail?
How IPS Corrects Position Bias
If an item at position 10 has only a 10% chance of being examined, any click it receives should count roughly ten times as much as a click at position 1 (where examination probability is 90%). IPS does this by weighting each click by 1 / propensity: position 10 gets weight 1/0.10 = 10, while position 1 gets weight 1/0.90 ≈ 1.1. This cancels out position bias, treating clicks from all positions as equally informative about relevance.
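The weighting above can be sketched in a few lines. The propensity values are the illustrative numbers from the text, not estimates from a real system:

```python
# Inverse propensity weighting: a click is upweighted by 1 / P(examined).
def ips_weight(propensity: float) -> float:
    return 1.0 / propensity

# Examination probabilities by position (hypothetical values from the text).
propensities = {1: 0.90, 10: 0.10}

for position, p in propensities.items():
    # Position 1 -> weight ~1.11, position 10 -> weight 10.0
    print(f"position {position}: weight = {ips_weight(p):.2f}")
```

In a training loop, each click's contribution to the loss would simply be multiplied by its weight before summing.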
When IPS Fails: High Variance
IPS has a variance problem: when propensity is very low, the inverse weight becomes huge. Position 20, with a 2% examination probability, gets weight 1/0.02 = 50. A single accidental click there carries about 45 times the weight of a deliberate click at position 1 (weight ≈ 1.1). A few noisy clicks at low positions can therefore dominate your training signal and push the model in the wrong direction.
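A toy calculation, using the hypothetical propensities from this section, shows how quickly noise at a low position can swamp genuine signal at the top:

```python
# Hypothetical scenario: ten deliberate clicks at position 1 versus
# one accidental click at position 20.
deliberate_clicks_pos1 = 10     # real relevance signal
accidental_clicks_pos20 = 1     # noise

weight_pos1 = 1 / 0.90          # ~1.11 (90% examination probability)
weight_pos20 = 1 / 0.02         # 50.0 (2% examination probability)

signal = deliberate_clicks_pos1 * weight_pos1      # ~11.1 total weight
noise = accidental_clicks_pos20 * weight_pos20     # 50.0 total weight

# The single noisy click outweighs ten genuine clicks by a factor of 4.5,
# so it can dominate the gradient for that training batch.
print(f"signal weight: {signal:.1f}, noise weight: {noise:.1f}")
```

This is why practical systems rarely use raw IPS weights at very low propensities.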
When IPS Fails: Propensity Estimation Errors
IPS only works if you know the true propensities, but propensities are estimated from data and always carry some error. If you underestimate the propensity at position 8, you overweight every click there. The problem compounds at low-visibility positions: they generate the least data for accurate estimation, yet they receive the largest IPS weights.
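A minimal sketch of the estimation-error effect, with made-up numbers: if the true examination probability at position 8 is 0.20 but the estimate is 0.10, every click there receives double the weight it should.

```python
# Hypothetical propensities for position 8.
true_propensity = 0.20
estimated_propensity = 0.10     # underestimate by a factor of two

correct_weight = 1 / true_propensity        # 5.0
used_weight = 1 / estimated_propensity      # 10.0

# The same relative error at a lower propensity produces a larger
# absolute distortion in the training signal.
overweight_factor = used_weight / correct_weight   # 2.0
print(f"clicks at position 8 are overweighted by {overweight_factor:.1f}x")
```

The absolute distortion (here, 5 extra units of weight per click) grows as propensity shrinks, which is exactly where estimates are least reliable.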