ML-Powered Search & Ranking • Relevance Feedback (Click Models, Position Bias)
What is Inverse Propensity Scoring and When Does It Fail?
Inverse Propensity Scoring (IPS) is a counterfactual learning technique that reweights training examples by the inverse probability that each example would have been observed. If an item at position 8 has only a 10 percent chance of being seen, and it was clicked, that click gets a weight of 1 / 0.10 = 10. This corrects for position bias by amplifying observations from rarely examined positions.
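A minimal sketch of that reweighting, using an illustrative propensity value rather than anything measured:

```python
# Minimal sketch of IPS reweighting: each observed training example is
# weighted by the inverse of its examination propensity.
# The propensity value below is illustrative, not measured.

def ips_weight(propensity: float) -> float:
    """Inverse propensity weight: 1 / P(item was seen at its position)."""
    return 1.0 / propensity

# A click at position 8 with a 10% chance of being seen gets weight 10.
print(ips_weight(0.10))  # 10.0
```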
IPS is statistically unbiased if propensities are known exactly. In practice, you estimate propensities from randomized exploration cohorts. For example, run RandTopN shuffling on 2 percent of traffic at 20,000 queries per second (QPS). This yields 400 QPS of randomized traffic, or about 34 million impressions per day. Fit a parametric curve p(seen | position, context) per surface and device. Apply these weights during training: weight = 1 / p(seen).
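One way this estimation step might look, assuming a simple log of (position, clicked) pairs from the RandTopN cohort and a power-law curve p(pos) = pos^(-eta); the function name, log schema, and curve family are assumptions for illustration, not a prescribed pipeline:

```python
# Hedged sketch: estimating a parametric position-propensity curve from
# randomized-exploration logs. Assumes every position 1..max_pos appears
# in the log at least once.
import numpy as np

def fit_position_propensity(positions: np.ndarray, clicks: np.ndarray):
    """Estimate p(seen | position), relative to position 1, under randomization.

    With randomized rankings, relevance is (approximately) independent of
    position, so CTR(pos) / CTR(1) estimates the examination probability.
    A power-law curve p(pos) = pos**(-eta) is then fit in log space.
    """
    max_pos = int(positions.max())
    ctr = np.array([clicks[positions == p].mean() for p in range(1, max_pos + 1)])
    examination = ctr / ctr[0]                      # normalize to position 1
    pos = np.arange(1, max_pos + 1)
    # Fit log(examination) = -eta * log(pos) by least squares.
    slope = np.polyfit(np.log(pos), np.log(np.clip(examination, 1e-6, None)), 1)[0]
    eta = -slope
    return lambda p: float(p) ** (-eta)

# Example: on real randomized logs, propensity_fn(10) might come out near 0.08.
```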
The critical failure mode is variance explosion. Rare positions or contexts can have very low propensities, creating weights of 50, 100, or more. These large weights cause unstable gradients and high noise in stochastic gradient descent. A single mislabeled example with weight 100 can dominate a minibatch. The effective sample size, defined as ESS = (Σw)² / Σw² (the squared sum of weights over the sum of squared weights), can collapse from millions to hundreds.
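A quick way to monitor this collapse is to compute the effective sample size directly from the weights; the example values below are illustrative:

```python
import numpy as np

def effective_sample_size(weights: np.ndarray) -> float:
    """ESS = (sum of weights)^2 / (sum of squared weights).

    Equal weights give ESS == len(weights); a few huge weights shrink it sharply.
    """
    return weights.sum() ** 2 / (weights ** 2).sum()

uniform = np.ones(1_000_000)
print(effective_sample_size(uniform))   # 1,000,000.0

skewed = np.ones(1_000_000)
skewed[:500] = 100.0                    # a handful of weight-100 examples
print(effective_sample_size(skewed))    # far below the nominal 1,000,000
```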
Practical mitigations include weight clipping, self-normalized IPS, and large batch sizes. Clip weights at a maximum, for example min(weight, 20), which reintroduces some bias but dramatically reduces variance. Self-normalized IPS divides each weight by the sum of weights in the batch; this introduces a small finite-sample bias (it is only asymptotically unbiased) but stabilizes training. Monitor effective sample size and expand exploration if it drops too low. If you have only 1 percent exploration and try to debias positions 1 through 20, variance will be unmanageable. Consider hybrid approaches that use IPS for the top 5 positions and architectural separation for deeper positions.
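A sketch of clipped, self-normalized IPS applied within one minibatch; the clip threshold of 20 follows the text, while the function name and the toy batch are illustrative assumptions:

```python
import numpy as np

def snips_minibatch_loss(losses: np.ndarray,
                         propensities: np.ndarray,
                         clip_max: float = 20.0) -> float:
    """Clipped, self-normalized IPS estimate of the average loss in a minibatch.

    Clipping caps the influence of very-low-propensity examples (adding bias,
    removing variance); dividing by the sum of in-batch weights is the
    self-normalized estimator, which is consistent but not exactly unbiased.
    """
    weights = np.minimum(1.0 / propensities, clip_max)
    return float((weights * losses).sum() / weights.sum())

# Illustrative minibatch: per-example losses and estimated examination propensities.
losses = np.array([0.7, 0.2, 1.3, 0.4])
props = np.array([0.90, 0.30, 0.02, 0.60])  # 0.02 would give weight 50; clipped to 20
print(snips_minibatch_loss(losses, props))
```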
💡 Key Takeaways
• IPS reweights training examples by 1 / p(seen at position), upweighting rare observations to correct position bias
• Statistically unbiased if propensities are accurate, but variance can explode for low-propensity events in positions rarely examined
• At 20,000 QPS with 2 percent exploration, you get 400 QPS of randomized traffic, or about 34 million impressions daily, to estimate propensities
• Weight clipping at a maximum of 10 to 20 reintroduces bias but prevents single examples from dominating gradients and destabilizing training
• Effective sample size can collapse from millions to hundreds when propensities are low, requiring expanded exploration or hybrid approaches
📌 Examples
Estimate p(seen | position 1) = 0.90, p(seen | position 5) = 0.30, p(seen | position 10) = 0.08. A click at position 10 gets weight 1 / 0.08 = 12.5, amplifying its signal. A non-click at position 1 gets weight 1 / 0.90 ≈ 1.11, a minimal adjustment.
Airbnb search uses self-normalized IPS to debias click models. They clip weights at 20 and normalize within each minibatch to stabilize training. This reduces variance while maintaining approximate unbiasedness over many batches.