
Failure Modes: Propensity Errors, Format Changes, and Delayed Loops

PROPENSITY ESTIMATION ERRORS

If propensity estimates are wrong, IPS makes things worse, because the inverse weights amplify the errors instead of canceling the bias. Common causes:

- using the current production model's propensities when the logged impressions were actually produced by a different model (training-serving skew)
- not accounting for the position randomization policy
- ignoring user-level personalization when computing propensities

Validate propensities by comparing the estimated distribution of impressions with the empirical one.
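One way to run that validation is a calibration check, sketched below in Python. It assumes two things:

- the logging policy is stochastic or randomized
- the log has one record per candidate (item, position) slot, holding the estimated propensity and whether the item was actually shown there

Within each propensity bin, well-calibrated estimates should match the observed impression rate. The function names, the ten-bin layout and the 0.05 tolerance are hypothetical choices, not a standard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BinReport:
    lower: float            # inclusive lower edge of the propensity bin
    upper: float            # upper edge (the last bin also holds p == 1.0)
    count: int              # logged candidate slots that fell in this bin
    mean_estimated: float   # average estimated propensity in the bin
    empirical_rate: float   # fraction of those slots that were actually shown


def propensity_calibration(records, n_bins=10):
    """records: iterable of (estimated_propensity, was_shown) pairs, one per
    candidate (item, position) slot the logging policy could have filled."""
    sums = defaultdict(float)
    shown = defaultdict(int)
    counts = defaultdict(int)
    for p, was_shown in records:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"propensity out of range: {p}")
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[b] += p
        shown[b] += int(was_shown)
        counts[b] += 1
    return [
        BinReport(b / n_bins, (b + 1) / n_bins, counts[b],
                  sums[b] / counts[b], shown[b] / counts[b])
        for b in sorted(counts)
    ]


def miscalibrated_bins(reports, tolerance=0.05, min_count=100):
    """Bins with enough data whose estimated and empirical rates disagree."""
    return [
        r for r in reports
        if r.count >= min_count
        and abs(r.mean_estimated - r.empirical_rate) > tolerance
    ]


# Usage: slots estimated at 0.8 are really shown only half the time.
logs = [(0.8, i % 2 == 0) for i in range(200)]
logs += [(0.2, i % 5 == 0) for i in range(200)]
for report in miscalibrated_bins(propensity_calibration(logs)):
    print(report)
```

A flagged bin points at slots whose IPS weights (1/p) are systematically too large or too small.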

DISPLAY FORMAT CHANGES

Position bias curves change whenever the display format changes. Moving from a list to a grid changes which positions get attention. Adding a carousel above the main list pushes every list slot further down the page, which lowers attention at every list position. If you apply an old position model to a new format, the debiasing weights are wrong. Remeasure position curves after any UI change and retrain position models.
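Below is a minimal sketch of remeasuring a position curve. It assumes a small slice of traffic where the top results are randomly shuffled, so item relevance is independent of slot. Under that assumption, the click-through rate at each position divided by the rate at position 1 estimates relative attention (examination). The function names, the event format and the synthetic data helper are hypothetical.

```python
def position_curve(events, max_position):
    """events: iterable of (position, clicked) pairs from randomized traffic,
    positions 1-indexed. Returns relative examination, position 1 == 1.0."""
    impressions = [0] * (max_position + 1)
    clicks = [0] * (max_position + 1)
    for position, clicked in events:
        if 1 <= position <= max_position:
            impressions[position] += 1
            clicks[position] += int(clicked)
    if any(impressions[k] == 0 for k in range(1, max_position + 1)):
        raise ValueError("every position needs randomized impressions")
    if clicks[1] == 0:
        raise ValueError("need clicks at position 1 to normalize the curve")
    top_ctr = clicks[1] / impressions[1]
    return [(clicks[k] / impressions[k]) / top_ctr
            for k in range(1, max_position + 1)]


def curve_shift(old_curve, new_curve):
    """Largest absolute change in relative examination across positions."""
    if len(old_curve) != len(new_curve):
        raise ValueError("curves must cover the same positions")
    return max(abs(a - b) for a, b in zip(old_curve, new_curve))


def synthetic_events(ctrs, n=100):
    """Deterministic toy data: n impressions per position at the given CTRs."""
    events = []
    for position, ctr in enumerate(ctrs, start=1):
        k = round(ctr * n)
        events += [(position, i < k) for i in range(n)]
    return events


# Usage: curves measured separately for the old list and the new grid.
list_curve = position_curve(synthetic_events([0.20, 0.10, 0.05]), 3)
grid_curve = position_curve(synthetic_events([0.20, 0.16, 0.12]), 3)
print(curve_shift(list_curve, grid_curve))  # about 0.35
```

How large a shift should trigger retraining is a judgment call. The simpler rule above is to remeasure after any UI change regardless.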

⚠️ Warning: Mobile and desktop have different position bias curves. Position corrections learned from desktop data will be wrong for mobile traffic. Segment by device type.
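One way to enforce device segmentation is to key position curves by (device, display format) and to refuse to fall back to another segment's curve. The sketch below rests on several assumptions that are not part of the original text:

- the class name and the key layout
- the weight cap of 10, a clipped-IPS safeguard against tiny examination estimates

```python
from dataclasses import dataclass, field


@dataclass
class PositionBiasModel:
    # (device, display_format) -> relative examination per position;
    # list index 0 holds position 1.
    curves: dict = field(default_factory=dict)

    def register(self, device, display_format, curve):
        if not curve or any(x <= 0.0 for x in curve):
            raise ValueError("examination estimates must be positive")
        self.curves[(device, display_format)] = list(curve)

    def ips_weight(self, device, display_format, position, max_weight=10.0):
        key = (device, display_format)
        if key not in self.curves:
            # Fail loudly instead of silently reusing, e.g., the desktop
            # curve for mobile traffic.
            raise KeyError(f"no position curve measured for {key}")
        curve = self.curves[key]
        if not 1 <= position <= len(curve):
            raise IndexError(f"position {position} outside measured range")
        # Inverse examination weight, clipped to limit variance.
        return min(1.0 / curve[position - 1], max_weight)


model = PositionBiasModel()
model.register("desktop", "list", [1.0, 0.5, 0.25])
model.register("mobile", "list", [1.0, 0.4, 0.05])
print(model.ips_weight("desktop", "list", 2))  # 2.0
print(model.ips_weight("mobile", "list", 3))   # 10.0 (unclipped weight is about 20)
```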

EXPLORATION GONE WRONG

Too much exploration (over 10% of traffic) visibly hurts user experience and triggers complaints. Too little (under 1%) leaves you blind to items the model never shows. Unbalanced exploration, such as always exploring the same item types, creates new biases. Monitor exploration coverage with two questions:

- Are all item categories explored in proportion to their share of the catalog?
- Is exploration distributed across user segments?
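Below is a minimal monitoring sketch. It assumes two inputs:

- an impression log that records each impression's item category, the user's segment, and whether it was an exploration slot
- each category's share of the catalog

The coverage ratio compares a category's share of exploration with its share of the catalog, where 1.0 means proportional. The 1% and 10% bounds come from the text above. The 0.5 under-exploration cutoff and all names are hypothetical.

```python
from collections import Counter


def exploration_report(impressions, catalog_share, low=0.01, high=0.10):
    """impressions: iterable of (category, user_segment, is_exploration).
    catalog_share: category -> fraction of the catalog in that category."""
    total = 0
    explore_by_category = Counter()
    explore_by_segment = Counter()
    segment_totals = Counter()
    for category, segment, is_exploration in impressions:
        total += 1
        segment_totals[segment] += 1
        if is_exploration:
            explore_by_category[category] += 1
            explore_by_segment[segment] += 1
    n_explore = sum(explore_by_category.values())
    rate = n_explore / total if total else 0.0
    # Share of exploration / share of catalog; 1.0 means proportional.
    coverage = {
        category: (explore_by_category[category] / n_explore) / share
        if n_explore and share > 0 else 0.0
        for category, share in catalog_share.items()
    }
    segment_rates = {
        segment: explore_by_segment[segment] / count
        for segment, count in segment_totals.items()
    }
    return {
        "exploration_rate": rate,
        "rate_in_bounds": low <= rate <= high,
        "category_coverage": coverage,
        "segment_rates": segment_rates,
    }


def underexplored(report, min_ratio=0.5):
    """Categories explored at less than min_ratio of their catalog share."""
    return sorted(c for c, r in report["category_coverage"].items()
                  if r < min_ratio)


log = ([("news", "new", True)] * 45 + [("sports", "returning", True)] * 5
       + [("news", "new", False)] * 475
       + [("sports", "returning", False)] * 475)
report = exploration_report(log, {"news": 0.5, "sports": 0.5})
print(report["exploration_rate"], underexplored(report))  # 0.05 ['sports']
```

The per-segment rates address the second question: a segment whose rate stays near zero is effectively never explored.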

DELAYED FEEDBACK LOOPS

Some feedback loops take months to manifest. The model slowly narrows its recommendations while daily metrics look fine, and by the time engagement drops, the problem is severe. Track catalog coverage over 90-day windows. If coverage trends down consistently, you have a slow feedback loop even if daily metrics are stable.
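Below is a minimal sketch of the 90-day coverage check. It assumes one set of recommended item IDs per calendar day, with no gaps. It works in two steps:

1. It slides a 90-day window and reports the fraction of the catalog recommended at least once in each window.
2. It fits a least-squares slope to that series and counts how many 30-day steps declined.

The function names, and any alerting thresholds you attach to the output, are hypothetical.

```python
from collections import Counter


def rolling_coverage(daily_recs, catalog_size, window_days=90):
    """daily_recs: list of sets of item IDs, one per consecutive day.
    Returns catalog coverage for each complete window_days window."""
    counts = Counter()  # item -> number of days in the window it appeared
    coverage = []
    for i, items in enumerate(daily_recs):
        counts.update(items)
        if i >= window_days:  # drop the day that just left the window
            expired = daily_recs[i - window_days]
            counts.subtract(expired)
            for item in expired:
                if counts[item] <= 0:
                    del counts[item]
        if i >= window_days - 1:
            coverage.append(len(counts) / catalog_size)
    return coverage


def coverage_trend(coverage, step_days=30):
    """Least-squares slope per step_days, and the fraction of
    step_days-apart changes that were declines."""
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two windows")
    mean_x = (n - 1) / 2
    mean_y = sum(coverage) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(coverage))
    den = sum((x - mean_x) ** 2 for x in range(n))
    steps = coverage[::step_days]
    deltas = [b - a for a, b in zip(steps, steps[1:])]
    frac_down = sum(d < 0 for d in deltas) / len(deltas) if deltas else 0.0
    return num / den * step_days, frac_down


# Toy data: the set of recommended items shrinks by one item per day.
daily = [set(range(600 - day)) for day in range(240)]
series = rolling_coverage(daily, catalog_size=1000)
slope_per_month, frac_down = coverage_trend(series)
print(round(slope_per_month, 4), frac_down)  # -0.03 1.0
```

A consistently negative slope, with most 30-day steps declining, is the slow-loop signal described above, even when daily metrics are flat.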

💡 Key Takeaways
Wrong propensity makes IPS worse: validate by comparing estimated vs empirical impression distribution
UI changes invalidate position curves: moving from a list to a grid or adding a carousel both require remeasurement
Mobile and desktop have different position bias: segment by device type
Exploration 10%+ hurts UX visibly; under 1% leaves you blind; monitor category coverage
Slow feedback loops take months: track 90-day catalog coverage trends even when daily metrics look fine
📌 Interview Tips
1. Describe training-serving skew: the production model changed, but training still uses the old model's propensities.
2. Explain a format change: moving from a list to a grid can shift attention from position 5 to position 6.
3. Discuss delayed detection: daily engagement is stable, but 90-day catalog coverage is dropping 2% per month.