
Recommendation Systems • Real-time Personalization (Session-based, Contextual Bandits) • Hard • ⏱️ ~3 min

Failure Modes: Feedback Loops, Position Bias, and Drift

FEEDBACK LOOP COLLAPSE
When the system only shows high-confidence items, it never gathers data on the alternatives. Items that performed well early become the only items with enough data to look good, while new items are never shown, never clicked, and so never appear worth showing. The system converges on a tiny subset of the catalog. Counter this by enforcing a minimum exploration rate and diversity constraints.
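One way to enforce a minimum exploration rate is an epsilon-greedy selector with a fixed floor. The sketch below is illustrative only: the function name `select_item`, the `scores` mapping, and the default `epsilon=0.05` are assumptions, not values from this lesson. It also returns the probability with which the chosen item was selected, since that number is needed later for debiasing.

```python
import random

def select_item(scores, epsilon=0.05, rng=random):
    """Pick an item: explore uniformly with probability epsilon, else exploit.

    scores: dict mapping item id -> model score (hypothetical input shape).
    Returns (item_id, propensity), where propensity is the probability this
    policy had of choosing that item, so it can be logged with the impression.
    """
    items = list(scores)
    best = max(items, key=scores.get)
    if rng.random() < epsilon:
        chosen = rng.choice(items)   # exploration: every item has a chance
    else:
        chosen = best                # exploitation: highest-scoring item
    # Uniform exploration can also land on the best item, so the best item's
    # propensity is the exploit mass plus its share of the explore mass.
    propensity = epsilon / len(items)
    if chosen == best:
        propensity += 1.0 - epsilon
    return chosen, propensity
```

Because `epsilon` never drops to zero, every item keeps a nonzero chance of being shown, which is what prevents the catalog from collapsing. Diversity constraints would be a separate layer on top of this and are not shown here.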
POSITION BIAS IN TRAINING
Items in position 1 can get roughly 5 to 10 times more clicks than items in position 5, regardless of relevance. If you train on raw click data, the model learns that whatever was shown in position 1 is better. Once deployed, it ranks those items higher because they were clicked more, not because they are more relevant, which reinforces the bias. Debias by logging position and using inverse propensity scoring (IPS): weight each click by 1 / probability of being shown in that position.
⚠️ Warning: Without logged propensities, you cannot reliably evaluate new policies offline with IPS. Always log the probability that each item was shown in each position.
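The weighting described above can be sketched as a small estimator over logged impressions. Everything here is an assumption made for illustration: the function name `ips_click_rates`, the log keys `item`, `clicked`, and `propensity`, and the `max_weight` cap. Capping the weight is a common variance-control trick that trades a little bias for stability. How the propensities themselves are estimated is out of scope for this sketch.

```python
from collections import defaultdict

def ips_click_rates(logs, max_weight=20.0):
    """Estimate a position-debiased click rate per item.

    logs: iterable of dicts with hypothetical keys
      "item" (id), "clicked" (0 or 1), "propensity" (logged probability, > 0).
    max_weight: cap on 1 / propensity to limit variance (adds some bias).
    """
    weighted_clicks = defaultdict(float)
    impressions = defaultdict(int)
    for row in logs:
        if row["propensity"] <= 0:
            raise ValueError("propensity must be positive")
        # A click at a rarely seen position counts for more than one at
        # a position that almost everyone sees.
        weight = min(1.0 / row["propensity"], max_weight)
        weighted_clicks[row["item"]] += row["clicked"] * weight
        impressions[row["item"]] += 1
    return {item: weighted_clicks[item] / impressions[item]
            for item in impressions}
```

As a usage example, suppose item A has 4 clicks in 10 impressions at a propensity of 0.5, and item B has 1 click in 10 impressions at a propensity of 0.1. The raw click rates (0.4 vs 0.1) favor A, but the weighted estimates (0.8 vs 1.0) favor B, because B's click came from a much less visible slot.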
DELAYED FEEDBACK
Clicks happen in seconds, but purchases happen in hours or days. If you optimize for clicks, you might surface items that get clicked but rarely purchased. If you wait for purchase signals, you react too slowly. A common solution: use the click as an immediate signal, then retroactively adjust the reward when purchase data arrives.
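The "credit the click now, correct later" idea might look like the following sketch. The class name `DelayedRewardTracker`, the impression ids, and the reward values 0.1 and 1.0 are all hypothetical choices for illustration; a real system would pick rewards that reflect its own business metric.

```python
class DelayedRewardTracker:
    """Tracks per-item mean reward: clicks credited now, purchases later."""

    def __init__(self, click_reward=0.1, purchase_reward=1.0):
        self.click_reward = click_reward
        self.purchase_reward = purchase_reward
        self.totals = {}    # item -> summed reward
        self.counts = {}    # item -> number of impressions
        self.pending = {}   # impression_id -> (item, reward credited so far)

    def record_impression(self, impression_id, item, clicked):
        """Credit the provisional click reward immediately."""
        reward = self.click_reward if clicked else 0.0
        self.totals[item] = self.totals.get(item, 0.0) + reward
        self.counts[item] = self.counts.get(item, 0) + 1
        self.pending[impression_id] = (item, reward)

    def record_purchase(self, impression_id):
        """Replace the provisional reward with the purchase reward."""
        if impression_id not in self.pending:
            return False  # unknown or already settled impression
        item, credited = self.pending.pop(impression_id)
        self.totals[item] += self.purchase_reward - credited
        return True

    def mean_reward(self, item):
        return self.totals[item] / self.counts[item]
```

In this sketch `pending` grows until a purchase arrives, so a production version would also need to expire entries after some attribution window; that part is omitted here.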
CONTEXT DRIFT
The relationship between context and reward changes over time. Holiday shopping behavior differs from regular browsing, so a model trained on November data can perform poorly in February. Detect drift by monitoring prediction accuracy over time. To adapt, retrain periodically or train on a sliding window of recent data.
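Monitoring prediction accuracy over time can be as simple as comparing the error in a recent window against a baseline. The sketch below assumes a hypothetical `DriftMonitor` class with illustrative defaults (`window=1000`, `baseline_error=0.1`, `tolerance=0.05`); the right window size and threshold depend on traffic volume and how noisy the reward is.

```python
from collections import deque

class DriftMonitor:
    """Flags drift when recent mean absolute error exceeds a baseline."""

    def __init__(self, window=1000, baseline_error=0.1, tolerance=0.05):
        self.baseline_error = baseline_error  # error measured at training time
        self.tolerance = tolerance            # allowed degradation before alerting
        self.errors = deque(maxlen=window)    # keeps only the most recent errors

    def update(self, predicted, actual):
        """Record one prediction and its observed outcome."""
        self.errors.append(abs(predicted - actual))

    def window_error(self):
        """Mean absolute error over the current window, or None if empty."""
        if not self.errors:
            return None
        return sum(self.errors) / len(self.errors)

    def drifted(self):
        # Wait for a full window so a few early samples cannot trigger an alert.
        if len(self.errors) < self.errors.maxlen:
            return False
        return self.window_error() > self.baseline_error + self.tolerance
```

A `drifted()` result of `True` would then be the trigger for retraining, or for shortening the training window so that older data carries less weight.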

💡 Key Takeaways

✓ Feedback loop collapse: showing only high-confidence items means never learning about alternatives

✓ Position 1 can get roughly 5-10x more clicks than position 5; train on raw clicks and you learn position, not relevance

✓ Always log the probability that each item was shown in each position, for offline policy evaluation

✓ Delayed feedback: optimize for clicks and you miss purchase intent; wait for purchases and you react too slowly

✓ Context drift: a model trained in November can fail in February; use sliding window training to adapt

📌 Interview Tips

1. Describe collapse: new items never shown → never clicked → never seem worth showing

2. Explain propensity logging: weight clicks by 1/P(shown in that position) to debias

3. Discuss holiday drift: shopping behavior changes seasonally, models need retraining
