
Failure Modes: Feedback Loops, Position Bias, and Drift

FEEDBACK LOOP COLLAPSE

When the system only shows high-confidence items, it never gathers data on alternatives. Items that were good become the only items with enough data to seem good. New items never get shown, never get clicks, never seem worth showing. The system converges to a tiny subset of the catalog. Fix by enforcing minimum exploration rates and diversity constraints.
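A minimal sketch of a minimum exploration rate, assuming a simple epsilon floor over per-item scores (the `select_item` name and the 5% floor are illustrative, not a specific library's API):

```python
import numpy as np

def select_item(scores, epsilon=0.05, rng=None):
    """Pick an item with an epsilon exploration floor so that new and
    low-confidence items still receive impressions and gather data."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(scores)))  # explore: uniform over catalog
    return int(np.argmax(scores))              # exploit: highest estimated reward
```

In practice the exploration slots are often also subject to diversity constraints (e.g., category caps), but the floor alone is enough to prevent total collapse.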

POSITION BIAS IN TRAINING

Items shown in position 1 get 5 to 10 times more clicks than position 5, regardless of relevance. If you train on raw click data, the model learns that position 1 items are better. When you deploy, it ranks those items higher because they were clicked more, not because they were more relevant. Debias by logging position and using inverse propensity scoring (weighting clicks by 1 / probability of being shown in that position).

⚠️ Warning: Without propensity logging, you cannot evaluate new policies offline. Always log the probability that each item was shown in each position.
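A hedged sketch of what propensity logging plus inverse propensity weighting might look like (the helper names, log format, and clip value are assumptions for illustration):

```python
def log_impression(item_id, position, propensity, log):
    """Record P(item shown at this position) alongside the impression,
    so that new policies can be evaluated offline later."""
    log.append({"item": item_id, "pos": position, "propensity": propensity})

def ips_click_weight(clicked: bool, propensity: float, clip: float = 10.0) -> float:
    """Inverse propensity scoring: a click observed with show-probability p
    counts as min(1/p, clip). Clipping bounds the variance contributed by
    rarely shown positions."""
    if not clicked:
        return 0.0
    return min(1.0 / max(propensity, 1e-6), clip)

# Example: a click at a position shown 10% of the time carries 10x the
# training weight of a click at a position shown every time.
assert ips_click_weight(True, 0.1) == 10.0 * ips_click_weight(True, 1.0)
```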

DELAYED FEEDBACK

Clicks happen in seconds, but purchases happen in hours or days. If you optimize for clicks, you might surface items that get clicked but rarely purchased. If you wait for purchase signals, you react too slowly. A common solution: use the click as an immediate signal, then retroactively adjust the reward when purchase data arrives.
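One way the click-then-adjust pattern could be implemented, as a sketch: credit a small reward at click time, then emit the remaining delta when the purchase confirms (the `RewardLedger` name and the reward values are illustrative assumptions):

```python
class RewardLedger:
    """Credit a small reward immediately on click, then top up to the
    full purchase reward when (and if) the purchase signal arrives."""
    CLICK_REWARD = 0.2     # assumed immediate credit for a click
    PURCHASE_REWARD = 1.0  # assumed total credit once a purchase confirms

    def __init__(self):
        self._credited = {}  # event_id -> reward credited so far

    def on_click(self, event_id):
        self._credited[event_id] = self.CLICK_REWARD
        return self.CLICK_REWARD  # feed to the learner right away

    def on_purchase(self, event_id):
        # Retroactive adjustment: emit only the delta, so the total
        # credit for this event ends at PURCHASE_REWARD.
        delta = self.PURCHASE_REWARD - self._credited.get(event_id, 0.0)
        self._credited[event_id] = self.PURCHASE_REWARD
        return delta
```

Emitting the delta (rather than the full purchase reward) keeps the learner's cumulative credit for each event consistent regardless of when the purchase lands.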

CONTEXT DRIFT

The relationship between context and reward changes over time. Holiday shopping behavior differs from regular browsing. A model trained in November fails in February. Detect drift by monitoring prediction accuracy over time. Retrain or use sliding window training to adapt.
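A rough sketch of accuracy-based drift detection, assuming a rolling window compared against a frozen baseline (the window size and tolerance are arbitrary illustrative choices):

```python
from collections import deque

class DriftMonitor:
    """Track rolling prediction accuracy and flag drift when the recent
    window falls well below a baseline captured at deploy time."""

    def __init__(self, window=1000, tolerance=0.05):
        self.recent = deque(maxlen=window)
        self.baseline = None
        self.tolerance = tolerance

    def update(self, correct: bool) -> bool:
        """Record one prediction outcome; return True if drift is detected."""
        self.recent.append(1.0 if correct else 0.0)
        acc = sum(self.recent) / len(self.recent)
        if self.baseline is None and len(self.recent) == self.recent.maxlen:
            self.baseline = acc  # freeze the first full window as reference
        return self.baseline is not None and acc < self.baseline - self.tolerance
```

When the monitor fires, the response is the one described above: retrain, or keep the model continuously fit on a sliding window of recent data.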

💡 Key Takeaways
- Feedback loop collapse: showing only high-confidence items means never learning about alternatives
- Position 1 gets 5-10x more clicks than position 5; train on raw clicks and you learn position, not relevance
- Always log the probability each item was shown in each position for offline policy evaluation
- Delayed feedback: optimize for clicks and miss purchase intent; wait for purchases and react too slowly
- Context drift: a November model fails in February; use sliding window training to adapt
📌 Interview Tips
1. Describe collapse: new items never shown → never clicked → never seem worth showing
2. Explain propensity logging: weight clicks by 1/P(shown in that position) to debias
3. Discuss holiday drift: shopping behavior changes seasonally, so models need retraining