Failure Modes: Feedback Loops, Position Bias, and Drift
FEEDBACK LOOP COLLAPSE
When the system only shows high-confidence items, it never gathers data on alternatives. Items that performed well early become the only items with enough data to look good. New items never get shown, so they never get clicks, so they never seem worth showing. The system converges to a tiny subset of the catalog. Mitigate this by enforcing a minimum exploration rate and diversity constraints, so that some share of impressions always goes to under-exposed items.
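A minimal sketch of a minimum exploration rate, assuming an epsilon-greedy selection policy (the source does not name a specific policy). The function names, the decay schedule, and the default values are illustrative assumptions; diversity constraints are not covered here.

```python
import random


def exploration_rate(step, start=0.5, decay=0.999, floor=0.05):
    """Decay epsilon over time, but never let it drop below `floor`.

    All default values are illustrative assumptions, not recommendations.
    """
    return max(floor, start * decay ** step)


def select_item(estimated_value, epsilon, rng=random):
    """Epsilon-greedy choice over a dict of item_id -> estimated value."""
    items = list(estimated_value)
    if rng.random() < epsilon:
        # Explore: every item, including new ones, has a chance to be shown.
        return rng.choice(items)
    # Exploit: show the item with the highest current estimate.
    return max(items, key=estimated_value.get)
```

Because the rate is clamped at `floor`, even a long-running system keeps sending a small share of traffic to items it has little data on, which is what prevents the collapse described above.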
POSITION BIAS IN TRAINING
Items shown in position 1 can get 5 to 10 times more clicks than items in position 5, regardless of relevance. If you train on raw click data, the model learns that items previously shown in position 1 are better. Once deployed, it ranks those items higher because they were clicked more, not because they are more relevant, and the bias reinforces itself. Debias by logging the position of every impression and using inverse propensity scoring: weight each click by 1 / the estimated probability that an item in that position is seen by the user.
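The weighting can be sketched as follows. This is an illustration under stated assumptions: the log format `(item_id, position, clicked)`, the `propensity` table mapping a position to its estimated probability of being seen, and the `clip` floor (which limits the variance caused by very small propensities) are all hypothetical, and estimating the propensities themselves is out of scope.

```python
from collections import defaultdict


def ips_click_scores(logs, propensity, clip=0.01):
    """Return item_id -> position-debiased click rate.

    logs:       iterable of (item_id, position, clicked) tuples (assumed format)
    propensity: dict of position -> estimated probability of being seen
    clip:       lower bound on propensity, to keep weights from exploding
    """
    weighted_clicks = defaultdict(float)
    impressions = defaultdict(int)
    for item_id, position, clicked in logs:
        impressions[item_id] += 1
        if clicked:
            # A click in a rarely seen position counts for more.
            weighted_clicks[item_id] += 1.0 / max(propensity[position], clip)
    return {item: weighted_clicks[item] / impressions[item] for item in impressions}
```

For example, with propensities of 1.0 for position 1 and 0.2 for position 5, an item with 5 clicks in 10 impressions at position 1 and an item with 1 click in 10 impressions at position 5 both receive a debiased score of 0.5, even though their raw click rates differ by a factor of five.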
DELAYED FEEDBACK
Clicks happen in seconds, but purchases happen hours or days later. If you optimize for clicks, you might surface items that get clicked but are rarely purchased. If you wait for purchase signals, you react too slowly. A common solution is to use the click as an immediate signal, then retroactively adjust the reward when purchase data arrives.
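One way to sketch the retroactive adjustment is to credit a small provisional reward on click and upgrade it when a purchase for the same impression arrives. The class name, the reward values, and the idea of keying events by an `impression_id` are assumptions made for illustration only.

```python
from collections import defaultdict


class DelayedRewardTracker:
    """Credit clicks immediately, then correct the reward when a purchase arrives."""

    def __init__(self, click_value=0.1, purchase_value=1.0):
        # Reward values are illustrative assumptions.
        self.click_value = click_value
        self.purchase_value = purchase_value
        self._item_of = {}    # impression_id -> item_id
        self._credited = {}   # impression_id -> reward credited so far
        self.item_reward = defaultdict(float)

    def log_impression(self, impression_id, item_id):
        self._item_of[impression_id] = item_id
        self._credited[impression_id] = 0.0

    def _set_reward(self, impression_id, value):
        # Apply only the difference, so per-item totals stay consistent.
        delta = value - self._credited[impression_id]
        self.item_reward[self._item_of[impression_id]] += delta
        self._credited[impression_id] = value

    def on_click(self, impression_id):
        # Never downgrade an impression that already led to a purchase.
        current = self._credited[impression_id]
        self._set_reward(impression_id, max(current, self.click_value))

    def on_purchase(self, impression_id):
        # May be called hours or days after the click.
        self._set_reward(impression_id, self.purchase_value)
```

The model can train on `item_reward` right away using click credit, and items whose clicks rarely convert are left with only the small provisional reward once purchase data has had time to arrive.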
CONTEXT DRIFT
The relationship between context and reward changes over time. Holiday shopping behavior differs from regular browsing, so a model trained on November data can perform poorly in February. Detect drift by monitoring prediction accuracy over time. To adapt, retrain periodically or train on a sliding window of recent data.
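Monitoring prediction accuracy over time can be sketched as a rolling error check. The class below is a hypothetical illustration: it compares the mean absolute error over the most recent predictions against a baseline error measured at training time, and the `window` and `tolerance` defaults are assumptions rather than recommended values.

```python
from collections import deque


class DriftMonitor:
    """Flag drift when recent prediction error exceeds a baseline by a margin."""

    def __init__(self, baseline_error, window=500, tolerance=1.5):
        self.baseline_error = baseline_error  # e.g. validation error at training time
        self.tolerance = tolerance            # allowed multiple of the baseline
        self.errors = deque(maxlen=window)    # only the most recent errors are kept

    def update(self, predicted, observed):
        """Record one prediction/outcome pair and return the current drift flag."""
        self.errors.append(abs(predicted - observed))
        return self.drifted()

    def drifted(self):
        # Wait until the window is full so a few early errors do not trigger alerts.
        if len(self.errors) < self.errors.maxlen:
            return False
        mean_error = sum(self.errors) / len(self.errors)
        return mean_error > self.tolerance * self.baseline_error
```

When `drifted()` returns `True`, that is a reasonable trigger for retraining; the same fixed-length window idea applies to the training data itself if you use sliding window training.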