Feedback Loops and Position Bias in Ranking Systems
The Feedback Loop Challenge
Ranking and recommendation systems face a unique skew challenge: the training data depends on previous model outputs, creating feedback loops that amplify bias over successive model generations. When YouTube recommends videos, users are far more likely to click items in position 1 or 2 than in position 10, even when the items are equally relevant. If you train naively on this data, the model learns that top positions predict clicks, creating a self-reinforcing cycle in which popular items stay popular regardless of true relevance.
Position Bias
Position bias is the most common manifestation. A video shown in position 1 might get 10 percent CTR, while the same video in position 5 gets 2 percent CTR purely because of its placement. Training without correction causes the model to conflate position with quality: it learns that a high position implies high relevance and predicts high scores for historically top-ranked items. Google Search deals with this by separating positional features from content features in its ranking models.
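The position effect can be estimated directly from logs and divided out. Below is a minimal sketch, assuming click logs pooled across many items where average relevance per position is roughly equal (the kind of data randomized exploration traffic provides); the function names and numbers are illustrative, not a production API.

```python
# Illustrative sketch: estimate per-position examination bias from pooled
# (position, clicked) logs, then divide it out of an item's raw CTR.
from collections import defaultdict

def estimate_position_bias(logs):
    """logs: iterable of (position, clicked) pairs pooled across many items.
    Returns the examination probability per position, normalized so that
    position 1 (assumed fully examined) has bias 1.0."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for pos, clicked in logs:
        impressions[pos] += 1
        clicks[pos] += int(clicked)
    ctr = {p: clicks[p] / impressions[p] for p in impressions}
    base = ctr[1]  # reference: position 1
    return {p: ctr[p] / base for p in ctr}

def debiased_ctr(raw_ctr, position, bias):
    """Divide out the position effect to approximate position-free relevance."""
    return raw_ctr / bias[position]
```

With the numbers from the text, 100 impressions in position 1 with 10 clicks and 100 impressions in position 5 with 2 clicks give a position-5 bias of 0.2, so a raw 2 percent CTR in position 5 debiases back to 10 percent.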
Counterfactual Corrections
Counterfactual logging and propensity weighting provide mathematical corrections. When you show an item in position 3, log not just the click outcome but also the probability that the logging policy would have shown it there. During training, weight each example by the inverse of this propensity score, upweighting examples the logging policy was unlikely to show. LinkedIn Feed uses a hybrid approach: 90 to 95 percent of traffic follows the production model (exploitation), while 5 to 10 percent uses randomized ranking (exploration) to collect unbiased training data.
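The inverse-propensity weighting described above can be sketched as follows. This assumes each logged example carries its propensity alongside the click label; the clipping threshold and function names are illustrative choices, not a specific system's API.

```python
# Minimal sketch of inverse-propensity-scored (IPS) training weights.
# Each example records (features, click, propensity), where propensity is
# the probability the logging policy showed the item in that slot.

def ips_weight(propensity, clip=10.0):
    """Inverse propensity weight, clipped to cap the variance that tiny
    propensities would otherwise introduce."""
    return min(1.0 / propensity, clip)

def ips_loss(examples, loss_fn):
    """Average a pointwise loss reweighted by inverse propensity, so that
    rarely shown items count more than items the policy favored."""
    total = 0.0
    for features, click, propensity in examples:
        total += ips_weight(propensity) * loss_fn(features, click)
    return total / len(examples)
```

Clipping trades a small amount of bias for much lower variance: an example logged with propensity 0.01 would otherwise dominate the batch with a weight of 100.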
Update Frequency Risks
The challenge scales with model update frequency. Batch-retrained models (weekly or daily) let feedback accumulate slowly, leaving time to detect and correct issues. Online learning models that update continuously can enter bad feedback spirals within hours. Exploration mechanisms and diversity constraints prevent these spirals, sacrificing 2 to 3 percent of short-term engagement for long-term user satisfaction.
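The exploit/explore traffic split described earlier is one such mechanism. A minimal epsilon-style sketch, where `model_scores` is a hypothetical mapping from item to production-model score:

```python
# Sketch of an exploit/explore split: most requests are ranked by the
# production model; a small epsilon fraction is shuffled uniformly at
# random so clicks on that traffic carry no position bias.
import random

def rank_with_exploration(items, model_scores, epsilon=0.05, rng=random):
    """Return (ranking, is_exploration). With probability epsilon the slate
    is a uniform random permutation; exploration logs can then be used as
    unbiased training data (every position is equally likely)."""
    if rng.random() < epsilon:
        shuffled = list(items)
        rng.shuffle(shuffled)
        return shuffled, True
    ranked = sorted(items, key=lambda i: model_scores[i], reverse=True)
    return ranked, False
```

Setting epsilon to roughly 0.05 to 0.10 mirrors the 5 to 10 percent exploration share the text attributes to LinkedIn Feed; the engagement cost of exploration is what buys the unbiased data.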