How Position Bias Distorts Training Data
EXAMINATION VERSUS RELEVANCE
A click requires two things: the user must see the item (examination) and must find it appealing (relevance). Position affects examination probability but not relevance. Suppose position 1 is examined about 95% of the time and position 10 about 20% of the time. If items at both positions have a 10% click-through rate, the item at position 10 is actually much more relevant: it converted 50% of the users who saw it (10% / 20%), versus about 10.5% at position 1 (10% / 95%).
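The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration that assumes a click happens only when an item is both examined and found relevant, so the observed click rate is the product of the two probabilities. The function name and the 95% / 20% examination figures are illustrative, not measured values.

```python
def click_rate_given_examined(observed_ctr: float, examination_prob: float) -> float:
    """Divide out the examination probability to get the click rate among users who saw the item."""
    if not 0.0 < examination_prob <= 1.0:
        raise ValueError("examination_prob must be in (0, 1]")
    return observed_ctr / examination_prob


# Same observed 10% click rate, very different examination probabilities.
position_1 = click_rate_given_examined(observed_ctr=0.10, examination_prob=0.95)
position_10 = click_rate_given_examined(observed_ctr=0.10, examination_prob=0.20)

print(f"position 1:  {position_1:.3f}")
print(f"position 10: {position_10:.3f}")
```

A model trained on raw clicks would treat the two items as equally good; dividing by the examination probability shows that the item at position 10 is the stronger one.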
MEASURING POSITION EFFECT
To measure position bias, run randomization experiments: show the same item in different positions to different users and measure click rates. The result is typically a decaying curve, for example: position 1 as the baseline, position 2 at 70% of position 1, position 3 at 50%, position 5 at 25%, and position 10 at 10%. This curve is your position bias model. The exact shape varies by product (search results, feeds, grids), but the decline in clicks with lower positions appears consistently.
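One way to turn randomized impressions into such a curve is sketched below. It assumes the logs come from an experiment where position was assigned at random, so differences in click rate reflect position rather than relevance. The input format (a list of position and clicked pairs) and the function name are hypothetical, and the sample counts are made up for illustration.

```python
from collections import defaultdict


def estimate_position_bias(randomized_logs):
    """Return the click rate at each position relative to position 1.

    randomized_logs: iterable of (position, clicked) pairs collected
    while positions were assigned at random.
    """
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for position, clicked in randomized_logs:
        shown[position] += 1
        clicks[position] += int(clicked)

    ctr = {p: clicks[p] / shown[p] for p in shown}
    baseline = ctr.get(1)
    if not baseline:
        raise ValueError("need at least one click at position 1 to normalize")
    return {p: ctr[p] / baseline for p in sorted(ctr)}


# Synthetic example: 100 impressions per position.
logs = (
    [(1, True)] * 20 + [(1, False)] * 80
    + [(2, True)] * 14 + [(2, False)] * 86
    + [(3, True)] * 10 + [(3, False)] * 90
)
curve = estimate_position_bias(logs)
for position, relative_ctr in curve.items():
    print(position, round(relative_ctr, 2))
```

In practice each position needs enough impressions for its click rate to be stable; with sparse data the lower positions of the curve will be noisy.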
SELECTION BIAS COMPOUNDS THE PROBLEM
Selection bias means users with certain preferences are more likely to see certain items. If sports fans mostly see sports content at the top (because past models learned to show it to them), their clicks teach the model that sports content is universally popular. But it only appears popular because sports fans were overrepresented in the training data. Together, selection bias and position bias produce severely distorted models.
DATA COLLECTION STRATEGY
Log both the position shown and the probability of showing the item in that position (the propensity). Without the propensity, you cannot correct for bias later. A typical log record contains: user, item, position, propensity score, action (click or not), and timestamp.
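The record format above might look like the following sketch, together with one common way the logged propensity is used later: inverse propensity weighting, where each click is weighted by 1 / propensity. The text does not say which correction is applied, so the weighting step is an assumption. The class name, field names, and the min_propensity clipping floor are likewise hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpressionLog:
    user_id: str
    item_id: str
    position: int
    propensity: float  # probability the item was shown at this position
    clicked: bool
    timestamp: float   # seconds since the Unix epoch


def ipw_click_totals(logs, min_propensity=0.01):
    """Sum clicks per item, weighting each click by 1 / propensity.

    Propensities below min_propensity are clipped (an assumed safeguard)
    so that a single rare impression cannot dominate the total.
    """
    totals = defaultdict(float)
    for record in logs:
        if not 0.0 < record.propensity <= 1.0:
            raise ValueError(f"invalid propensity: {record.propensity}")
        if record.clicked:
            totals[record.item_id] += 1.0 / max(record.propensity, min_propensity)
    return dict(totals)


sample_logs = [
    ImpressionLog("u1", "item_a", 1, 0.5, True, 1700000000.0),
    ImpressionLog("u2", "item_a", 1, 0.5, False, 1700000060.0),
    ImpressionLog("u3", "item_b", 10, 0.1, True, 1700000120.0),
]
weighted = ipw_click_totals(sample_logs)
print(weighted)
```

Each item received one click, but the click on item_b counts for more because that placement was granted far less often. None of this is possible if the propensity was never logged.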