Label Engineering: Creating Training Labels From Implicit Feedback
Raw Signals Are Biased Labels
A click at position 1 doesn't mean the same thing as a click at position 10: position 1 can receive 10x more clicks regardless of quality. If you use raw clicks as positive labels, you train the model to predict position, not relevance. Label engineering starts by recognizing that raw signals are contaminated by presentation effects: position, device, time of day, and surrounding items.
Propensity-Weighted Labels
Create a label weight based on display propensity. Run 1-5% exploration traffic with randomized positions. Build a position-to-propensity lookup: P(click|position). Weight each training example by 1/propensity. A click at position 10 (propensity 0.05) gets weight 20; position 1 (propensity 0.5) gets weight 2. This rebalances training to approximate what clicks would look like if all items were shown equally.
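The lookup-and-weight scheme above can be sketched as follows. This is a minimal illustration, not a production implementation; the function names and the weight-clipping threshold are assumptions added here (clipping is a common variance-control choice for inverse-propensity weights, not something the text prescribes):

```python
from collections import defaultdict

def estimate_propensities(exploration_logs):
    """Build the position-to-propensity lookup P(click | position).

    exploration_logs: iterable of (position, clicked) pairs collected on the
    1-5% exploration slice where positions were randomized, so position is
    independent of item relevance.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for position, clicked in exploration_logs:
        impressions[position] += 1
        clicks[position] += int(clicked)
    return {pos: clicks[pos] / impressions[pos] for pos in impressions}

def weight_example(position, propensities, clip=50.0):
    """Inverse-propensity weight for a clicked training example.

    Clipping (an assumed choice) caps the weight of very rare positions so a
    handful of deep-position clicks can't dominate the loss.
    """
    return min(1.0 / propensities[position], clip)
```

With the article's numbers: a position with propensity 0.05 yields weight 20, and a position with propensity 0.5 yields weight 2.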
Multi-Signal Label Aggregation
Single signals are noisy, so combine multiple user actions: label = 0.3 × click + 0.5 × add_to_cart + 1.0 × purchase. Different signals carry different noise levels and business value: clicks are high volume but noisy; purchases are low volume but high confidence. Dwell time adds another dimension: a click with 30+ seconds of dwell is a stronger signal than a 2-second bounce. The weights become tunable hyperparameters.
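The aggregation can be sketched as a single scoring function. The signal weights come from the formula above; the half-weight discount for short-dwell clicks is an assumed choice added here for illustration (the text only says short dwells are weaker, not by how much):

```python
def aggregate_label(click, add_to_cart, purchase, dwell_seconds=0.0,
                    w_click=0.3, w_cart=0.5, w_purchase=1.0,
                    min_dwell=30.0):
    """Combine implicit signals into one graded label.

    w_click, w_cart, and w_purchase are the tunable hyperparameters from the
    text. The 0.5 bounce discount below is an illustrative assumption.
    """
    # Discount bounce clicks: a sub-threshold dwell contributes half weight.
    click_signal = float(click) * (1.0 if dwell_seconds >= min_dwell else 0.5)
    return (w_click * click_signal
            + w_cart * float(add_to_cart)
            + w_purchase * float(purchase))
```

A purchase with a long-dwell click and an add-to-cart scores 1.8; a 2-second bounce with no downstream action scores 0.15.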
Position as Feature vs Position for Debiasing
Position can serve two distinct roles: (1) include position as an input feature during training and set it to a constant (e.g., position 1) at serving, so the model learns to factor out position effects; or (2) use position only for label weighting and never as a feature, so the model trains on debiased labels but never sees position. Approach 1 requires careful implementation to avoid leakage; approach 2 needs accurate propensity estimates. Most production systems use both.
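A sketch of how the two roles might coexist in one pipeline. Function names, the feature-dict shape, and the choice to weight only clicked examples are assumptions made for illustration:

```python
def training_example(item_features, position, clicked, propensity):
    """Build one weighted training example using both roles of position."""
    # Role 1: position is an input feature during training.
    features = dict(item_features, position=position)
    # Role 2: the label weight debiases via inverse propensity.
    # (Assumption: only positive examples are reweighted.)
    weight = (1.0 / propensity) if clicked else 1.0
    return features, float(clicked), weight

def serving_features(item_features, serve_position=1):
    """At serving, pin position to a constant so scores ignore placement."""
    return dict(item_features, position=serve_position)
```

The serving-time constant is what makes role 1 work: because every candidate is scored with the same position value, the learned position effect cancels out of the ranking.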