Failure Modes: Selection Bias, Contamination, and Reshuffling
Selection bias and contamination are the highest-risk failure modes. If assignment is not truly random and deterministic, or if it drifts over time due to user ID changes, the comparison becomes biased. Cross-device identity resolution issues can place a user in the holdout on one device and in treatment on another. In marketing systems, held-out users might still receive messages through another channel or be re-included due to workflow misconfiguration. Bluecore designed for campaign independence with per-campaign salts to avoid cross-test correlation; without that, users could be systematically in or out across many tests, biasing results.
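A minimal sketch of what deterministic, per-campaign salted assignment can look like. The function and constant names (`assign_bucket`, `HOLDOUT_PCT`) and the 5% split are illustrative assumptions, not Bluecore's actual implementation.

```python
import hashlib

HOLDOUT_PCT = 5  # percent of users held out per campaign (illustrative)

def assign_bucket(user_id: str, campaign_salt: str) -> str:
    """Hash a stable user ID together with a per-campaign salt so assignment
    is deterministic within a campaign but uncorrelated across campaigns."""
    digest = hashlib.sha256(f"{campaign_salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "holdout" if bucket < HOLDOUT_PCT else "treatment"

# The same user can land in different buckets for different campaigns, so no
# user is systematically "in" or "out" across many tests.
print(assign_bucket("user-123", "campaign-a"))
print(assign_bucket("user-123", "campaign-b"))
```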
Reshuffling is a subtle problem sometimes called the Interstellar Problem. After a 50/50 experiment, moving to a 5/95 holdout can split past control and treatment users into the new holdout. Some users will lose features they previously saw, which can depress engagement and create an unrepresentative baseline. Keeping the original control users in the long-term holdout avoids that, but starves them of months of improvements; when they finally receive the accumulated changes, there can be a shock effect that confounds measurement.
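One possible mitigation, sketched below under stated assumptions: a hypothetical "sticky" reassignment that draws the long-term holdout only from prior control users, so nobody loses a feature they have already seen. The function name and arm labels are illustrative, and the trade-off described above (starving those users of improvements) still applies.

```python
import hashlib

def new_holdout_assignment(user_id: str, prior_arm: str, salt: str,
                           holdout_pct: int = 5) -> str:
    """Hypothetical sticky reassignment when shrinking a 50/50 experiment
    to a 5/95 holdout: only prior control users are eligible for the new
    holdout, avoiding feature loss at the cost of a holdout drawn solely
    from former control traffic."""
    if prior_arm != "control":
        return "treatment"
    # Prior control is ~50% of users, so keep ~10% of them to end up with
    # roughly 5% of all users in the long-term holdout.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "holdout" if int(digest, 16) % 100 < holdout_pct * 2 else "treatment"
```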
Metrics can decay or drift over the holdout window. Early novelty effects can inflate treatment performance and then fade. Algorithms can overfit to short-term behavior and degrade as the population adapts. Disney observed decreasing or inconsistent impact over time for some initially successful experiments. If you rely on slow-moving metrics like retention, the holdout may need to run for quarters, increasing the chance that external events such as seasonality, marketing campaigns, or content releases confound the comparison.
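A small sketch of how one might watch for this kind of decay: compute lift week by week over the holdout window and look for a shrinking gap. The dataframe name and columns (`week`, `arm`, `converted`) are assumptions, not a specific vendor schema.

```python
import pandas as pd

def weekly_lift(exposures: pd.DataFrame) -> pd.DataFrame:
    """Relative lift of treatment over holdout per week, from a log of
    exposures with columns: week, arm ('treatment'/'holdout'), converted."""
    rates = (exposures
             .groupby(["week", "arm"])["converted"]
             .mean()
             .unstack("arm"))
    rates["lift"] = (rates["treatment"] - rates["holdout"]) / rates["holdout"]
    return rates

# A lift that shrinks week over week is consistent with a novelty effect
# fading or the population adapting, rather than a durable gain.
```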
Operational errors are common and catastrophic. A single team ignoring holdouts can invalidate months of data. Every microservice must consult a shared assignment function or membership table. Cache layers must not leak holdout decisions across users. Analytics must handle membership churn and late-arriving events. If training pipelines for ML models do not isolate holdout exposures, the model can indirectly learn from treatment behavior that includes held-out users, leaking signal and biasing the comparison.
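A sketch of the "single source of truth" pattern this implies: both the serving path and the training pipeline consult the same assignment rather than re-deriving or caching membership. It reuses the `assign_bucket` helper sketched earlier; the function names are illustrative, not a real service API.

```python
def should_send_campaign(user_id: str, campaign_salt: str) -> bool:
    # Serving path: messaging, ranking, and downstream microservices all
    # gate on the one shared assignment, never a per-service copy.
    return assign_bucket(user_id, campaign_salt) != "holdout"

def training_rows(events, campaign_salt: str):
    # Training path: drop events from held-out users so the model never
    # learns from behavior the holdout is supposed to measure untouched.
    for event in events:
        if assign_bucket(event["user_id"], campaign_salt) != "holdout":
            yield event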
💡 Key Takeaways
• Cross-device identity resolution can place users in the holdout on one device and in treatment on another, contaminating the comparison; consistent ID resolution across all platforms is required
• Reshuffling (the Interstellar Problem) occurs when moving from a 50/50 experiment to a 5/95 holdout splits past treatment users into the holdout, causing them to lose features and creating a depressed baseline
• Disney observed decreasing or inconsistent impact over time for some initially successful experiments as novelty effects faded and user populations adapted to ML model outputs
• Operational errors, such as one team ignoring holdouts or training pipelines failing to isolate holdout exposures, can invalidate months of data by leaking signal and biasing the comparison
• Slow-moving metrics like retention require quarters-long holdouts, increasing exposure to confounding from seasonality, marketing campaigns, and external events that create time-varying effects
📌 Examples
Bluecore's per-campaign salt design prevents cross-test correlation, where users systematically in or out across many tests would bias aggregate results
Marketing-system contamination: held-out users receive messages through an alternate channel (email versus push), or a workflow misconfiguration re-includes them, violating holdout integrity
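A hedged sketch of a cross-channel suppression guard for the second example: every send path checks the same membership before dispatch, so a held-out user is suppressed on email and push alike. It reuses the `assign_bucket` helper sketched above; the channel senders are placeholders, not a real messaging API.

```python
CHANNEL_SENDERS = {
    "email": lambda user_id, msg: print(f"email to {user_id}: {msg}"),
    "push":  lambda user_id, msg: print(f"push to {user_id}: {msg}"),
}

def send(user_id: str, campaign_salt: str, channel: str, msg: str) -> bool:
    """Suppress held-out users on every channel, not just the primary one."""
    if assign_bucket(user_id, campaign_salt) == "holdout":
        return False
    CHANNEL_SENDERS[channel](user_id, msg)
    return True
```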