Failure Modes: Selection Bias, Contamination, and Reshuffling
Selection Bias
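If holdout assignment correlates with user characteristics (power users, region, tenure), every comparison between holdout and production is biased: differences in outcomes may reflect who the users are rather than which features they saw. Common causes are assignment on non-random identifiers, different enrollment paths, or bugs in the hash function. Detect it by comparing pre-holdout characteristics between the groups; under correct randomization they should be statistically indistinguishable.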
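One way to run this check is a standardized mean difference (SMD) per pre-holdout covariate. The sketch below is illustrative only: the function names, the row format (one dict per user), and the 0.1 threshold (a common rule of thumb, not a value from this document) are all assumptions.

```python
import math
import statistics


def standardized_mean_difference(holdout, production):
    """Difference in means divided by the pooled standard deviation."""
    diff = statistics.fmean(holdout) - statistics.fmean(production)
    pooled_sd = math.sqrt(
        (statistics.variance(holdout) + statistics.variance(production)) / 2
    )
    if pooled_sd == 0:
        # No spread in either group: identical means are balanced,
        # anything else is an unbounded imbalance.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


def imbalanced_covariates(holdout_rows, production_rows, covariates, threshold=0.1):
    """Return {covariate: SMD} for covariates whose |SMD| exceeds the threshold.

    The 0.1 default is an assumed rule of thumb; tune it for your data.
    """
    flagged = {}
    for name in covariates:
        smd = standardized_mean_difference(
            [row[name] for row in holdout_rows],
            [row[name] for row in production_rows],
        )
        if abs(smd) > threshold:
            flagged[name] = smd
    return flagged
```

Run it on data from before the holdout started (for example tenure and prior-period activity). Any flagged covariate suggests the assignment mechanism should be inspected before trusting downstream comparisons.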
Contamination
Holdout users can end up seeing production features through gating bugs (a code path that omits the holdout check), shared accounts (family sharing), or network effects. Network effects arise when holdout users interact with production users who share content, invites, or recommendations; the holdout experience is then contaminated even though the holdout user was never directly given the feature.
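Direct exposure caused by gating bugs can often be audited from exposure logs. The following is a minimal sketch, assuming a hypothetical log of (user_id, feature) pairs and a known set of holdout-gated features; it does not detect contamination through shared accounts or network effects.

```python
def contamination_report(holdout_ids, exposure_events, gated_features):
    """Find holdout users who were exposed to features they should not see.

    holdout_ids:     iterable of user ids assigned to the holdout
    exposure_events: iterable of (user_id, feature) pairs (hypothetical log format)
    gated_features:  iterable of feature names that the holdout must not receive

    Returns (contamination_rate, leaks), where leaks maps each affected
    holdout user to the set of gated features they were exposed to.
    """
    holdout_ids = set(holdout_ids)
    gated_features = set(gated_features)
    leaks = {}
    for user_id, feature in exposure_events:
        if user_id in holdout_ids and feature in gated_features:
            leaks.setdefault(user_id, set()).add(feature)
    rate = len(leaks) / len(holdout_ids) if holdout_ids else 0.0
    return rate, leaks
```

A nonzero rate points at a missing holdout check; the per-user feature sets indicate which code path to inspect first.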
Reshuffling Problems
Changing the holdout salt mid-stream breaks continuity. Users who move from holdout to production suddenly see years of features at once, so their behavior change isn't comparable to gradual adoption. Either maintain a permanent holdout, or restart all measurement after the reshuffle with a new baseline.
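To illustrate why a salt change reshuffles membership, here is a sketch of salted-hash assignment. The hashing scheme (SHA-256 over "salt:user_id", 10,000 buckets), the function names, and the 5% default are assumptions for illustration, not a description of any particular system.

```python
import hashlib


def in_holdout(user_id, salt, holdout_pct=5.0):
    """Deterministically map (salt, user_id) to one of 10,000 buckets."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < holdout_pct * 100


def reshuffle_summary(user_ids, old_salt, new_salt, holdout_pct=5.0):
    """Count how holdout membership changes when the salt changes."""
    old = {u for u in user_ids if in_holdout(u, old_salt, holdout_pct)}
    new = {u for u in user_ids if in_holdout(u, new_salt, holdout_pct)}
    return {
        "stayed_in_holdout": len(old & new),
        "released_to_production": len(old - new),
        "newly_held_out": len(new - old),
    }
```

Because the two salts produce effectively independent assignments, most of the old holdout is released to production and replaced by users who have already seen production features. Running reshuffle_summary before a planned salt change makes the size of that discontinuity explicit.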