
Failure Modes: Selection Bias, Contamination, and Reshuffling

Core Concept
Holdout failures corrupt long-term measurement through selection bias, contamination, or improper reshuffling. These failures can invalidate months or years of accumulated data.

Selection Bias

If holdout assignment correlates with user characteristics (power users, region, tenure), all comparisons are invalid. Common causes are non-random identifiers, different enrollment paths, and bugs in the hash function. Detect it by comparing pre-holdout characteristics between groups: under random assignment they should be statistically indistinguishable.
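The balance check described above can be sketched as a standardized mean difference (SMD) per pre-holdout covariate. This is a minimal illustration, not a prescribed method: the function names, the row format (one dict per user), and the 0.1 threshold (a common rule of thumb) are all assumptions.

```python
import math
import statistics

def standardized_mean_difference(holdout, production):
    """Difference in means divided by the pooled standard deviation."""
    mean_h, mean_p = statistics.fmean(holdout), statistics.fmean(production)
    var_h, var_p = statistics.variance(holdout), statistics.variance(production)
    pooled_sd = math.sqrt((var_h + var_p) / 2)
    if pooled_sd == 0:
        # No spread in either group: balanced only if the means match.
        return 0.0 if mean_h == mean_p else math.inf
    return (mean_h - mean_p) / pooled_sd

def imbalanced_covariates(holdout_rows, production_rows, covariates, threshold=0.1):
    """Return {covariate: SMD} for covariates whose |SMD| exceeds the threshold.

    Rows are dicts of pre-holdout measurements (hypothetical schema).
    The 0.1 default is an assumed rule of thumb, not a value from the text.
    """
    flagged = {}
    for name in covariates:
        smd = standardized_mean_difference(
            [row[name] for row in holdout_rows],
            [row[name] for row in production_rows],
        )
        if abs(smd) > threshold:
            flagged[name] = smd
    return flagged
```

An empty result is consistent with random assignment on the covariates checked. A flagged covariate such as tenure or pre-period sessions is a reason to inspect the assignment path before trusting any holdout comparison.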

Contamination

Holdout users see production features because of gating bugs (a forgotten holdout check), shared accounts (family sharing), or network effects (holdout users interacting with content from production users). If holdout users interact with production users who share content, invites, or recommendations, the holdout experience is contaminated.
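Direct contamination from gating bugs can be checked by auditing exposure logs. The sketch below assumes a hypothetical log of `(user_id, feature)` pairs and a known set of features that should be hidden from the holdout. It does not catch indirect contamination through network effects, which leaves no exposure event for the holdout user.

```python
from collections import defaultdict

def find_contaminated_users(exposure_log, holdout_user_ids, gated_features):
    """Map each holdout user to the gated features they were exposed to.

    exposure_log: iterable of (user_id, feature) pairs (assumed format).
    holdout_user_ids: set of user ids assigned to the holdout.
    gated_features: set of features the holdout should never see.
    """
    leaks = defaultdict(set)
    for user_id, feature in exposure_log:
        if user_id in holdout_user_ids and feature in gated_features:
            leaks[user_id].add(feature)
    return dict(leaks)

def contamination_rate(leaks, holdout_user_ids):
    """Fraction of holdout users with at least one leaked exposure."""
    if not holdout_user_ids:
        return 0.0
    return len(leaks) / len(holdout_user_ids)
```

Grouping the leaks by feature usually points to the code path that skipped the holdout check.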

💡 Key Insight: Social networks and marketplaces have inherent contamination: holdout users see content created by production users with new features. This limits holdout validity for these product types.

Reshuffling Problems

Changing the holdout salt mid-stream breaks continuity. Users who move from holdout to production suddenly see years of features at once, so their behavior change isn't comparable to gradual adoption. Either maintain a permanent holdout, or restart all measurement with a new baseline after the reshuffle.
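To see why the salt matters, consider a minimal sketch of salted hash assignment. The SHA-256 bucketing scheme, the function names, and the 5% holdout size are illustrative assumptions, not details from the text.

```python
import hashlib

def in_holdout(user_id: str, salt: str, holdout_pct: float = 5.0) -> bool:
    """Deterministically assign a user to the holdout from a salted hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # bucket in 0..9999
    return bucket < holdout_pct * 100      # 5.0% -> buckets 0..499

def holdout_members(user_ids, salt, holdout_pct=5.0):
    """Set of users that fall in the holdout for a given salt."""
    return {u for u in user_ids if in_holdout(u, salt, holdout_pct)}

# Usage: the same salt always reproduces the same holdout, while a new salt
# draws a largely different set of users.
users = [f"user-{i}" for i in range(10_000)]
old = holdout_members(users, salt="holdout-v1")
new = holdout_members(users, salt="holdout-v2")
released = old - new  # users who leave the holdout and see every feature at once
```

Under this scheme the `released` set is exactly the group whose sudden exposure is not comparable to gradual adoption, which is why measurement needs a fresh baseline after a salt change.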

💡 Key Takeaways
Selection bias: holdout assignment correlates with user characteristics due to non-random identifiers or bugs
Contamination: holdout users see production features via bugs, shared accounts, or network effects
Social/marketplace products have inherent contamination from holdout users seeing production content
Reshuffling breaks continuity: users moving from holdout suddenly see years of features
📌 Interview Tips
1. When detecting bias: compare pre-holdout characteristics between groups before trusting comparisons
2. For contamination: audit feature exposure logs to verify holdout users actually see the holdout experience