Trade-offs: Statistical Power, Operational Complexity, and Cost

Core Concept
Sizing a holdout means balancing measurement power against the cost of withholding improvements from users. Every holdout user misses shipped features, and that missed value is the price of the measurement.
Statistical Power
A larger holdout gives more power to detect differences. As an illustration, with 1M total users, a 5% holdout (50K users) can detect roughly 5% relative differences in long-term metrics, while a 1% holdout (10K users) needs differences of 10% or more to detect them reliably. Power depends on holdout size, metric variance, and observation period.
High-variance metrics (revenue, LTV) need larger holdouts. Low-variance metrics (retention) can work with smaller ones. Calculate required holdout size based on your most important long-term metric.
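The relationship between holdout size, metric variance, and detectable effect can be sketched with a standard two-sample normal approximation. This is a minimal sketch, not a full power analysis: the function name `relative_mde`, the 5% significance level, the 80% power target, and the coefficient of variation `cv=3.9` are all assumptions chosen for illustration (the `cv` value is picked so the output lands near the 5% and 10%+ figures quoted above).

```python
from math import sqrt
from statistics import NormalDist


def relative_mde(holdout_n, treated_n, cv, alpha=0.05, power=0.8):
    """Approximate minimum detectable relative difference in means.

    cv is the metric's coefficient of variation (std dev / mean), assumed
    equal in both groups. Uses a two-sided z-test normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * cv * sqrt(1 / holdout_n + 1 / treated_n)


total_users = 1_000_000
for share in (0.01, 0.05):
    holdout_n = round(total_users * share)
    mde = relative_mde(holdout_n, total_users - holdout_n, cv=3.9)  # cv is an assumed value
    print(f"{share:.0%} holdout: ~{mde:.1%} relative MDE")
```

Because the detectable effect shrinks with the square root of the holdout size, going from a 1% to a 5% holdout roughly halves it rather than cutting it fivefold. A lower-variance metric (smaller `cv`) reduces the detectable effect proportionally.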
Opportunity Cost
If shipped features improve revenue by 10%, a 5% holdout forgoes about 0.5% of total revenue (5% × 10%). For $100M in annual revenue, that's $500K per year. This cost must be justified by the value of long-term measurement, including catching cumulative harm that would cost more if it went undetected.
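The arithmetic above can be written as a small helper. This is a first-order sketch that assumes revenue is spread evenly across users and that the lift applies uniformly; the function name `holdout_opportunity_cost` is hypothetical.

```python
def holdout_opportunity_cost(annual_revenue, holdout_share, relative_lift):
    """Approximate annual revenue forgone by withholding the lift from the holdout.

    Assumes revenue is evenly distributed across users (an illustrative simplification).
    """
    return annual_revenue * holdout_share * relative_lift


# The example from the text: $100M revenue, 5% holdout, 10% lift.
cost = holdout_opportunity_cost(100_000_000, 0.05, 0.10)
print(f"Estimated opportunity cost: ${cost:,.0f} per year")
```

Comparing this figure with the estimated cost of an undetected long-term regression is one way to frame whether the holdout pays for itself.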
⚠️ Key Trade-off: Holdout users see an outdated experience, and over years the gap widens dramatically. Ethics and user expectations may limit how long you can maintain a holdout without refreshing it.
Operational Complexity
Every feature must check holdout status and branch code paths. Support and ops must handle two product versions. Bug fixes must be applied to both paths. Testing must cover both paths. This complexity scales linearly with feature count and holdout duration.
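To make the branching cost concrete, here is a minimal sketch of a holdout check and one feature that branches on it. Everything named here is hypothetical: the salt `global-holdout`, the 5% share, the hash-based bucketing scheme, and the `checkout_variant` feature are assumptions for illustration, not a description of any particular experimentation platform.

```python
import hashlib

HOLDOUT_SHARE = 0.05          # assumed holdout size
HOLDOUT_SALT = "global-holdout"  # hypothetical salt; changing it reshuffles membership


def in_holdout(user_id, share=HOLDOUT_SHARE, salt=HOLDOUT_SALT):
    """Deterministically map a user to [0, 1) and compare against the holdout share."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < share


def checkout_variant(user_id):
    """Every shipped feature needs a branch like this one."""
    if in_holdout(user_id):
        return "legacy_checkout"  # frozen experience; still needs bug fixes and tests
    return "new_checkout"


print(checkout_variant("user-12345"))
```

Each additional feature adds another branch of this kind, and both sides of every branch have to be tested, supported, and patched for as long as the holdout runs.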

💡 Key Takeaways

✓ 5% holdout can detect 5% relative differences; 1% holdout needs 10%+ differences

✓ If features improve revenue 10%, 5% holdout loses 0.5% of total revenue as opportunity cost

✓ Holdout users see an increasingly outdated experience over years; ethics may limit duration

✓ Every feature must branch on holdout status; complexity scales with feature count

📌 Interview Tips

1. When calculating cost: 5% holdout × 10% improvement = 0.5% revenue loss ($500K on $100M)

2. For power discussion: describe the relationship between holdout size, metric variance, and detectable effect size
