What Are Holdout Groups and Why Do They Matter?
Why Holdouts Matter
Individual A/B tests measure short-term effects, but many small changes compound over months. A feature that lifts engagement by 1% might also reduce retention by 0.5%: invisible in a 2-week test, devastating over a year. Holdouts reveal this cumulative impact.
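To make the compounding concrete, here is a minimal sketch. It assumes each shipped change has an independent, multiplicative effect on the metric, which is a simplification. The function name compound_effect and the figure of 24 changes per year are illustrative assumptions, not taken from any particular system.

```python
def compound_effect(per_change_effects):
    """Combine relative effects (e.g. -0.005 for -0.5%) multiplicatively.

    Assumption: effects are independent and multiply; real features
    can interact, so treat this as a rough illustration only.
    """
    total = 1.0
    for effect in per_change_effects:
        total *= 1.0 + effect
    return total - 1.0


# Hypothetical scenario: 24 changes in a year, each costing 0.5% retention.
yearly = compound_effect([-0.005] * 24)
print(f"{yearly:.3f}")  # about -0.113, i.e. roughly an 11% relative drop
```

No single change in this scenario would stand out in a short test, yet the combined effect is large. A holdout measures that combined effect directly rather than relying on the multiplicative assumption.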
Without holdouts, you cannot measure the total improvement from all experiments. Each experiment compares against the current state, but the current state keeps changing as winning variants ship. Holdouts freeze a baseline for long-term comparison.
Holdout Types
Universal holdout: excluded from ALL new features. Measures total experimentation value.
Feature holdout: excluded from changes in a specific feature area (e.g., all recommendation changes). Measures area-specific value.
Time-limited holdout: held for a specific period (6-12 months), then refreshed.
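One common way to implement these holdout types is deterministic hash-based assignment. The sketch below is an illustration under stated assumptions, not a prescribed design: the function names (bucket, in_holdout, feature_enabled), the 5% holdout fraction, and the period label "2024H1" are all hypothetical. Putting the period in the salt is one possible way to refresh a time-limited holdout, since changing the label reshuffles which users are held out.

```python
import hashlib


def bucket(user_id: str, salt: str) -> float:
    """Map a user to a stable value in [0, 1) for a given salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # First 8 hex digits give an integer in [0, 2**32).
    return int(digest[:8], 16) / 0x100000000


def in_holdout(user_id: str, salt: str, fraction: float = 0.05) -> bool:
    """True if the user falls in the holdout identified by this salt."""
    return bucket(user_id, salt) < fraction


def feature_enabled(user_id: str, feature_area: str, period: str = "2024H1") -> bool:
    """Decide whether a user may receive new features in an area.

    Hypothetical policy: the universal holdout blocks every area;
    a feature holdout blocks only its own area.
    """
    if in_holdout(user_id, f"universal:{period}"):
        return False
    if in_holdout(user_id, f"{feature_area}:{period}"):
        return False
    return True


print(feature_enabled("user-123", "recommendations"))
```

Because assignment depends only on the user ID and the salt, the same user gets the same answer on every request without any stored state.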
Long-Term Measurement
Compare the holdout to production on metrics such as 90-day retention, lifetime value, and annual revenue. These long-latency metrics cannot be measured in standard 2-4 week experiments, which end before the outcome can be observed.
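As a rough sketch of such a comparison, the code below estimates the difference in 90-day retention between production and holdout, with a normal-approximation confidence interval for two proportions. The function name retention_lift and all the counts are made up for illustration. A real analysis may need a different method, particularly for skewed metrics such as revenue.

```python
import math


def retention_lift(holdout_retained, holdout_n, prod_retained, prod_n, z=1.96):
    """Return (difference, (low, high)) for production minus holdout retention.

    Uses the normal approximation for a difference of two proportions;
    z=1.96 corresponds to an approximate 95% interval.
    """
    p_holdout = holdout_retained / holdout_n
    p_prod = prod_retained / prod_n
    diff = p_prod - p_holdout
    se = math.sqrt(
        p_holdout * (1 - p_holdout) / holdout_n
        + p_prod * (1 - p_prod) / prod_n
    )
    return diff, (diff - z * se, diff + z * se)


# Illustrative counts only: 400 of 1,000 holdout users retained at day 90,
# versus 4,500 of 10,000 production users.
diff, (low, high) = retention_lift(400, 1000, 4500, 10000)
print(f"lift={diff:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

The holdout is usually much smaller than production, so its sample size tends to dominate the width of the interval.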