
What is Power Analysis and Why Does Sample Size Matter?

Definition
Power analysis calculates the sample size needed to detect a given effect. It connects four quantities: sample size (N), minimum detectable effect (MDE), significance level (alpha, typically 5%), and power (1 - beta, typically 80%). Fixing any three determines the fourth.

The Core Relationship

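The formula is roughly: N ∝ variance × (z_alpha + z_beta)² / MDE², where z_alpha and z_beta are the standard normal quantiles for the chosen significance level and power. Because N scales with 1/MDE², halving the MDE quadruples the required sample size. Increasing power from 80% to 90% adds about a third more sample (~34% at alpha 5%). These are expensive trade-offs.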
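The scaling behaviour can be checked numerically. The sketch below is a minimal illustration, assuming a two-sided test and a normal approximation; the function names `z_multiplier` and `relative_sample_size` are hypothetical helpers, and the result is only proportional to N (the constant factor is dropped).

```python
from statistics import NormalDist


def z_multiplier(alpha: float = 0.05, power: float = 0.80) -> float:
    """Return (z_alpha + z_beta)^2 for a two-sided test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 at alpha = 5%
    z_beta = z.inv_cdf(power)           # about 0.84 at power = 80%
    return (z_alpha + z_beta) ** 2


def relative_sample_size(variance: float, mde: float,
                         alpha: float = 0.05, power: float = 0.80) -> float:
    """Quantity proportional to required N: variance * (z_a + z_b)^2 / MDE^2."""
    return variance * z_multiplier(alpha, power) / mde ** 2


base = relative_sample_size(variance=1.0, mde=0.10)
half_mde = relative_sample_size(variance=1.0, mde=0.05)
more_power = relative_sample_size(variance=1.0, mde=0.10, power=0.90)

print(f"halving MDE multiplies N by {half_mde / base:.2f}")
print(f"power 80% -> 90% multiplies N by {more_power / base:.2f}")
```

Only ratios are meaningful here, which is why the variance is set to 1.0 in the example calls.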

For a conversion experiment with a 5% baseline, a 10% relative MDE (detecting 5.5% vs 5.0%), alpha 5%, and power 80%, you need ~31,000 users per arm, or ~62,000 total. At 10,000 daily users, that's 6-7 days minimum.
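The conversion example can be reproduced with a short calculation. This is a sketch under stated assumptions: a two-sided two-proportion z-test, equal allocation across two arms, and unpooled variance p1(1 - p1) + p2(1 - p2). The helper name `sample_size_per_arm` is hypothetical, and other tools may use slightly different variance formulas and give slightly different numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline: float, relative_lift: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users per arm for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # Sum of the Bernoulli variances of the two arms (unpooled)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(variance * (z_alpha + z_beta) ** 2 / (p2 - p1) ** 2)


n_per_arm = sample_size_per_arm(baseline=0.05, relative_lift=0.10)
total = 2 * n_per_arm
days = ceil(total / 10_000)  # 10,000 daily users, all enrolled

print(f"per arm: {n_per_arm}, total: {total}, days: {days}")
```

With the inputs from the text, this lands at roughly 31,000 users per arm and a one-week runtime.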

Underpowered Experiments

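Running underpowered experiments wastes resources. If the true effect is 3% but you powered for a 5% MDE, you have only about a 40% chance of detecting it. Worse, any significant result from an underpowered experiment is likely an inflated estimate (the winner's curse), because only unusually large observed effects clear the significance threshold.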
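Both effects can be illustrated in a few lines. The sketch below assumes the effect estimate is normally distributed with the same standard error at the true effect as at the design MDE (an approximation for conversion metrics), and a design of alpha 5% and power 80%. The function names are hypothetical, and the simulation uses a fixed seed purely for repeatability.

```python
import random
from statistics import NormalDist, mean

Z = NormalDist()


def power_at_true_effect(true_effect: float, mde: float,
                         alpha: float = 0.05, design_power: float = 0.80) -> float:
    """Two-sided power when the true effect differs from the design MDE."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(design_power)
    # At the design MDE the standardized effect equals z_alpha + z_beta
    shift = (z_alpha + z_beta) * true_effect / mde
    return Z.cdf(shift - z_alpha) + Z.cdf(-shift - z_alpha)


def mean_significant_estimate(true_effect: float, mde: float,
                              alpha: float = 0.05, design_power: float = 0.80,
                              runs: int = 50_000, seed: int = 0) -> float:
    """Average observed effect among simulated runs that reach a significant win."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(design_power)
    se = mde / (z_alpha + z_beta)  # standard error implied by the design
    rng = random.Random(seed)
    estimates = (rng.gauss(true_effect, se) for _ in range(runs))
    winners = [e for e in estimates if e / se > z_alpha]
    return mean(winners)


power = power_at_true_effect(true_effect=3.0, mde=5.0)
inflated = mean_significant_estimate(true_effect=3.0, mde=5.0)

print(f"power at a true 3% effect: {power:.2f}")
print(f"mean estimate among significant wins: {inflated:.2f} (true effect is 3.0)")
```

The second number sits well above the true 3% effect, which is the winner's curse: conditioning on significance selects the runs where noise pushed the estimate upward.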

⚠️ Key Trade-off: A larger MDE means faster experiments but misses small improvements. A 1% lift on 10M annual conversions at $100 AOV is 100,000 extra conversions, or about $10M in value. Missing it because you powered for a 5% MDE is expensive.

Pre-Experiment Planning

Before launching, calculate the required sample for your MDE, the daily traffic to the experiment surface, and the expected runtime. If runtime exceeds 4-6 weeks, reconsider: can you accept a larger MDE, reduce metric variance, or use a higher-traffic surface?
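A planning check along these lines might look like the sketch below. It assumes two equal arms and that all daily traffic is enrolled. The helper names and the 1,000-users-per-day surface are hypothetical, and the fallback MDE relies on the N ∝ 1/MDE² scaling, which is only approximate for conversion metrics because the variance shifts slightly with the target rate.

```python
from math import ceil, sqrt


def runtime_days(n_per_arm: int, daily_users: int, arms: int = 2) -> int:
    """Days needed to enroll the full sample."""
    return ceil(n_per_arm * arms / daily_users)


def achievable_mde(mde: float, n_per_arm: int, daily_users: int,
                   max_days: int, arms: int = 2) -> float:
    """MDE reachable within max_days, using N proportional to 1 / MDE^2."""
    available_per_arm = daily_users * max_days / arms
    return mde * sqrt(n_per_arm / available_per_arm)


# The example from the text: ~31,000 per arm at 10,000 daily users
days = runtime_days(n_per_arm=31_000, daily_users=10_000)

# Hypothetical lower-traffic surface with 1,000 daily users
low_traffic_days = runtime_days(n_per_arm=31_000, daily_users=1_000)
fallback = achievable_mde(mde=0.10, n_per_arm=31_000,
                          daily_users=1_000, max_days=42)

print(f"high-traffic runtime: {days} days")
print(f"low-traffic runtime: {low_traffic_days} days")
print(f"relative MDE reachable in 6 weeks: {fallback:.1%}")
```

On the low-traffic surface the runtime exceeds six weeks, so the options are the ones listed above: accept the larger MDE the last line reports, reduce variance, or move the test to a busier surface.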

💡 Key Takeaways
Halving the MDE quadruples required sample size; raising power from 80% to 90% adds about a third more sample
Underpowered experiments suffer from the winner's curse: significant results are inflated estimates
High-variance metrics (revenue) can need 4x or more sample than low-variance metrics (conversion)
If runtime exceeds 4-6 weeks, reconsider MDE, variance reduction, or traffic surface
📌 Interview Tips
1. When asked about sample size: give a concrete example (~62K total users for a 10% relative lift on a 5% baseline conversion rate)
2. For underpowered experiments: explain the winner's curse, where significant results from underpowered tests are inflated