Statistical Significance: Understanding P-Values and Type I/II Errors
TYPE I AND TYPE II ERRORS
Type I error (false positive): You declare a winner when there is no real difference. The alpha level (typically 0.05) caps this rate: when the null hypothesis is true, there is a 5% chance of a false positive.
Type II error (false negative): You fail to detect a real difference. Power (typically 80%) is one minus the Type II error rate: an 80% chance of detecting a true effect of the assumed size, and a 20% chance of missing it.
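The alpha guarantee can be checked empirically. The sketch below (my own illustration, not from the original text) runs many simulated A/B tests in which the null is true, i.e. both arms convert at the same rate, and counts how often a two-sided pooled z-test at alpha = 0.05 falsely declares a winner; the rate should land near 5%.

```python
import random
from math import sqrt
from statistics import NormalDist

def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(42)
TRIALS, N, P = 2000, 1000, 0.50   # both arms truly convert at 50%
false_positives = 0
for _ in range(TRIALS):
    conv_a = sum(random.random() < P for _ in range(N))
    conv_b = sum(random.random() < P for _ in range(N))
    if z_test_p_value(conv_a, N, conv_b, N) < 0.05:
        false_positives += 1

print(f"False positive rate: {false_positives / TRIALS:.3f}")  # near 0.05
```

With more simulated trials the observed rate converges on the nominal alpha; the same harness, run with a real difference between arms, estimates power instead.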
SAMPLE SIZE REQUIREMENTS
Sample size scales with the inverse square of the effect size you want to detect: halving the minimum detectable lift quadruples the users required, so detecting a 2% lift needs 4x more users than detecting a 4% lift. Concrete example: detecting a 5% relative CTR lift from a 2.0% baseline (2.0% vs 2.1%) requires roughly 315,000 users per arm at alpha=0.05 and 80% power. Rare events are far more expensive: the same 5% relative lift on a 0.05% purchase rate needs roughly 13 million users per arm, and smaller lifts quickly push past 30 million.
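The arithmetic above can be sketched with the standard normal-approximation formula for comparing two proportions (stdlib only; dedicated calculators may differ slightly in the last digits):

```python
from math import ceil
from statistics import NormalDist

def users_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Users needed in EACH arm to detect `relative_lift` over `baseline`."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # two-sided test
    z_beta = z(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 5% relative lift on a 2.0% CTR baseline:
print(users_per_arm(0.020, 0.05))      # roughly 315,000 per arm
# Halving the detectable lift roughly quadruples the requirement:
print(users_per_arm(0.020, 0.025))     # roughly 1.25 million per arm
# Rare event: the same 5% lift on a 0.05% purchase rate:
print(users_per_arm(0.0005, 0.05))     # roughly 13 million per arm
```

The inverse-square scaling is visible in the formula itself: the absolute difference p2 − p1 appears squared in the denominator, so shrinking it by half multiplies the result by four.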
PRACTICAL IMPLICATIONS
High-traffic systems can detect very small effects (0.1%) in hours. Low-frequency conversion events (purchases, subscriptions) need weeks. Plan experiment duration from your traffic and minimum detectable effect, not from arbitrary timelines.
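A hypothetical helper (the function name and traffic figures are my own, for illustration) turns that planning advice into arithmetic: duration is simply the users needed across all arms divided by daily eligible traffic.

```python
from math import ceil

def experiment_days(users_per_arm: int, daily_traffic: int, arms: int = 2) -> int:
    """Days needed to fill all arms, assuming traffic splits evenly."""
    return ceil(users_per_arm * arms / daily_traffic)

# ~315k per arm at 5 million daily users: filled within a day.
print(experiment_days(315_000, 5_000_000))    # 1
# ~13M per arm at 500k daily purchase-eligible users: many weeks.
print(experiment_days(13_000_000, 500_000))   # 52
```

The same required sample size thus maps to wildly different calendar durations depending on traffic, which is the point of planning from minimum detectable effect rather than picking a fixed run length.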