
Statistical Significance: Understanding P-Values and Type I/II Errors

Definition
Statistical significance quantifies whether an observed difference is likely a real effect or just random noise. A p-value of 0.05 means there is a 5% probability of seeing a difference at least this large if no real effect exists (i.e., under the null hypothesis).
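To make the definition concrete, here is a minimal sketch of a pooled two-proportion z-test for an A/B click-through comparison (the function name and the click counts are illustrative, not from the original):

```python
import math

def two_proportion_p_value(clicks_a, n_a, clicks_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via the error function)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 2.0% CTR in control vs 2.5% in treatment, 10,000 users each
print(two_proportion_p_value(200, 10_000, 250, 10_000))  # ≈ 0.017, significant at alpha = 0.05
```

A p-value of ≈ 0.017 says: if the two variants truly converted at the same rate, a gap this large or larger would appear in only about 1.7% of experiments.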

TYPE I AND TYPE II ERRORS

Type I error (false positive): you declare a winner when there is no real difference. The alpha level (typically 0.05) caps this risk at a 5% chance of a false positive.
Type II error (false negative): you fail to detect a real difference. Power (typically 80%) is the probability of detecting a true effect of the assumed size; the remaining 20% (beta) is the chance of missing it.
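Both error rates can be checked empirically. The simulation below (a sketch with illustrative parameters: a unit-variance metric, 2,000 users per arm, and an effect of 0.0886 sized so that 80% power holds) counts how often a two-sided z-test rejects the null:

```python
import math
import random

random.seed(0)

def reject_rate(true_lift, n=2000, trials=20_000, alpha=0.05):
    """Fraction of simulated A/B tests that reject H0 (two-sided z-test,
    unit-variance metric, n users per arm)."""
    z_crit = 1.96                    # two-sided critical value at alpha = 0.05
    se = math.sqrt(2.0 / n)          # standard error of the mean difference
    rejections = 0
    for _ in range(trials):
        # Draw each arm's sample mean directly: mean of n unit-variance draws
        mean_a = random.gauss(0.0, 1.0 / math.sqrt(n))        # control
        mean_b = random.gauss(true_lift, 1.0 / math.sqrt(n))  # treatment
        z = (mean_b - mean_a) / se
        rejections += abs(z) > z_crit
    return rejections / trials

print(reject_rate(0.0))     # Type I error rate: should land near 0.05 (alpha)
print(reject_rate(0.0886))  # power at the planned effect: should land near 0.80
```

With a true lift of zero, roughly 5% of experiments still "win" — that is exactly the alpha = 0.05 false positive rate; at the planned effect size, about 80% are detected and 20% are missed.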

SAMPLE SIZE REQUIREMENTS

Sample size scales with the inverse square of the effect size: to detect a 2% lift, you need 4x as many users as for a 4% lift. Concrete example: detecting a 5% relative CTR lift from a 2.0% baseline requires roughly 315,000 users per arm at alpha = 0.05 and 80% power. Rare events, such as a 0.05% purchase rate, can require tens of millions of users per arm.
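The figures above follow from the standard normal-approximation formula for a two-proportion z-test; a minimal sketch (function name is illustrative):

```python
from math import ceil
from statistics import NormalDist

def users_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-proportion z-test (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = z.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(users_per_arm(0.02, 0.05))  # ≈ 315,000 per arm
print(users_per_arm(0.02, 0.10))  # ≈ 81,000 per arm: doubling the lift cuts n ~4x
```

The second call shows the inverse-square relationship directly: doubling the detectable lift from 5% to 10% shrinks the required sample by roughly a factor of four.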

⚠️ Key Trade-off: Statistical significance does not imply business value. With 50 million users, you can detect tiny meaningless effects as significant. Always pair statistical tests with business thresholds.

PRACTICAL IMPLICATIONS

High-traffic systems can detect very small effects (around 0.1%) within hours. Low-frequency conversion events (purchases, subscriptions) need weeks. Plan experiment duration from your traffic and minimum detectable effect, not from arbitrary timelines.
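As a back-of-envelope check, duration follows directly from the per-arm sample size and daily traffic. A minimal sketch (the traffic figures are illustrative, assuming an even split across arms):

```python
import math

def experiment_days(users_per_arm, daily_traffic, n_arms=2):
    """Days needed to reach the target sample size with an even traffic split."""
    return math.ceil(users_per_arm * n_arms / daily_traffic)

print(experiment_days(315_000, 1_000_000))  # high-traffic product: 1 day
print(experiment_days(315_000, 50_000))     # low-traffic product: 13 days
```

The same sample-size target that one product fills in a day can take another product two weeks, which is why duration should be derived, not guessed.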

💡 Key Takeaways
Alpha (0.05) caps the false positive rate; power (80%) limits the false negative rate to beta = 20%
Sample size scales with inverse square of effect: 2% lift needs 4x more users than 4% lift
Detecting a 5% relative CTR lift from a 2% baseline requires roughly 315K users per arm with 80% power
Statistical significance does not imply business value; always pair with business thresholds
📌 Interview Tips
1. When explaining statistical testing, define both Type I (false positive) and Type II (false negative) errors with concrete alpha/power values
2. Use the inverse-square relationship: 4x the sample size for half the effect size
3. Emphasize the business threshold: large systems can detect meaningless effects as significant