A/B Testing & Experimentation • Statistical Significance & Confidence Intervals • Medium • ⏱️ ~3 min

Confidence Intervals: Precision and Practical Interpretation

WHAT CONFIDENCE INTERVALS MEAN
A 95% confidence interval means the procedure that builds the interval captures the true value in 95% of repeated samples. It does NOT mean there is a 95% probability that this specific interval contains the true value. Once computed, the interval is fixed and so is the true parameter, so this particular interval either contains the true value or it does not. The 95% refers to the long-run frequency of the procedure.
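The long-run reading can be checked by simulation. The sketch below is an illustration only, not part of the original lesson: it draws many samples from a normal population with a known mean (the values 10.0 and 2.0, the sample size, and the function name `coverage_rate` are all assumptions chosen for the demo), builds a z-based 95% interval from each sample, and counts how often the interval contains the true mean.

```python
import random
import statistics
from statistics import NormalDist


def coverage_rate(true_mean=10.0, true_sd=2.0, n=50, trials=2000,
                  level=0.95, seed=0):
    """Fraction of simulated intervals that contain the known true mean."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 when level is 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(trials):
        # Each trial is one "repeated sample" from the same population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        half_width = z * statistics.stdev(sample) / n ** 0.5
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should land close to 0.95. Each individual interval in the loop either contains 10.0 or misses it; the 95% only shows up as a frequency across all trials.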
INTERVALS SHOW BOTH SIGNIFICANCE AND MAGNITUDE
Unlike p-values, confidence intervals convey both whether an effect exists and how large it might be. A CTR lift of 0.10 percentage points with a 95% CI of [0.06, 0.14] tells you two things: (1) the effect is statistically significant, because the interval excludes zero, and (2) the effect is meaningful, because even the lower bound of 0.06 is still useful. If the CI were [0.01, 0.19] instead, the effect would still be significant, but the true lift could be negligibly small.
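A minimal sketch of this reading, assuming a normal-approximation interval for the difference of two click-through rates. The function names, the click and view counts, and the 0.05 percentage point "minimum useful lift" are hypothetical; the counts were picked only so that the result rounds to the [0.06, 0.14] interval quoted above.

```python
from math import sqrt
from statistics import NormalDist


def ctr_lift_interval(clicks_a, views_a, clicks_b, views_b, level=0.95):
    """Normal-approximation CI for CTR(B) - CTR(A), in percentage points."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Standard error of a difference of two independent proportions.
    se = sqrt(p_a * (1 - p_a) / views_a + p_b * (1 - p_b) / views_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_b - p_a
    return 100 * (diff - z * se), 100 * (diff + z * se)


def read_interval(low, high, min_useful):
    """Classify an interval against zero and a minimum useful lift."""
    if low <= 0 <= high:
        return "not significant"
    if high < 0:
        return "significant, negative"
    if low >= min_useful:
        return "significant and meaningful"
    return "significant, possibly negligible"


# Hypothetical counts: 2.0% CTR in control, 2.1% in treatment.
low, high = ctr_lift_interval(20_000, 1_000_000, 21_000, 1_000_000)
print(f"lift CI: [{low:.2f}, {high:.2f}] percentage points")
print(read_interval(low, high, min_useful=0.05))
print(read_interval(0.01, 0.19, min_useful=0.05))
```

With the same 0.10 point estimate, the narrow interval clears the assumed usefulness threshold while the wide one does not, which is the distinction a bare p-value hides.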
CALCULATING INTERVALS
For large samples, use the z critical value: 1.96 for a 95% CI. For small samples, use the t critical value with n-1 degrees of freedom in place of z. Formula: CI = mean ± z × (std_dev / sqrt(n)), where std_dev is the sample standard deviation and n is the sample size. For proportions near 0% or 100%, Wilson or Agresti-Coull intervals have better coverage than the normal approximation.
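The formulas above can be sketched with the Python standard library. The function names are illustrative, and the "1 conversion in 1000 users" input is a made-up example of a rare event. The small-sample t case is left out because the standard library has no t quantile function; a library such as SciPy (`scipy.stats.t.ppf`) would be needed for that.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def z_critical(level=0.95):
    """Two-sided z critical value, about 1.96 for a 95% interval."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def mean_interval(values, level=0.95):
    """mean ± z * (std_dev / sqrt(n)); intended for large samples."""
    n = len(values)
    half = z_critical(level) * stdev(values) / sqrt(n)
    m = fmean(values)
    return m - half, m + half


def normal_approx_interval(successes, n, level=0.95):
    """Normal-approximation interval for a proportion."""
    p = successes / n
    half = z_critical(level) * sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a proportion."""
    z = z_critical(level)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical rare event: 1 conversion out of 1000 users (0.1%).
print("normal approx:", normal_approx_interval(1, 1000))
print("wilson:       ", wilson_interval(1, 1000))
```

For this input the normal approximation produces a negative lower bound, which is impossible for a rate, while the Wilson interval stays inside [0, 1].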
💡 Key Insight: Production decisions pair intervals with thresholds: ship if the lower bound of the engagement lift exceeds zero AND the upper bound of the latency increase stays under 10ms.
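As a sketch, that rule reduces to two comparisons on interval endpoints. The function name `should_ship`, the tuple layout, and the example intervals are assumptions for illustration; the 10ms budget comes from the insight above.

```python
def should_ship(engagement_ci, latency_ci_ms, latency_budget_ms=10.0):
    """Ship only if engagement is credibly positive and the worst
    plausible latency increase still fits inside the budget."""
    engagement_low, _ = engagement_ci
    _, latency_high = latency_ci_ms
    return engagement_low > 0 and latency_high < latency_budget_ms


# Hypothetical intervals: (lower bound, upper bound).
print(should_ship((0.06, 0.14), (1.0, 4.5)))    # both conditions hold
print(should_ship((-0.01, 0.20), (1.0, 4.5)))   # engagement interval includes zero
print(should_ship((0.06, 0.14), (3.0, 12.0)))   # latency could exceed the budget
```

Using the pessimistic end of each interval means the decision holds even if the true effects sit at the unfavorable edge of what the data supports.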
BOOTSTRAP FOR HEAVY TAILS
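Heavy-tailed metrics such as revenue, and quantile metrics such as p95 latency, fit the normal-approximation formula poorly. Bootstrap intervals resample the observed data with replacement thousands of times and recompute the metric on each resample, estimating the sampling distribution empirically. This copes with skew and outliers without assuming normality, but costs 10-100x more computation than parametric methods.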
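A minimal sketch of one common variant, the percentile bootstrap, using only the standard library. The function name, the choice of 2000 resamples, and the synthetic log-normal "revenue" data are assumptions for illustration; other bootstrap variants exist and may behave better on very heavy tails.

```python
import random
from statistics import fmean


def bootstrap_interval(values, stat=fmean, resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(values, k=n)) for _ in range(resamples)
    )
    alpha = (1 - level) / 2
    low_idx = max(0, int(alpha * resamples))
    high_idx = min(resamples - 1, int((1 - alpha) * resamples) - 1)
    return estimates[low_idx], estimates[high_idx]


# Synthetic heavy-tailed data standing in for per-user revenue.
data_rng = random.Random(1)
revenue = [data_rng.lognormvariate(0.0, 1.5) for _ in range(500)]

low, high = bootstrap_interval(revenue)
print(f"mean revenue: {fmean(revenue):.2f}, bootstrap 95% CI: [{low:.2f}, {high:.2f}]")
```

The `stat` argument can be swapped for another function of the sample, such as a percentile, since the resampling loop does not depend on the statistic being a mean. The loop also makes the cost visible: the statistic is recomputed once per resample.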

💡 Key Takeaways

✓ A 95% CI means the procedure captures the true value 95% of the time, not that this specific interval has a 95% probability of containing it

✓ Intervals show significance AND magnitude: a CI of [0.06, 0.14] means the effect is statistically significant and even the lower bound is meaningful

✓ Use z = 1.96 for large samples and the t-distribution for small samples; use Wilson intervals for proportions near 0% or 100%

✓ Bootstrap intervals handle heavy-tailed metrics but cost 10-100x more computation

📌 Interview Tips

1. Explain the correct interpretation: the procedure works 95% of the time, not a probability statement about this interval

2. Show how intervals inform decisions: ship if engagement lower bound > 0 AND latency upper bound < threshold

3. Mention Wilson intervals for rare events (0.1% conversion) where normal approximation fails
