How Does Stratification Reduce Variance in Experiments?
Why Stratify
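Simple randomization can, by chance, create imbalanced groups. In a 1000-user experiment, random assignment might put 60% of iOS users in treatment. If iOS users convert at twice the rate of Android users, the treatment arm starts with a higher baseline conversion rate, which inflates the treatment effect estimate for that run. Stratification randomizes separately within each platform, so each arm receives the same share of iOS users and the same share of Android users.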
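A quick simulation makes the imbalance concrete. It is a sketch under an assumed population (200 iOS and 800 Android users, invented for illustration), and it uses a within-platform shuffle as the stratified scheme. It repeats each assignment 1000 times and reports how far the share of iOS users in treatment drifts.

```python
import random


def simple_randomization(n_users, rng):
    # Each user independently gets a fair coin flip: 1 = treatment, 0 = control.
    return [int(rng.random() < 0.5) for _ in range(n_users)]


def stratified_randomization(platforms, rng):
    # Within each platform, shuffle users and send exactly half to treatment.
    assignment = [0] * len(platforms)
    for platform in sorted(set(platforms)):  # sorted: reproducible order under a fixed seed
        members = [i for i, p in enumerate(platforms) if p == platform]
        rng.shuffle(members)
        for i in members[: len(members) // 2]:
            assignment[i] = 1
    return assignment


def ios_share_in_treatment(platforms, assignment):
    # Fraction of all iOS users who landed in the treatment arm.
    ios = [i for i, p in enumerate(platforms) if p == "ios"]
    return sum(assignment[i] for i in ios) / len(ios)


# Hypothetical 1000-user experiment: 200 iOS users, 800 Android users.
platforms = ["ios"] * 200 + ["android"] * 800
rng = random.Random(7)

simple = [ios_share_in_treatment(platforms, simple_randomization(len(platforms), rng))
          for _ in range(1000)]
stratified = [ios_share_in_treatment(platforms, stratified_randomization(platforms, rng))
              for _ in range(1000)]

print(f"simple:     {min(simple):.2f} to {max(simple):.2f} of iOS users in treatment")
print(f"stratified: {min(stratified):.2f} to {max(stratified):.2f} of iOS users in treatment")
```

Under simple randomization the iOS share wanders by several percentage points from run to run; under the stratified scheme it is exactly 0.50 every time, because each platform is split in half by construction.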
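How much variance stratification removes depends on how predictive the stratification variable is. Roughly, the fractional reduction equals the share of outcome variance explained by differences between strata (the R² of a model that predicts the outcome from stratum means alone). If platform explains 20% of outcome variance, stratifying cuts the variance of the treatment effect estimate by about 20%. Because the required sample size scales with that variance, you need about 20% fewer users for the same statistical power.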
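The ~20% figure follows from a short variance decomposition. This is a sketch under simplifying assumptions: n users per arm, proportional allocation (stratum h holds a share w_h of users in both arms), a treatment that shifts means without changing variances, and large samples. The symbols w_h, μ_h, σ_h², σ_W², σ_B², and δ (the minimum detectable effect) are introduced here for the sketch.

```latex
\begin{aligned}
\sigma^2 &= \underbrace{\sum_h w_h \sigma_h^2}_{\sigma_W^2}
          + \underbrace{\sum_h w_h (\mu_h - \mu)^2}_{\sigma_B^2},
\qquad R^2 = \frac{\sigma_B^2}{\sigma^2} \\[4pt]
\operatorname{Var}(\hat\tau_{\text{simple}}) &\approx \frac{2\sigma^2}{n},
\qquad
\operatorname{Var}(\hat\tau_{\text{strat}})
  = \sum_h w_h^2 \cdot \frac{2\sigma_h^2}{w_h n}
  = \frac{2\sigma_W^2}{n} \\[4pt]
\frac{\operatorname{Var}(\hat\tau_{\text{strat}})}{\operatorname{Var}(\hat\tau_{\text{simple}})}
  &\approx \frac{\sigma_W^2}{\sigma^2} = 1 - R^2 \\[4pt]
n_{\text{required}} &= \frac{2\sigma^2 \,(z_{1-\alpha/2} + z_{1-\beta})^2}{\delta^2}
\;\Rightarrow\;
\frac{n_{\text{strat}}}{n_{\text{simple}}} \approx 1 - R^2 = 0.8
\quad\text{when } R^2 = 0.2
\end{aligned}
```

Stratification removes the between-strata term σ_B² from the estimator's variance; only the within-stratum noise σ_W² remains.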
Choosing Variables
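Good stratification variables are known before randomization, predictive of the outcome, and available for every user. Common choices include platform, country, user tenure, and baseline engagement. Limit the design to 3-5 dimensions with 2-4 levels each, and size it to your sample: the number of strata is the product of the levels, so 3 dimensions with 3 levels each already make 27 strata, about 37 users per stratum in a 1000-user experiment. Over-stratification fragments your sample into cells too small to split evenly between arms.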
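Because every combination of levels becomes its own stratum, the count multiplies quickly. The sketch below, using hypothetical helper names (strata_count, sparse_strata) and made-up user records, counts strata for a proposed design and flags thin ones in real data. The min_size threshold is an assumption you would choose for your own experiment; there is no universal value.

```python
from collections import Counter
from math import prod


def strata_count(levels_per_dimension):
    # Every combination of levels is its own stratum, so the counts multiply.
    return prod(levels_per_dimension)


def sparse_strata(users, dimensions, min_size):
    """Return {stratum: user_count} for strata with fewer than min_size users.

    users: list of dicts holding each user's pre-randomization attributes.
    dimensions: attribute names to stratify on, e.g. ["platform", "country"].
    """
    counts = Counter(tuple(user[d] for d in dimensions) for user in users)
    return {stratum: n for stratum, n in counts.items() if n < min_size}


print(strata_count([2, 4, 3]))  # platform x country x tenure -> 24
print(strata_count([4] * 5))    # five 4-level dimensions -> 1024, more strata than 1000 users

users = [
    {"platform": "ios", "country": "US"},
    {"platform": "ios", "country": "US"},
    {"platform": "android", "country": "BR"},
]
print(sparse_strata(users, ["platform", "country"], min_size=2))  # {('android', 'BR'): 1}
```

The second call shows why the upper end of the 3-5 x 2-4 range only makes sense for large experiments.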
Implementation
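Adding stratum_id to the hash, as in hash(user_id + experiment_id + stratum_id) mod 100, does not by itself balance the arms: each user still receives an independent pseudo-random bucket, so a stratum can still split 60/40. To guarantee a 50/50 split within each stratum, randomize inside the stratum: for example, order each stratum's users by hash(user_id, experiment_id) and alternate arms down that order, or, for users who arrive over time, assign them in small randomized blocks per stratum. Either way, fix each user's stratum at assignment time and store the result; if a user's stratum changes later (say, their tenure bucket), recomputing the assignment could move them to the other arm and break sticky bucketing. Analyze by computing the treatment-minus-control difference within each stratum, then combining those differences in an average weighted by each stratum's share of users.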
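One way to implement balanced within-stratum assignment and the weighted analysis is sketched below. It is an assumption, not a prescribed method: SHA-256 ordering stands in for the unspecified hash, and rank_key, assign_stratified, stratified_effect, and the toy data are hypothetical. The standard error treats the stratum weights as fixed and needs at least two users per arm in every stratum.

```python
import hashlib
from collections import Counter, defaultdict
from statistics import mean, variance


def rank_key(user_id, experiment_id):
    # Deterministic, pseudo-random sort key. The ":" separator avoids collisions
    # such as "ab" + "c" == "a" + "bc" that plain string concatenation allows.
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    return int(digest, 16)


def assign_stratified(user_strata, experiment_id):
    """Map each user to "treatment" or "control", split evenly within each stratum.

    user_strata: dict of user_id -> stratum_id, frozen at assignment time.
    Odd-sized strata give the extra user to treatment. Persist the result:
    adding users later changes the ranks and would reshuffle arms.
    """
    members_by_stratum = defaultdict(list)
    for user_id, stratum in user_strata.items():
        members_by_stratum[stratum].append(user_id)

    assignment = {}
    for members in members_by_stratum.values():
        # Hash order is reproducible but unrelated to user attributes;
        # alternating arms down that order yields a 50/50 split.
        members.sort(key=lambda u: rank_key(u, experiment_id))
        for rank, user_id in enumerate(members):
            assignment[user_id] = "treatment" if rank % 2 == 0 else "control"
    return assignment


def stratified_effect(rows):
    """Estimate the treatment effect from (stratum_id, arm, outcome) rows.

    Takes the treatment-minus-control mean difference inside each stratum,
    then averages those differences weighted by each stratum's share of users.
    Returns (effect, standard_error).
    """
    cells = defaultdict(lambda: {"treatment": [], "control": []})
    for stratum, arm, outcome in rows:
        cells[stratum][arm].append(outcome)

    total = sum(len(c["treatment"]) + len(c["control"]) for c in cells.values())
    effect, var = 0.0, 0.0
    for c in cells.values():
        treated, control = c["treatment"], c["control"]
        weight = (len(treated) + len(control)) / total
        effect += weight * (mean(treated) - mean(control))
        var += weight**2 * (variance(treated) / len(treated) + variance(control) / len(control))
    return effect, var**0.5


if __name__ == "__main__":
    # Hypothetical population: 30 iOS users and 70 Android users.
    strata = {f"user{i}": ("ios" if i < 30 else "android") for i in range(100)}
    arms = assign_stratified(strata, experiment_id="checkout_test")
    print(Counter((strata[u], arm) for u, arm in arms.items()))

    # Toy outcomes: the effect is +1 in stratum A and +4 in stratum B,
    # each stratum holds half the users, so the weighted effect is 2.5.
    rows = [("A", "treatment", 2), ("A", "treatment", 4), ("A", "control", 1), ("A", "control", 3),
            ("B", "treatment", 10), ("B", "treatment", 12), ("B", "control", 6), ("B", "control", 8)]
    print(stratified_effect(rows))
```

The sort-and-alternate scheme suits batch assignment, where every user in a stratum is known up front; it is the stored mapping, not a recomputation, that keeps bucketing sticky.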