What Are Sample Ratio Mismatch and Identity Churn Failures?

Definition
Sample Ratio Mismatch (SRM) occurs when the observed user counts in control vs treatment deviate from the intended allocation by more than chance would explain. At large sample sizes, even a 0.5-1% imbalance signals potential bugs that invalidate results.

Why SRM Matters

For a 50/50 split with 100,000 users, you expect roughly 50,000 in each arm. Observing 51,000/49,000 (each arm 2% off its expected count) gives a chi-squared statistic of 40 on one degree of freedom, which is highly unlikely by random chance (p < 0.001). This signals systematic bias: bucketing bugs, logging drops, or differential eligibility. Results from an experiment with SRM cannot be trusted.

Common causes: treatment crashes more often (dropping users from logs), treatment loads slower (users abandon before the exposure is logged), redirect-based treatments drop users who don't follow redirects, and bot traffic is unevenly distributed across variants.

Detection

Run a chi-squared test on observed vs expected counts. Alert if p < 0.001 (strong SRM) or p < 0.01 (concerning SRM). Check for SRM daily during the experiment, not just at the end. SRM that appears mid-experiment points to a deployment or logging change.
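
The check described above can be sketched in a few lines. This is a minimal illustration for a two-arm experiment, not a production monitor: the function name `srm_check` and its return shape are assumptions made for this example, and the alert thresholds are the ones given in the text. With one degree of freedom the chi-squared p-value has a closed form, so only the standard library is needed.

```python
import math

def srm_check(control: int, treatment: int, expected_control_share: float = 0.5):
    """Chi-squared goodness-of-fit test for a two-arm split (1 degree of freedom).

    Hypothetical helper for illustration; returns (chi2, p_value, level).
    """
    total = control + treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control

    # Sum of (observed - expected)^2 / expected over both arms.
    chi2 = ((control - expected_control) ** 2 / expected_control
            + (treatment - expected_treatment) ** 2 / expected_treatment)

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # Thresholds taken from the text: 0.001 (strong) and 0.01 (concerning).
    if p_value < 0.001:
        level = "strong"
    elif p_value < 0.01:
        level = "concerning"
    else:
        level = "ok"
    return chi2, p_value, level


chi2, p_value, level = srm_check(51_000, 49_000)
print(f"chi2={chi2:.1f}, p={p_value:.2e}, level={level}")
```

For the 51,000/49,000 example the statistic is 40 and the check reports strong SRM. Running the same function on cumulative counts each day gives the daily monitoring described above.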

⚠️ Key Trade-off: You cannot fix SRM by reweighting. If treatment drops 2% of users, those dropped users are systematically different (slower devices, less patience). No statistical adjustment recovers unbiased estimates.

Identity Churn

Users who clear cookies, switch devices, or reinstall apps may get reassigned to different variants. This appears as SRM plus contaminated within-user comparisons. Track identity stability metrics and exclude high-churn users from analysis.
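
One possible identity stability metric is the share of accounts that were exposed to more than one variant. The sketch below assumes exposure logs carry a stable `account_id` (for example, from login) alongside the churn-prone `device_id` used for bucketing; both field names and the function `cross_variant_accounts` are hypothetical and chosen only for this example.

```python
from collections import defaultdict

def cross_variant_accounts(exposures):
    """Find accounts exposed to more than one variant.

    exposures: iterable of (account_id, device_id, variant) tuples.
    Assumes account_id is stable across cookies, devices, and reinstalls.
    Returns (set of contaminated account ids, contaminated share of accounts).
    """
    variants_by_account = defaultdict(set)
    for account_id, _device_id, variant in exposures:
        variants_by_account[account_id].add(variant)

    # An account that saw two or more variants has a contaminated comparison.
    contaminated = {acct for acct, seen in variants_by_account.items() if len(seen) > 1}
    rate = len(contaminated) / len(variants_by_account) if variants_by_account else 0.0
    return contaminated, rate


exposures = [
    ("u1", "phone", "control"),
    ("u1", "laptop", "treatment"),  # same person, new device, different variant
    ("u2", "phone", "control"),
    ("u3", "tablet", "treatment"),
]
contaminated, rate = cross_variant_accounts(exposures)
print(contaminated, round(rate, 3))
```

The returned set can serve as the exclusion list for analysis, and the rate can be tracked over time as a stability metric.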

💡 Key Takeaways
At large sample sizes, even a 0.5-1% sample ratio deviation signals systematic bias; results cannot be trusted
Common causes: treatment crashes, slow loading, redirect drops, uneven bot traffic
Run chi-squared test daily; alert at p < 0.001 (strong SRM) or p < 0.01 (concerning)
Cannot fix SRM by reweighting - dropped users are systematically different
📌 Interview Tips
1. When asked about SRM causes: list treatment crashes, slow loading, redirect drops, bot traffic
2. For detection: describe chi-squared test on observed vs expected with daily monitoring