What Are Sample Ratio Mismatch and Identity Churn Failures?
Why SRM Matters
For a 50/50 split with 100,000 users, you expect roughly 50,000 in each arm. Getting 51,000/49,000 (a 2% relative deviation per arm) is highly unlikely by random chance (p < 0.001). This signals systematic bias: bucketing bugs, logging drops, or differential eligibility. Results from an experiment exhibiting SRM cannot be trusted.
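The arithmetic behind that p-value can be sketched with only the standard library (`chi2_sf_1df` is a small helper written here, not a library function; it works because a chi-squared variable with 1 degree of freedom is the square of a standard normal):

```python
import math

def chi2_sf_1df(x):
    # Survival function of chi-squared with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))

observed = [51_000, 49_000]
expected = [50_000, 50_000]  # intended 50/50 split of 100,000 users
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p = chi2_sf_1df(chi2)
print(chi2)       # 40.0
print(p < 0.001)  # True -- far beyond random chance
```

A chi-squared statistic of 40 on 1 degree of freedom corresponds to a p-value around 2.5e-10, which is why a seemingly small 51,000/49,000 imbalance is a red flag rather than noise.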
Common causes: the treatment crashes more often (dropping users from logs), the treatment loads slower (users abandon before logging fires), redirect-based treatments lose users who don't follow the redirect, and bot traffic is unevenly distributed across variants.
Detection
Run a chi-squared goodness-of-fit test on observed vs expected counts. Alert if p < 0.001 (strong SRM) or p < 0.01 (concerning SRM). Check SRM daily during the experiment, not just at the end: SRM appearing mid-experiment indicates a deployment or logging change.
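A minimal two-variant version of that check, using the thresholds above (the function name, signature, and verdict strings are illustrative, not a standard API; stdlib-only, so it restricts itself to the df = 1 case):

```python
import math

def srm_check(control, treatment, ratio=0.5,
              strong=0.001, concerning=0.01):
    """Two-variant chi-squared SRM check (df = 1).

    ratio is the intended fraction of traffic in control.
    Returns (p_value, verdict), where verdict is one of
    'strong SRM', 'concerning SRM', or 'ok'.
    """
    total = control + treatment
    expected = [total * ratio, total * (1 - ratio)]
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip([control, treatment], expected))
    # chi2 with 1 df is the square of a standard normal,
    # so its survival function is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    if p < strong:
        return p, "strong SRM"
    if p < concerning:
        return p, "concerning SRM"
    return p, "ok"

# Run daily against cumulative counts, not just at experiment end.
print(srm_check(51_000, 49_000))  # verdict: 'strong SRM'
print(srm_check(50_100, 49_900))  # verdict: 'ok'
```

For more than two variants the same statistic applies, but the survival function needs the general chi-squared distribution (e.g. `scipy.stats.chisquare`).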
Identity Churn
Users who clear cookies, switch devices, or reinstall apps may get reassigned to different variants. This appears as SRM plus contaminated within-user comparisons. Track identity stability metrics and exclude high-churn users from analysis.
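One way to operationalize "exclude high-churn users" is to flag any identity observed in more than one variant (a sketch under simplifying assumptions; `split_by_identity_stability` and the event shape are hypothetical, and a real pipeline would key on hashed IDs and time-window the exposures):

```python
from collections import defaultdict

def split_by_identity_stability(events):
    """events: iterable of (user_id, variant) exposure records.

    Returns (stable, churned): users seen in exactly one variant
    vs. users whose identity crossed variants (e.g. after a cookie
    clear or device switch caused reassignment).
    """
    variants = defaultdict(set)
    for user_id, variant in events:
        variants[user_id].add(variant)
    stable = {u for u, vs in variants.items() if len(vs) == 1}
    churned = set(variants) - stable
    return stable, churned

events = [("u1", "control"), ("u1", "control"),
          ("u2", "control"), ("u2", "treatment"),  # reassigned mid-test
          ("u3", "treatment")]
stable, churned = split_by_identity_stability(events)
print(sorted(stable))   # ['u1', 'u3']
print(sorted(churned))  # ['u2']
```

The churned fraction (here 1 of 3 users) doubles as an identity-stability metric to track over the life of the experiment; a rising value suggests the identity layer, not the treatment, is driving the mismatch.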