What Are Sample Ratio Mismatch and Identity Churn Failures?
Sample ratio mismatch (SRM) occurs when the observed ratio of users in control versus treatment deviates from the intended allocation. For a 50/50 split you expect roughly equal counts; at production traffic volumes, even a 0.5 to 1 percent imbalance signals potential bucketing bugs, logging drops, or eligibility-logic errors. SRM means the randomization process is broken, which invalidates causal inference. Production systems run real-time chi-square tests on variant counts and auto-pause experiments when the p-value drops below 0.001.
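The real-time check described above can be sketched as a Pearson chi-square test with one degree of freedom (the function name and defaults here are illustrative, not a specific platform's API):

```python
import math

def srm_check(control_count, treatment_count, expected_ratio=0.5, alpha=0.001):
    """Chi-square test for sample ratio mismatch in a two-variant experiment.

    expected_ratio is the intended fraction of traffic in control.
    Returns (p_value, srm_detected); srm_detected=True means pause and debug.
    """
    total = control_count + treatment_count
    expected_control = total * expected_ratio
    expected_treatment = total * (1 - expected_ratio)
    # Pearson chi-square statistic with 1 degree of freedom.
    stat = ((control_count - expected_control) ** 2 / expected_control
            + (treatment_count - expected_treatment) ** 2 / expected_treatment)
    # Survival function of chi-square(1): P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return p_value, p_value < alpha
```

Note how sample size drives sensitivity: a 505,000 / 495,000 split (a 1 percent imbalance) yields a vanishingly small p-value and trips the check, while the same percentage gap on a few hundred users would not.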
Common SRM causes include differential logging failure rates between variants (treatment's exposure logging crashes but control's does not), eligibility checks that filter users differently by variant, and hash collisions in assignment logic. For example, if the treatment code path has a higher error rate that prevents exposure logging, treatment will appear under-counted. The fix is to debug the assignment and logging pipeline, not to adjust the analysis. Never proceed with an experiment that shows SRM; the estimates are biased and unreliable.
Identity churn happens when user IDs change across sessions, devices, or platforms without proper cross-device linking. If a user logs in on mobile as ID 123 and later receives a new session ID 456 on desktop, the assignment service sees two different units and may assign them different variants. This breaks assignment stickiness, dilutes treatment effects, and violates the stable unit treatment value assumption (SUTVA). Observed effects can drop by 10 to 20 percent when identity churn exceeds 5 percent.
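To see why churn breaks stickiness, consider a typical hash-based bucketing scheme (a minimal sketch; the salt format and two-variant split are assumptions):

```python
import hashlib

def assign_variant(unit_id: str, experiment: str,
                   variants=("control", "treatment")):
    """Deterministic bucketing: hash the unit ID salted with the experiment name.

    The same ID always maps to the same variant, but when identity churns,
    the same person shows up under a different ID and may be re-bucketed.
    """
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# The same person under two identifiers can land in different arms:
assign_variant("user-123", "ranking_v2")     # stable for this ID
assign_variant("session-456", "ranking_v2")  # may differ -> stickiness broken
```

Bucketing on a canonical user ID rather than a per-device session ID is what keeps the mapping stable across the identity graph.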
Production systems maintain cross-device identity graphs using deterministic links (the same login on both devices) or probabilistic models (IP address, user agent, behavioral fingerprinting). The assignment service uses the most stable identifier available, such as a canonical user ID that persists across devices. Monitoring tracks the identity churn rate by measuring how often the same canonical user appears under multiple session IDs. High churn triggers investigation into authentication flows, cookie persistence, or the identity-linking models. Without stable identity, experiments on multi-device users are noisy and underpowered.
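The churn-rate monitor described above might look like this (a sketch assuming the identity graph already resolves each event to a canonical user ID):

```python
from collections import defaultdict

def identity_churn_rate(events):
    """Fraction of canonical users observed under more than one session ID.

    events: iterable of (canonical_user_id, session_id) pairs.
    A high rate suggests broken cookie persistence, authentication-flow
    issues, or gaps in cross-device identity linking.
    """
    sessions_per_user = defaultdict(set)
    for user_id, session_id in events:
        sessions_per_user[user_id].add(session_id)
    if not sessions_per_user:
        return 0.0
    churned = sum(1 for ids in sessions_per_user.values() if len(ids) > 1)
    return churned / len(sessions_per_user)

# Example: user u1 appears under two session IDs, u2 under one -> 50% churn.
rate = identity_churn_rate([("u1", "s1"), ("u1", "s2"), ("u2", "s3")])
```

An alerting layer would compare this rate against a threshold (the 5 percent figure from the text) and page the owning team when it spikes.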
💡 Key Takeaways
• Sample ratio mismatch of even 0.5 to 1 percent indicates broken randomization or logging; never proceed with analysis when SRM is detected
• Real-time chi-square tests monitor variant counts and auto-pause experiments when the p-value falls below 0.001, preventing invalid causal estimates
• Identity churn above 5 percent (the same user receiving different IDs across devices) can reduce observed treatment effects by 10 to 20 percent
• Cross-device identity graphs use deterministic links (login events) or probabilistic models (IP, user agent, behavior) to maintain stable canonical IDs
• Common SRM causes include differential error rates in treatment code, eligibility logic filtering variants differently, and hash collisions in assignment
• Monitoring identity churn tracks how often a canonical user appears with multiple session IDs, triggering investigation when churn spikes
📌 Examples
Meta detected SRM in a newsfeed ranking experiment where treatment logging had a 1.2 percent higher failure rate due to payload size limits, causing treatment to be under-counted
Netflix found identity churn rate of 8 percent across TV and mobile devices reduced homepage CTR effect size by 15 percent until cross device linking improved
Airbnb runs automated SRM checks every 5 minutes during experiment ramps, catching bucketing bugs within 15 minutes and preventing bad launches