
What is Randomization and Sticky Bucketing in Experiments?

Randomization in machine learning experiments assigns units to treatment or control variants with a deterministic hash function. Instead of flipping a coin on each request, the system hashes a stable unit identifier (user ID, device ID, or session ID) together with the experiment ID and maps the hash output to a variant. The same unit therefore receives the same variant across time and services, a property called sticky bucketing. Assignment must be idempotent and complete in under 2 to 5 milliseconds at the 99th percentile to support inline serving in low-latency ranking or rendering paths; at Meta and Google, assignment services handle millions of queries per second with strict CPU budgets of less than 100 microseconds per request. Because the hash is deterministic, a user who logs in on mobile and later on desktop sees the same variant, which prevents confusing experiences and preserves statistical validity. A minimal sketch of this scheme appears below.

Unit selection matters enormously. User-level randomization works for independent outcomes such as UI changes or notification content, and session-level randomization suits short-lived features where carryover is minimal. But in marketplace experiments where sellers and buyers interact, or in social features where friends influence each other, user-level randomization creates interference. In those cases, teams switch to geo-cluster randomization or to switchback designs that alternate treatment by time window (hourly or daily) to reduce contamination.

Identity stability is critical. If user IDs change across sessions or devices without proper linking, the same person receives different variants, which dilutes treatment effects and violates the consistency assumption. Production systems maintain cross-device identity graphs and use the most stable identifier available. When identity churn exceeds 5 percent, observed effect sizes can drop by 10 to 20 percent because of noise from inconsistent assignments.
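To make the hashing scheme concrete, here is a minimal Python sketch of hash-based sticky bucketing. The function name, salt format, and 50/50 weights are illustrative assumptions, not any specific platform's API:

```python
import hashlib

def assign_variant(unit_id: str, experiment_id: str,
                   variants=("control", "treatment"),
                   weights=(0.5, 0.5)) -> str:
    """Deterministically map a unit to a variant (sticky bucketing).

    The same (unit_id, experiment_id) pair always hashes to the same
    bucket, so assignment is idempotent across requests, devices,
    and services -- no per-request coin flip or lookup table needed.
    """
    # Hash the experiment ID together with the unit ID so bucket
    # positions are independent across experiments.
    digest = hashlib.sha256(f"{experiment_id}:{unit_id}".encode()).hexdigest()
    # Map the first 8 hex characters to a point in [0, 1].
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    # Walk the cumulative weights to pick the variant.
    cumulative = 0.0
    for variant, weight in zip(variants, weights):
        cumulative += weight
        if bucket < cumulative:
            return variant
    return variants[-1]  # guard against floating-point edge cases

# Same user, same experiment -> same variant on every call.
assert assign_variant("847291", "homepage_ranking_v2") == \
       assign_variant("847291", "homepage_ranking_v2")
```

Because the bucket depends only on the IDs, no assignment storage or service round trip is required: any service that shares the hash function reproduces the same answer, which is what keeps per-request cost in the microsecond range.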
💡 Key Takeaways
Deterministic hashing on unit ID plus experiment ID ensures sticky bucketing, so user 847291 always sees the same variant across devices and sessions
Assignment latency must stay under 2 to 5 milliseconds at p99 to support inline ranking calls at millions of queries per second
User level randomization maximizes power for independent outcomes, but geo cluster or switchback designs reduce interference in marketplace or social experiments
Identity churn above 5 percent (users getting new IDs) can dilute treatment effects by 10 to 20 percent due to inconsistent variant assignment
CPU budget for assignment is typically less than 100 microseconds per request on high-scale platforms like Meta and Google
📌 Examples
Netflix uses deterministic hashing on user ID to assign homepage ranking experiments, ensuring a user sees the same treatment on TV, mobile, and web
Airbnb runs marketplace experiments with geo cluster randomization where entire cities are assigned to treatment to avoid supply side interference between hosts
DoorDash uses switchback designs for delivery time prediction experiments, alternating treatment hourly to prevent spillover between nearby orders (see the sketch below)
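For the switchback case, a minimal sketch of time-window assignment follows, assuming hourly windows; the function name, region strings, and experiment ID are hypothetical:

```python
import hashlib
from datetime import datetime, timezone

def switchback_variant(region: str, experiment_id: str,
                       ts: datetime, window_hours: int = 1) -> str:
    """Assign treatment by (region, time window) instead of by user.

    All units in the same region during the same window share one
    variant, so nearby units cannot contaminate each other within
    a window.
    """
    # Index of the time window since the epoch.
    window = int(ts.timestamp() // (window_hours * 3600))
    # Hash experiment + region + window so the on/off pattern is
    # deterministic but effectively random across windows.
    digest = hashlib.sha256(
        f"{experiment_id}:{region}:{window}".encode()
    ).hexdigest()
    return "treatment" if int(digest[:8], 16) % 2 == 0 else "control"

# Every request in the same region during the same hour gets the same arm.
now = datetime(2024, 5, 1, 14, 30, tzinfo=timezone.utc)
print(switchback_variant("san_francisco", "eta_model_v3", now))
```

Since the variant depends on the (region, window) pair rather than the user, nearby orders within a window share an arm, which limits spillover at the cost of fewer effective randomization units.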