
Ramp-Up Strategies: Traffic Shaping and Cohort Assignment

CONSISTENT USER ASSIGNMENT

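Users must stay in the same cohort (control or canary) throughout the experiment. If a user flips between versions mid-session, you cannot attribute their behavior to either version. Use deterministic hash bucketing (often called consistent hashing in this context): hash(user_id) mod 10000 maps every user to one of buckets 0-9999, and the user keeps that bucket for the whole experiment. At 5% canary, buckets 0-499 receive the new version; at 25%, buckets 0-2499. Because the canary range only grows as you ramp, users already in the canary stay there. If hash(123456) lands in bucket 6456, that user stays in control until the ramp passes 64.56%, for example at 65%, when the canary covers buckets 0-6499.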
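A minimal Python sketch of this bucketing, assuming SHA-256 as the hash and a hypothetical experiment name as a salt (the text specifies neither). Python's built-in hash() is avoided because string hashing is randomized per process, so it would not give the same bucket on every server. `is_canary_bucket` takes a bucket number directly, which makes the ramp arithmetic easy to check.

```python
import hashlib

NUM_BUCKETS = 10_000  # buckets 0-9999, as in the text


def bucket_for(user_id: str, experiment: str = "ranker-canary") -> int:
    """Stable bucket in [0, NUM_BUCKETS) for a user.

    The experiment name is a hypothetical salt that keeps assignments
    independent across experiments; the same user and experiment always
    produce the same bucket.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def is_canary_bucket(bucket: int, canary_percent: float) -> bool:
    """Buckets below the ramp threshold get the new version (5% -> buckets 0-499)."""
    threshold = round(canary_percent * NUM_BUCKETS / 100)
    return bucket < threshold


def in_canary(user_id: str, canary_percent: float, experiment: str = "ranker-canary") -> bool:
    """True if this user's stable bucket falls inside the current canary range."""
    return is_canary_bucket(bucket_for(user_id, experiment), canary_percent)
```

For example, `is_canary_bucket(6456, 35)` is False and `is_canary_bucket(6456, 65)` is True. Since the threshold only moves up during a ramp, any user for whom `in_canary(uid, 5)` holds also satisfies `in_canary(uid, 25)`.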

STRATIFIED SAMPLING

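Pure user ID hashing can still produce unbalanced cohorts, especially at small canary percentages where random imbalance is larger. If the canary happens to be 80% mobile while the population is 60% mobile, your metrics are skewed. Stratified sampling applies the canary percentage within each segment instead of across the whole population: give mobile users buckets 0-5999 and desktop users buckets 6000-9999 (matching the 60/40 split), then take the first 5% of each range, i.e. mobile buckets 0-299 and desktop buckets 6000-6199. The canary is then 5% of each segment, so its mobile/desktop mix matches the population.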
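A sketch of the stratified scheme, again assuming SHA-256 and a hypothetical experiment-name salt, with the 60/40 mobile/desktop split hard-coded as bucket counts. Each user is hashed into their own segment's range and compared against a per-segment cutoff, so every segment contributes the same percentage to the canary.

```python
import hashlib

NUM_BUCKETS = 10_000
# Hypothetical population mix from the text: 60% mobile, 40% desktop.
SEGMENT_BUCKETS = {"mobile": 6_000, "desktop": 4_000}
assert sum(SEGMENT_BUCKETS.values()) == NUM_BUCKETS


def segment_ranges() -> dict[str, tuple[int, int]]:
    """Contiguous half-open bucket ranges: mobile [0, 6000), desktop [6000, 10000)."""
    ranges, start = {}, 0
    for segment, size in SEGMENT_BUCKETS.items():
        ranges[segment] = (start, start + size)
        start += size
    return ranges


RANGES = segment_ranges()


def stratified_bucket(user_id: str, segment: str, experiment: str = "ranker-canary") -> int:
    """Hash the user into a bucket inside their own segment's range."""
    start, end = RANGES[segment]
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return start + int.from_bytes(digest[:8], "big") % (end - start)


def canary_range(segment: str, canary_percent: float) -> tuple[int, int]:
    """Half-open range of buckets that receive the canary within one segment."""
    start, end = RANGES[segment]
    return start, start + round((end - start) * canary_percent / 100)


def in_canary(user_id: str, segment: str, canary_percent: float) -> bool:
    """True if the user's bucket falls in their segment's canary range."""
    lo, hi = canary_range(segment, canary_percent)
    return lo <= stratified_bucket(user_id, segment) < hi
```

Because the bucket depends on the segment, the segment attribute needs to be stable per user (for example, the platform recorded at signup rather than the device on the current request); otherwise a user who switches devices can also switch cohorts.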

CAPABILITY PROBING

Not all clients support new features. If your new model requires dense embeddings but 10% of users run old app versions that do not send them, routing those users to the canary gives the model a 10% null rate on that feature and risks crashes. With capability probing, the client declares what it supports, for example {"supports_dense_embeddings": true, "app_version": "2.5"}, and the server routes only capable clients to the canary.

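⚠️ Key Trade-off: Capability probing shrinks the eligible population and skews the canary toward users on newer clients. Either accept that trade-off, comparing the canary only against control users who pass the same capability check, or make the new model backward compatible so every client is eligible.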
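A minimal sketch of capability-gated routing, assuming the client sends the two fields from the example and that 2.5 is the first app version able to send dense embeddings (`MIN_APP_VERSION`, `ClientInfo`, and the cohort labels are hypothetical). Incapable clients are tagged "ineligible" rather than "control", so they can be served the old version without diluting the canary-versus-control comparison.

```python
import hashlib
from dataclasses import dataclass

NUM_BUCKETS = 10_000
MIN_APP_VERSION = (2, 5)  # hypothetical: first version that sends dense embeddings


@dataclass
class ClientInfo:
    supports_dense_embeddings: bool
    app_version: str  # dotted version string, e.g. "2.5" or "2.4.1"


def parse_version(version: str) -> tuple[int, ...]:
    """Compare versions numerically, so "2.10" sorts after "2.5"."""
    return tuple(int(part) for part in version.split("."))


def is_capable(client: ClientInfo) -> bool:
    return client.supports_dense_embeddings and parse_version(client.app_version) >= MIN_APP_VERSION


def bucket_for(user_id: str, experiment: str = "ranker-canary") -> int:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def route(user_id: str, client: ClientInfo, canary_percent: float) -> str:
    """Return "ineligible", "canary", or "control" for one request."""
    if not is_capable(client):
        return "ineligible"  # served the old version, excluded from the comparison
    threshold = round(canary_percent * NUM_BUCKETS / 100)
    return "canary" if bucket_for(user_id) < threshold else "control"
```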

INFRASTRUCTURE COST

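At 80k QPS peak, a 25% canary means running both versions side by side: control handles 60k QPS and the canary handles 20k QPS. Total traffic is unchanged (60k + 20k = 80k QPS), but the two fleets are provisioned separately, each with its own headroom and redundancy, which adds roughly 5-10% extra compute during the parallel operation period. Budget for this overhead when planning rollout schedules.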
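The arithmetic above can be sketched as a small capacity estimate. The per-instance capacity and spare counts are hypothetical placeholders, not figures from the text; the sketch only shows where the overhead comes from: splitting the traffic costs nothing by itself, but each separately provisioned fleet rounds up to whole instances and carries its own spares.

```python
# Every capacity figure here is hypothetical; substitute your own measurements.
USABLE_QPS_PER_INSTANCE = 700   # e.g. 1,000 QPS max at a 70% utilization target
SPARE_INSTANCES_PER_FLEET = 4   # redundancy each fleet keeps for failures and deploys


def instances_for(qps: int) -> int:
    """Instances one fleet needs: ceiling division plus that fleet's own spares."""
    return -(-qps // USABLE_QPS_PER_INSTANCE) + SPARE_INSTANCES_PER_FLEET


def canary_overhead(peak_qps: int, canary_percent: int) -> tuple[int, int, int, int]:
    """Return (control_qps, canary_qps, single_fleet_instances, split_fleet_instances)."""
    canary_qps = peak_qps * canary_percent // 100
    control_qps = peak_qps - canary_qps
    single_fleet = instances_for(peak_qps)
    split_fleets = instances_for(control_qps) + instances_for(canary_qps)
    return control_qps, canary_qps, single_fleet, split_fleets


control_qps, canary_qps, single, split = canary_overhead(80_000, 25)
print(f"control={control_qps} canary={canary_qps} "
      f"extra instances={split - single} ({(split - single) / single:.1%})")
```

The resulting percentage depends entirely on the inputs, so plug in your own capacity and redundancy policy for each ramp step rather than reading a figure off this sketch.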

💡 Key Takeaways
Consistent hashing: hash(user_id) mod 10000 ensures users stay in the same cohort throughout the experiment
Stratified sampling: hash within segments (mobile, desktop) to maintain population proportions and avoid bias
Capability probing: route only clients with required features to canary, avoiding null rates and crashes
Infrastructure cost: 25% canary at 80k QPS adds 5-10% extra compute during parallel operation
📌 Interview Tips
1. Explain consistent hashing with a concrete example: user 123456 hashes to bucket 6456 and stays in control until the ramp passes 64.56% (e.g. at 65%, buckets 0-6499)
2. Describe stratified sampling to prevent bias: mobile (60%) gets buckets 0-5999, desktop (40%) gets 6000-9999, and the canary percentage is applied within each range
3. Mention capability probing for backward compatibility: route only clients that support the new features to the canary