
Ramp Up Strategies: Traffic Shaping and Cohort Assignment

Ramp-up strategies control how production traffic flows to a new model version through progressive exposure: 0.5%, 1%, 5%, 25%, 50%, then 100%. The implementation requires deterministic user assignment that remains stable across requests. Use consistent hashing on a stable identifier (user ID, device ID, session ID) to compute a bucket number from 0 to 9,999. A user whose ID hashes to bucket 47 stays in the 1% cohort (buckets 0 to 99) throughout the entire canary period.

Capacity planning follows directly from the ramp percentage. For a ranking service handling 80,000 requests per second at peak, starting at 1% means the canary serves 800 requests per second. If the new model uses 1.2 gigabytes of RAM per replica versus a 1.0 gigabyte baseline, and you run 50 replicas, the canary requires 10 additional gigabytes of capacity during the ramp. Running both versions in parallel at 25% canary allocation can add 5 to 10% total infrastructure cost during the transition window.

Stratified sampling prevents biased cohorts. If you sample purely on user ID, you might get 80% mobile users in the canary when the population is 60% mobile. This skews latency measurements because mobile networks have higher round-trip time (RTT) variance. Instead, compute separate hash spaces per stratum: hash within each (region, device type, user tenure) combination and allocate buckets proportionally. Meta-style experimentation uses multi-dimensional stratification to ensure canary cohorts match the population distribution across device, network, and user activity segments.

For uncontrolled clients like mobile apps, where version rollout is gradual, use capability probing instead of version checks. The client sends a compact set of feature flags indicating what it supports ("supports_header_v2": true, "vector_index": "dense_128"), and the server routes model selection based on actual capabilities. A new model requiring a feature only available in app version 2.5+ can avoid serving users on version 2.4, preventing the 2% feature-null rate that would otherwise contaminate evaluation.
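The bucket assignment described above can be sketched in a few lines of Python. The 10,000-bucket space follows the text; the choice of md5 as the stable hash and all function names are illustrative (any deterministic, cross-process hash works — Python's built-in `hash()` does not, because it is salted per process):

```python
import hashlib

NUM_BUCKETS = 10_000

def bucket(user_id: str) -> int:
    """Map a stable identifier to a bucket in [0, 9999] via a stable hash."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def in_canary(user_id: str, ramp_percent: float) -> bool:
    """A user is in the canary if their bucket falls below the ramp cutoff.

    Because bucket() is deterministic, a user in the 1% cohort
    (buckets 0 to 99) stays in every later, larger cohort:
    5% is buckets 0 to 499, 25% is buckets 0 to 2,499, and so on.
    """
    cutoff = int(NUM_BUCKETS * ramp_percent / 100)
    return bucket(user_id) < cutoff
```

Because the cutoff only grows as the ramp progresses, cohort membership is monotone: anyone in the canary at 1% remains in it at 5%, 25%, and beyond, which keeps per-user experience consistent across the rollout.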
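Per-stratum hash spaces can be sketched by mixing the stratum tuple into the hash key, so each (region, device type, user tenure) combination is sampled at the same rate independently. This is a minimal sketch under that assumption; the simulated 60/40 device split mirrors the population figure in the text, and all names are illustrative:

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 10_000

def stratum_bucket(user_id: str, stratum: tuple) -> int:
    """Hash in a separate space per stratum by mixing the stratum
    into the hash key."""
    key = "|".join((user_id,) + stratum)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def in_canary_stratified(user_id: str, stratum: tuple, ramp_percent: float) -> bool:
    return stratum_bucket(user_id, stratum) < int(NUM_BUCKETS * ramp_percent / 100)

# Simulated population: 60% mobile, 40% desktop. Because each stratum is
# sampled independently at the same rate, the canary's device mix tracks
# the population's ~60/40 instead of drifting toward 80/20.
population = [(f"u{i}", ("US", "mobile" if i < 6_000 else "desktop", "tenured"))
              for i in range(10_000)]
canary = [s for uid, s in population if in_canary_stratified(uid, s, 25.0)]
mix = Counter(device for _, device, _ in canary)
```

Hashing uniformly within each stratum gives proportional representation in expectation; for small strata or tight tolerances, explicit per-stratum bucket quotas can replace the independent-sampling approach.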
💡 Key Takeaways
Consistent hashing on stable user ID maps to bucket 0 to 9,999: bucket 47 stays in 1% cohort (0 to 99) for entire canary duration
At 80K requests per second peak, 1% canary serves 800 RPS, 25% serves 20K RPS with 5 to 10% added infrastructure cost during parallel operation
Stratified sampling by region, device, user tenure prevents bias: pure user ID sampling can yield 80% mobile when population is 60% mobile
Capability probing for mobile clients: server routes based on actual feature support flags, avoiding 2% feature null rate from old app versions
Full ramp from 1% to 100% typically takes 24 to 48 hours with gates between steps validating latency, error rate, and business metrics
📌 Examples
Consistent hashing: hash(user_id) % 10000 → user 123456 maps to bucket 6456; if the canary takes the top buckets, any cutoff <= 6456 includes them (buckets 6456 to 9,999 are the top ~35%)
Stratified allocation: Separately hash within (US, iOS, power_user), (EU, Android, new_user) to maintain population proportions in canary
Capability probe: Client sends {supports_dense_embeddings: true, app_version: 2.5}, server enables new ranker only for dense embedding clients
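The capability-probe routing in the last example can be sketched as a simple flag check on the server. This is a sketch under assumed names: the `supports_dense_embeddings` and `supports_header_v2` flags come from the examples above, while the dataclass shape and model identifiers are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ClientCapabilities:
    """Feature flags sent by the client; field names are illustrative."""
    supports_dense_embeddings: bool = False
    supports_header_v2: bool = False

def select_model(caps: ClientCapabilities) -> str:
    """Route on actual client capabilities, not app version strings,
    so users on older builds never get a model whose features their
    client cannot render (avoiding the feature-null contamination)."""
    if caps.supports_dense_embeddings:
        return "ranker_v2_dense"  # hypothetical model name
    return "ranker_v1"            # hypothetical fallback

# An app 2.4 client that omits the flag gets the old ranker;
# a 2.5+ client that probes with the flag gets the new one.
old_client = select_model(ClientCapabilities())
new_client = select_model(ClientCapabilities(supports_dense_embeddings=True))
```

Routing on probed capabilities rather than version numbers also handles forks, beta builds, and platform-specific feature gaps that a version check would misclassify.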