
Ramp Up Strategies: Traffic Shaping and Cohort Assignment

Ramp-up strategies control how production traffic flows to a new model version through progressive exposure: 0.5%, 1%, 5%, 25%, 50%, then 100%. The implementation requires deterministic user assignment that remains stable across requests. Use consistent hashing on a stable identifier (user ID, device ID, session ID) to compute a bucket number from 0 to 9,999. A user whose ID hashes to bucket 47 stays in the 1% cohort (buckets 0 to 99) throughout the entire canary period.

Capacity planning follows directly from the ramp percentage. For a ranking service handling 80,000 requests per second at peak, starting at 1% means the canary serves 800 requests per second. If the new model uses 1.2 gigabytes of RAM per replica versus a 1.0 gigabyte baseline, and you run 50 replicas, the canary requires 10 additional gigabytes of capacity during the ramp. Running both versions in parallel at 25% canary allocation can add 5 to 10% total infrastructure cost during the transition window.

Stratified sampling prevents biased cohorts. If you sample purely on user ID, you might get 80% mobile users in the canary when the population is 60% mobile. This skews latency measurements because mobile networks have higher round-trip time (RTT) variance. Instead, compute separate hash spaces per stratum: hash within each (region, device type, user tenure) combination and allocate buckets proportionally. Meta-style experimentation uses multi-dimensional stratification to ensure canary cohorts match the population distribution across device, network, and user activity segments.

For uncontrolled clients like mobile apps, where version rollout is gradual, use capability probing instead of version checks. The client sends a compact set of feature flags indicating what it supports ("supports_header_v2": true, "vector_index": "dense_128"), and the server routes model selection based on actual capabilities. A new model requiring a feature only available in app version 2.5+ can avoid serving users on version 2.4, preventing the 2% feature-null rate that would otherwise contaminate evaluation.
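The bucket assignment described above can be sketched in a few lines of Python. The 10,000-bucket space follows the text; the choice of md5 as the stable hash and all function names are illustrative (any deterministic, cross-process hash works — Python's built-in `hash()` does not, because it is salted per process):

```python
import hashlib

NUM_BUCKETS = 10_000

def bucket(user_id: str) -> int:
    """Map a stable identifier to a bucket in [0, 9999] via a stable hash."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def in_canary(user_id: str, ramp_percent: float) -> bool:
    """A user is in the canary if their bucket falls below the ramp cutoff.

    Because bucket() is deterministic, a user in the 1% cohort
    (buckets 0 to 99) stays in every later, larger cohort:
    5% is buckets 0 to 499, 25% is buckets 0 to 2,499, and so on.
    """
    cutoff = int(NUM_BUCKETS * ramp_percent / 100)
    return bucket(user_id) < cutoff
```

Because the cutoff only grows as the ramp progresses, cohort membership is monotone: anyone in the canary at 1% remains in it at 5%, 25%, and beyond, which keeps per-user experience consistent across the rollout.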
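Per-stratum hash spaces can be sketched by mixing the stratum tuple into the hash key, so each (region, device type, user tenure) combination is sampled at the same rate independently. This is a minimal sketch under that assumption; the simulated 60/40 device split mirrors the population figure in the text, and all names are illustrative:

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 10_000

def stratum_bucket(user_id: str, stratum: tuple) -> int:
    """Hash in a separate space per stratum by mixing the stratum
    into the hash key."""
    key = "|".join((user_id,) + stratum)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def in_canary_stratified(user_id: str, stratum: tuple, ramp_percent: float) -> bool:
    return stratum_bucket(user_id, stratum) < int(NUM_BUCKETS * ramp_percent / 100)

# Simulated population: 60% mobile, 40% desktop. Because each stratum is
# sampled independently at the same rate, the canary's device mix tracks
# the population's ~60/40 instead of drifting toward 80/20.
population = [(f"u{i}", ("US", "mobile" if i < 6_000 else "desktop", "tenured"))
              for i in range(10_000)]
canary = [s for uid, s in population if in_canary_stratified(uid, s, 25.0)]
mix = Counter(device for _, device, _ in canary)
```

Hashing uniformly within each stratum gives proportional representation in expectation; for small strata or tight tolerances, explicit per-stratum bucket quotas can replace the independent-sampling approach.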
💡 Key Takeaways
Consistent hashing on stable user ID maps to bucket 0 to 9,999: bucket 47 stays in 1% cohort (0 to 99) for entire canary duration
At 80K requests per second peak, 1% canary serves 800 RPS, 25% serves 20K RPS with 5 to 10% added infrastructure cost during parallel operation
Stratified sampling by region, device, user tenure prevents bias: pure user ID sampling can yield 80% mobile when population is 60% mobile
Capability probing for mobile clients: server routes based on actual feature support flags, avoiding 2% feature null rate from old app versions
Full ramp from 1% to 100% typically takes 24 to 48 hours with gates between steps validating latency, error rate, and business metrics
📌 Examples
Consistent hashing: hash(user_id) % 10000 → user 123456 maps to bucket 6456; if the canary takes the top buckets, any cutoff <= 6456 includes them (buckets 6456 to 9,999 are the top ~35%)
Stratified allocation: Separately hash within (US, iOS, power_user), (EU, Android, new_user) to maintain population proportions in canary
Capability probe: Client sends {supports_dense_embeddings: true, app_version: 2.5}, server enables new ranker only for dense embedding clients
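The capability-probe routing in the last example can be sketched as a simple flag check on the server. This is a sketch under assumed names: the `supports_dense_embeddings` and `supports_header_v2` flags come from the examples above, while the dataclass shape and model identifiers are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ClientCapabilities:
    """Feature flags sent by the client; field names are illustrative."""
    supports_dense_embeddings: bool = False
    supports_header_v2: bool = False

def select_model(caps: ClientCapabilities) -> str:
    """Route on actual client capabilities, not app version strings,
    so users on older builds never get a model whose features their
    client cannot render (avoiding the feature-null contamination)."""
    if caps.supports_dense_embeddings:
        return "ranker_v2_dense"  # hypothetical model name
    return "ranker_v1"            # hypothetical fallback

# An app 2.4 client that omits the flag gets the old ranker;
# a 2.5+ client that probes with the flag gets the new one.
old_client = select_model(ClientCapabilities())
new_client = select_model(ClientCapabilities(supports_dense_embeddings=True))
```

Routing on probed capabilities rather than version numbers also handles forks, beta builds, and platform-specific feature gaps that a version check would misclassify.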