Failure Modes: Biased Cohorts, Cold Start, and Feedback Loops
Non-representative canary cohorts create false confidence. If the 5 percent canary skews toward power users with 3x the engagement of typical users, a ranking model may show a 0.5 percent CTR improvement that disappears at 100 percent rollout, when casual users dominate. Mitigate with stratified sampling: hash separately within (region, device type, user tenure) strata and allocate proportionally, as sketched below. Validate cohort balance by comparing pre-period metrics: the canary and baseline cohorts should have similar historical CTR, session length, and error rates before the experiment starts.
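A minimal Python sketch of stratified assignment plus a pre-period balance check, assuming a hash-bucket allocation; the function names, the 10,000-bucket space, and the 5 percent tolerance are illustrative assumptions, not a specific production implementation.

```python
import hashlib

def assign_to_canary(user_id: str, stratum: tuple, canary_fraction: float = 0.05) -> bool:
    """Hash within each (region, device type, user tenure) stratum so the canary
    receives a proportional slice of every stratum instead of skewing to power users."""
    key = f"{stratum}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 10_000
    return bucket < canary_fraction * 10_000

def preperiod_balanced(canary: dict, baseline: dict, tolerance: float = 0.05) -> bool:
    """Compare historical CTR, session length, and error rate before the experiment starts."""
    for metric in ("ctr", "session_length", "error_rate"):
        base = baseline[metric]
        if base and abs(canary[metric] - base) / base > tolerance:
            return False  # cohorts diverge in the pre-period: rebalance strata first
    return True
```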
Cold-start artifacts trigger false rollbacks. New model replicas start with empty caches and cold vector indexes, so P99 latency spikes from a 210 millisecond baseline to 400 milliseconds for the first 5 minutes, then settles at 205 milliseconds once caches warm. An automated gate sees the 400 millisecond spike exceed its 300 millisecond threshold and rolls back. Mitigate with pre-warming: replay the last hour of production traffic at 10x speed to populate caches and indexes before exposing live traffic. Use time-based grace periods: ignore P99 violations in the first 10 minutes, or require violations to persist for 15 consecutive minutes before triggering rollback.
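One way the grace-period and sustained-violation logic could be wired up, using the thresholds from the text (300 ms P99 gate, 10 minute grace period, 15 minutes of sustained violation); the class and method names are hypothetical.

```python
import time

class LatencyGate:
    """Rollback gate that ignores the cold-start window and requires sustained violations."""

    def __init__(self, p99_threshold_ms: float = 300,
                 grace_period_s: float = 600, sustain_s: float = 900):
        self.p99_threshold_ms = p99_threshold_ms
        self.grace_period_s = grace_period_s    # skip the first 10 minutes after replica start
        self.sustain_s = sustain_s              # require 15 minutes of continuous violation
        self.start_time = time.time()
        self.violation_since = None

    def should_rollback(self, p99_ms: float) -> bool:
        now = time.time()
        if now - self.start_time < self.grace_period_s:
            return False                        # caches still warming: gate is inactive
        if p99_ms <= self.p99_threshold_ms:
            self.violation_since = None         # streak broken, reset
            return False
        if self.violation_since is None:
            self.violation_since = now          # start of a violation streak
        return now - self.violation_since >= self.sustain_s
```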
Feedback loops poison product-metric evaluation. A new ranker changes what content users see, which changes their click behavior and generates new training labels biased toward the canary's selections. Short windows favor novelty effects: CTR rises 0.5 percent in the first hour because users click unfamiliar items, then regresses after 12 hours as the novelty wears off. Conversely, learning effects penalize early measurement: a better long-term ranking can look worse at first because users need time to discover new patterns. For sensitive outcomes like retention or content diversity, use longer evaluation windows (24 hours minimum) and maintain parallel holdout cohorts that never see the canary, so long-term effects can be measured without bias.
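A sketch of comparing short-window and long-window lift against a parallel holdout that never saw the canary; `get_metric` is an assumed metric-store accessor, and the novelty heuristic is purely illustrative.

```python
from datetime import datetime, timedelta

def lift(get_metric, metric: str, start: datetime, end: datetime) -> float:
    """Relative lift of the canary cohort over the never-exposed holdout cohort."""
    canary = get_metric(cohort="canary", metric=metric, start=start, end=end)
    holdout = get_metric(cohort="holdout", metric=metric, start=start, end=end)
    return (canary - holdout) / holdout

def evaluate(get_metric, metric: str, now: datetime) -> dict:
    short = lift(get_metric, metric, now - timedelta(hours=1), now)
    long = lift(get_metric, metric, now - timedelta(hours=24), now)
    # An early win that fades or reverses over 24 hours suggests a novelty effect.
    return {"1h_lift": short, "24h_lift": long, "novelty_suspected": short > 0 and long <= 0}
```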
Cross-service dependency saturation causes cascading failures that get blamed on the model. A feature-pipeline change increases Queries Per Second (QPS) to a downstream embedding service by 30 percent. At 25 percent canary, the combined load saturates the embedding service's 15,000 QPS rate limit, causing timeouts that surface as model latency spikes. Mitigate by validating downstream capacity budgets before each ramp: if the baseline generates 10,000 QPS and the canary would generate 13,000 QPS at full rollout, confirm the dependency can absorb the baseline load plus the canary's delta (10,000 + 0.25 × 3,000 = 10,750 QPS) with headroom to spare. Use per-dependency rate limiting and circuit breakers to isolate failures.
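A back-of-the-envelope capacity check matching the arithmetic above; the function name and the 20 percent headroom fraction are assumptions.

```python
def can_ramp(baseline_qps: float, canary_full_qps: float, canary_fraction: float,
             dependency_limit_qps: float, headroom: float = 0.20) -> bool:
    """Check that baseline load plus the canary's incremental load at the next
    ramp stage fits under the dependency's rate limit with headroom to spare."""
    delta = (canary_full_qps - baseline_qps) * canary_fraction  # extra load from the canary slice
    projected = baseline_qps + delta
    return projected <= dependency_limit_qps * (1 - headroom)

# Numbers from the text: 10,000 baseline QPS, 13,000 QPS at full rollout, 25% ramp
# -> projected 10,750 QPS, under a 15,000 QPS limit with 20% headroom (12,000 QPS).
assert can_ramp(10_000, 13_000, 0.25, 15_000)
```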
💡 Key Takeaways
•Biased cohorts: a 5% canary skewed toward power users shows a 0.5% CTR lift that vanishes at 100% when casual users dominate. Use stratified sampling and validate pre-period balance
•Cold start: new replicas spike P99 from 210ms to 400ms for the first 5 minutes before caches warm, causing false rollbacks. Pre-warm with 10x traffic replay and use 10 to 15 minute grace periods
•Feedback loops: novelty effects boost CTR 0.5% in the first hour before regressing, while learning effects penalize early measurement. Use 24-hour windows and parallel holdouts for retention metrics
•Dependency saturation: a canary that increases embedding-service QPS by 30% can saturate a 15K QPS limit at 25% traffic, causing timeouts blamed on the model. Validate downstream capacity budgets with headroom
•Statistical pitfalls: testing multiple metrics without correction creates false positives, short windows reduce statistical power, and seasonal effects during holidays skew comparisons
📌 Examples
Meta: canary cohort validation compares 7-day historical CTR before the experiment. If the canary cohort has 3.5% pre-period CTR vs. the baseline's 3.2%, rebalance the strata
Pre-warming: replay the last 60 minutes of requests at 10x speed (6 minutes of replay time) to populate 500K cache entries and warm the top 10K embeddings before live traffic
Feedback loop: a YouTube recommendation canary shows a 0.4% watch-time increase in the first 2 hours, but the 24-hour window reveals a 0.1% decrease as a filter bubble reduces content diversity