A/B Testing & Experimentation: Ramp-up Strategies & Canary Analysis

Failure Modes: Biased Cohorts, Cold Start, and Feedback Loops

BIASED COHORTS

Small canary percentages can create biased samples. If the 5% canary cohort happens to skew toward power users (higher engagement), you may see a 0.5% CTR lift that vanishes at 100% rollout once casual users dominate. Detection: compare pre-period metrics between cohorts before starting the experiment; if the canary's pre-period CTR is 3.5% vs. 3.2% for baseline, rebalance the strata. Prevention: use stratified sampling by user segment.
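A minimal sketch of both checks. The function names, the 0.2pp balance tolerance, and the segment structure are illustrative assumptions, not a standard API:

```python
import random
from statistics import mean

def preperiod_balanced(canary_ctrs, baseline_ctrs, tolerance=0.002):
    """Return True if mean pre-period CTRs differ by less than `tolerance`
    (0.002 = 0.2 percentage points, an assumed threshold)."""
    return abs(mean(canary_ctrs) - mean(baseline_ctrs)) < tolerance

def stratified_canary(users_by_segment, fraction=0.05, seed=42):
    """Draw the canary cohort per segment, so a small sample cannot skew
    toward power users by chance."""
    rng = random.Random(seed)
    canary = []
    for segment, users in users_by_segment.items():
        k = max(1, round(len(users) * fraction))
        canary.extend(rng.sample(users, k))
    return canary
```

With the article's numbers, `preperiod_balanced([0.035], [0.032])` returns False (a 0.3pp gap exceeds the tolerance), signaling that the strata should be rebalanced before the experiment starts.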

COLD START LATENCY

New replicas spike P99 latency from 210ms to 400ms for the first ~5 minutes while caches warm. This can trigger a false rollback even though steady-state performance is fine. Mitigation: pre-warm by replaying the last 60 minutes of requests at 10x speed (6 minutes of replay time), use a 10-15 minute grace period before evaluating latency metrics, and flag the cold-start window in logs so it can be excluded from analysis.
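The grace period can be enforced at evaluation time by filtering samples collected before the replica warmed up. A minimal sketch, assuming timestamped latency samples; the function name and 12-minute value (within the 10-15 minute range above) are illustrative:

```python
GRACE_PERIOD_S = 12 * 60  # assumed grace period within the 10-15 min range

def warm_p99(samples, replica_started_at, grace_s=GRACE_PERIOD_S):
    """Compute P99 latency using only samples taken after the cold-start
    grace period. `samples` is a list of (timestamp, latency_ms) tuples.
    Returns None while still inside the grace window (defer judgment
    rather than triggering a false rollback)."""
    cutoff = replica_started_at + grace_s
    warm = sorted(lat for ts, lat in samples if ts >= cutoff)
    if not warm:
        return None
    return warm[max(0, int(len(warm) * 0.99) - 1)]
```

The 400ms cold-start samples in the first minutes never reach the evaluator, so the rollback decision is based only on steady-state latency.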

⚠️ Key Trade-off: Grace periods delay problem detection. Balance cold start tolerance against fast rollback requirements.

FEEDBACK LOOPS AND TEMPORAL EFFECTS

Novelty effect: users engage more with a new UI in the first hours, then revert to baseline. Learning effect: users initially struggle with changes, then adapt. Both bias short-window measurements, in opposite directions. Mitigation: use 24-hour evaluation windows and maintain parallel holdouts for retention metrics. Compare day-1, day-7, and day-30 cohort behavior to separate novelty from true improvement.
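The day-1/7/30 comparison can be sketched as a simple decay check. Everything here is an illustrative assumption (the function name, the 50% sustain ratio, the example lift numbers), not a standard classification rule:

```python
def classify_effect(lift_by_day, sustain_ratio=0.5):
    """Classify an observed lift as a novelty spike or a durable improvement.

    `lift_by_day` maps day offset -> relative lift vs. the parallel holdout.
    If the day-30 lift retains less than `sustain_ratio` of the day-1 lift,
    treat the early result as novelty-driven."""
    d1, d30 = lift_by_day[1], lift_by_day[30]
    if d1 <= 0:
        return "no lift"
    return "durable" if d30 >= sustain_ratio * d1 else "novelty"
```

A 10% day-1 lift that decays to 1% by day 30 classifies as novelty; a 4% lift that holds near 3.5% classifies as durable.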

DEPENDENCY SATURATION

The canary model increases embedding service QPS by 30%. At 25% traffic, you hit the embedding service's capacity limit (15k QPS), causing timeouts. The model gets blamed for the latency when the real issue is downstream capacity. Prevention: validate downstream service headroom before each ramp step.
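A headroom check can gate each ramp step by extrapolating downstream load linearly from the current traffic share. A minimal sketch; the function name, the linear-scaling assumption, and the 80% headroom threshold are all illustrative:

```python
def safe_to_ramp(downstream_qps, current_fraction, next_fraction,
                 capacity_qps, headroom=0.8):
    """Return True if the downstream service can absorb the next ramp step.

    Projects QPS at `next_fraction` assuming load scales linearly with the
    canary's traffic share, and blocks the ramp if the projection exceeds
    `headroom` * capacity (assumed 80% here, leaving a safety margin)."""
    if current_fraction <= 0:
        raise ValueError("need nonzero current traffic to extrapolate")
    projected = downstream_qps * (next_fraction / current_fraction)
    return projected <= headroom * capacity_qps
```

For example, a canary at 10% traffic driving 5k QPS projects to 12.5k QPS at 25%, which exceeds an 80% headroom threshold on a 15k QPS service, so the ramp is blocked before timeouts occur.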

💡 Key Takeaways
- Biased cohorts: small samples skew toward power users; validate that pre-period metrics match before starting
- Cold start: new replicas spike P99 for ~5 minutes; pre-warm caches and use 10-15 min grace periods
- Feedback loops: novelty effect inflates first-hour metrics; use 24-hour windows and parallel holdouts
- Dependency saturation: canary may hit downstream capacity limits; validate service headroom before each ramp
📌 Interview Tips
1. Describe cold start mitigation: replay 60 minutes of requests at 10x speed to warm caches before live traffic
2. Explain novelty vs. learning effects: short windows are biased; use 24-hour evaluation windows
3. Mention dependency saturation: the model gets blamed for timeouts when the real issue is downstream embedding service capacity