A/B Testing & ExperimentationRamp-up Strategies & Canary AnalysisMedium⏱️ ~3 min

Canary Metrics: System, Product, and Data Quality Signals

SYSTEM METRICS

System metrics catch infrastructure problems immediately. Key signals: P95 latency delta under 5ms, P99 under 250ms total, error rate delta under 0.05% absolute. Use 5-15 minute trailing windows. If canary P99 spikes to 320ms for 10+ minutes when baseline is 200ms, trigger automatic rollback. These metrics provide fast feedback (minutes) with high confidence.

PRODUCT METRICS

Product metrics catch model quality problems but require more time. CTR, conversion rate, and engagement need hours of data to reach statistical significance. At 5% traffic (28.8M requests over 24 hours), you can detect 0.3% relative CTR change with 80% power. Use CUPED (Controlled-experiment Using Pre-Experiment Data) to reduce variance by 30%: adjust post-period measurements by pre-period differences between cohorts.

DATA QUALITY METRICS

Data quality metrics catch feature pipeline problems. Track: feature null rate (target under 0.5%), out-of-range values, distribution drift using KL divergence between canary and baseline. If baseline null rate is 0.3% and canary is 0.8%, that 0.5% delta indicates a missing feature dependency in the canary environment.

💡 Key Insight: Two-layer decision system: Layer 1 auto-rolls back on guardrails (error rate, P99, nulls). Layer 2 runs statistical tests on product metrics with multiple comparison correction.

COMPOSITE SCORING

Combine metrics into a single gate decision: weight categories (40% reliability, 50% product impact, 10% data quality), normalize to 0-100 scale. Pass threshold might be 70+. This simplifies the "should we ramp?" decision into a single number with clear thresholds.

💡 Key Takeaways
System metrics: P95 latency delta <5ms, error rate delta <0.05%, 5-15 minute windows for fast feedback
Product metrics: CTR/conversion need hours of data; CUPED reduces variance 30% by adjusting for pre-period differences
Data quality: feature null rate <0.5%, distribution drift via KL divergence catches pipeline bugs
Two-layer decisions: Layer 1 auto-rollback on guardrails, Layer 2 statistical tests on product metrics
📌 Interview Tips
1Explain the three metric categories with concrete thresholds: P99 <250ms, error rate <0.05%, null rate <0.5%
2Describe CUPED: if canary pre-period CTR is 3.1% vs baseline 3.2%, adjust post-period by that 0.1% difference
3Mention composite scoring: 40% reliability + 50% product + 10% data quality = single go/no-go number
← Back to Ramp-up Strategies & Canary Analysis Overview