
Traffic Routing and Shadow Mode for ML Canaries

Core Concept
Traffic routing determines which requests hit the canary and which hit the baseline. Shadow mode (dark traffic) duplicates requests to the canary but discards its responses, so initial validation has zero user impact.

PERCENTAGE ROUTING

Default approach: send a fixed share of traffic, for example 10%, to the canary and the remaining 90% to stable, using weighted routing in a load balancer or service mesh.
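The weighted split above is normally configured in the load balancer or mesh rather than written by hand, but the decision it makes per request can be sketched in a few lines. This is a minimal illustration only; `pick_backend` and its parameters are hypothetical names, not part of any particular proxy's API.

```python
import random
from typing import Optional


def pick_backend(canary_weight: float = 0.10,
                 rng: Optional[random.Random] = None) -> str:
    """Return "canary" with probability canary_weight, otherwise "stable".

    Hypothetical helper: each request is decided independently, so the
    same user may land on different versions across requests.
    """
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be between 0 and 1")
    source = rng if rng is not None else random
    # random() returns a float in [0.0, 1.0), so a weight of 0 never
    # selects the canary and a weight of 1 always does.
    return "canary" if source.random() < canary_weight else "stable"


if __name__ == "__main__":
    seeded = random.Random(0)
    picks = [pick_backend(0.10, seeded) for _ in range(10_000)]
    print("canary share:", picks.count("canary") / len(picks))
```

Because every request is an independent draw, this scheme alone gives no per-user consistency.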

STICKY ROUTING

Critical for ML: without stickiness, the same user can bounce from canary to stable and back across requests and see different predictions each time, which is confusing. Hash the user ID or session cookie to deterministically assign each user to canary or stable for the entire rollout.
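A minimal sketch of hash-based sticky assignment follows. The function name, the `salt` parameter, and the 100-bucket scheme are assumptions made for illustration; the only idea taken from the text is hashing a stable user identifier to get a deterministic assignment.

```python
import hashlib


def assign_variant(user_id: str, canary_percent: int = 10,
                   salt: str = "rollout-v1") -> str:
    """Deterministically map a user to "canary" or "stable".

    The salt (a hypothetical per-rollout string) keeps bucket
    assignments from being identical across unrelated rollouts.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and fold into 100 buckets.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    # The same user always gets the same answer for a given salt.
    print(assign_variant("user-42"), assign_variant("user-42"))
```

One property of this design is that raising `canary_percent` only adds users to the canary: anyone whose bucket was below the old threshold is still below the new one, so nobody flips back to stable mid-rollout.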

COHORT ROUTING

Header or cookie based: internal users get a canary header for dogfooding before external exposure. Cohort based: split by geography, device, or user segment. Example: serve a canary mobile model only to iOS users in the US, and keep Android and other regions on stable. This reduces blast radius for targeted changes.
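The two rules described above (an opt-in header for internal users, and an iOS/US cohort) might be combined as in the sketch below. The `x-canary` header name, the `RequestContext` fields, and the rule order are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RequestContext:
    """Hypothetical view of the request attributes a router can see."""
    platform: str                      # e.g. "ios" or "android"
    region: str                        # e.g. "US" or "DE"
    headers: Dict[str, str] = field(default_factory=dict)


def route_by_cohort(ctx: RequestContext) -> str:
    # Rule 1: explicit opt-in header, e.g. set for internal dogfooding users.
    if ctx.headers.get("x-canary") == "true":
        return "canary"
    # Rule 2: targeted cohort from the example, iOS users in the US only.
    if ctx.platform == "ios" and ctx.region == "US":
        return "canary"
    # Everyone else stays on the stable model.
    return "stable"


if __name__ == "__main__":
    print(route_by_cohort(RequestContext("android", "US")))
    print(route_by_cohort(RequestContext("ios", "US")))
```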

💡 Shadow Mode: Primary requests go to stable, and only stable's responses reach users. Duplicated requests are sent to the canary and its responses are discarded. Measure latency, resource usage, and prediction distributions under live load before starting a live canary.
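In practice request mirroring is often done at the proxy or service mesh layer, but the control flow can be sketched in application code. In this hypothetical `ShadowRouter`, `stable` and `canary` are any callables that take a request and return a prediction; the canary runs on a background thread so that its latency and failures never affect the user-facing response.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Tuple


class ShadowRouter:
    """Serve from stable; mirror each request to canary and discard the result."""

    def __init__(self, stable: Callable[[Any], Any],
                 canary: Callable[[Any], Any], max_workers: int = 4) -> None:
        self._stable = stable
        self._canary = canary
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # Each entry: (latency_seconds, prediction_or_None, error_or_None)
        self.shadow_log: List[Tuple[float, Any, Optional[str]]] = []

    def handle(self, request: Any) -> Any:
        # Fire the shadow call without waiting for it.
        self._pool.submit(self._run_shadow, request)
        # Only the stable response is returned to the caller.
        return self._stable(request)

    def _run_shadow(self, request: Any) -> None:
        start = time.perf_counter()
        try:
            prediction, error = self._canary(request), None
        except Exception as exc:  # canary failures must never reach users
            prediction, error = None, repr(exc)
        self.shadow_log.append((time.perf_counter() - start, prediction, error))

    def close(self) -> None:
        # Wait for in-flight shadow calls, then release the worker threads.
        self._pool.shutdown(wait=True)
```

The recorded latencies and predictions in `shadow_log` stand in for whatever metrics pipeline a real deployment would use.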

SHADOW MODE LIMITS

Shadow mode cannot catch issues that depend on user feedback loops or state changes. A recommendation model in shadow cannot see whether users would click its suggestions. Write-heavy services and those with side effects (notifications, payments) cannot safely use shadow mode, because duplicated writes corrupt state. For these, move directly to a small live canary with careful monitoring.

⚠️ Pattern: Use shadow mode to validate QPS handling and prediction distribution stability, then move to live canary for engagement metrics.
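One way to put a number on "prediction distribution stability" is to compare the stable and shadow prediction scores with a divergence measure. The sketch below uses the population stability index (PSI) over equal-width bins; PSI is a choice made here for illustration, not a method named by the source, and the bin count and any alert threshold are assumptions to tune per model.

```python
import math
from typing import List, Sequence


def psi(baseline: Sequence[float], canary: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index between two samples of prediction scores."""
    lo = min(min(baseline), min(canary))
    hi = max(max(baseline), max(canary))
    if hi == lo:
        return 0.0  # every score is identical, so there is nothing to compare
    width = (hi - lo) / bins

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value falls into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps to avoid log(0) and division by zero.
        return [max(c / len(values), eps) for c in counts]

    p = fractions(baseline)
    q = fractions(canary)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    stable_scores = [i / 100 for i in range(100)]
    shifted_scores = [0.5 + i / 200 for i in range(100)]
    print("same distribution:", psi(stable_scores, stable_scores))
    print("shifted distribution:", psi(stable_scores, shifted_scores))
```

A value near zero means the canary's scores are distributed like stable's; larger values indicate drift that is worth investigating before any live traffic is shifted.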
💡 Key Takeaways
Sticky routing hashes a user identifier so that each user sees one consistent version (canary or stable) for the entire rollout, which avoids confusing prediction or ranking changes mid-session.
Shadow mode duplicates requests to the canary but discards its responses. It measures inference latency and prediction distributions under live load with zero user impact, and is used by Google and Uber for pre-canary validation.
Cohort-based routing splits traffic by geography, device, or user segment (for example, iOS in the US only). It reduces blast radius for targeted changes and enables segment-specific metric collection.
Shadow mode cannot catch feedback-loop issues or engagement impact, since users never see canary predictions. It is also unsafe for write-heavy services with side effects, because duplicated writes corrupt state.
Header or cookie routing enables dogfooding with internal users or beta cohorts before external exposure, and is useful for controlled A/B tests with specific user groups.
📌 Interview Tips
1. Uber shadows new ETA prediction models to validate that inference latency stays under 50 ms and that prediction distributions match expected ranges under live QPS, then moves to a 5 percent live canary to measure actual trip acceptance rates.
2. Netflix uses sticky routing keyed by subscriber identifier so that each user sees consistent recommendations throughout a canary rollout, avoiding jarring changes when a user refreshes and would otherwise hit a different model version.