ML Infrastructure & MLOps › Automated Rollback & Canary Analysis · Medium · ⏱️ ~3 min

Traffic Routing and Shadow Mode for ML Canaries

Traffic routing determines which requests hit the canary versus the stable baseline. Percentage-based routing is the default: 10 percent of requests go to the canary and 90 percent to stable, using weighted routing at a load balancer or service mesh.

Sticky routing is critical for ML systems to avoid inconsistent user experiences. If the same user hits canary, then stable, then canary again, they see different predictions or rankings, which can confuse or frustrate them. Sticky routing hashes a user identifier or session cookie to deterministically assign each user to canary or stable for the entire rollout duration.

Header- or cookie-based routing enables controlled A/B tests: internal users or beta cohorts get a canary header, while everyone else goes to stable. This is useful for dogfooding new models with employees before exposing external users.

Cohort-based routing splits by geography, device type, or user segment. For example, you might canary a mobile-specific model only on iOS devices in the US, keeping Android and other regions on stable. This reduces blast radius for targeted changes and allows segment-specific metric collection.

Shadow mode (also called dark traffic or mirrored traffic) is a powerful pre-canary stage for ML models. Primary requests go to stable, and responses return to users immediately. Simultaneously, each request is duplicated and sent to the canary, but the canary's responses are discarded. This lets you measure inference latency, resource usage, output prediction distributions, and error rates under live load with zero user impact. Google and Uber use shadow mode to validate that a new model can handle production queries per second (QPS) and that its predictions show no unexpected distribution shifts before moving to a live canary.

Shadow mode has limits. It cannot catch issues that depend on user feedback loops or state changes: a recommendation model in shadow cannot see whether users actually click the new suggestions, and a ranking model cannot observe whether changed ordering affects engagement. Write-heavy services or services with side effects (sending notifications, charging payments) cannot safely use shadow mode, because duplicated writes can corrupt state. For these cases, move directly to a small live canary with careful monitoring.
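Sticky percentage-based routing can be sketched in a few lines. This is a minimal illustration, not a production router: the `CANARY_WEIGHT` value and the `rollout_salt` name are assumptions, and a real system would do this assignment in a load balancer or service mesh rather than application code.

```python
import hashlib

CANARY_WEIGHT = 10  # percent of users sent to the canary (illustrative value)

def route(user_id: str, rollout_salt: str = "model-rollout-v2") -> str:
    """Deterministically assign a user to 'canary' or 'stable'.

    Hashing the user identifier (plus a per-rollout salt) puts each user in a
    stable bucket for the whole rollout, so they never flip between model
    versions mid-session. Changing the salt reshuffles buckets for the next
    rollout, avoiding the same users always being canaried.
    """
    digest = hashlib.sha256(f"{rollout_salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # uniform bucket in [0, 100)
    return "canary" if bucket < CANARY_WEIGHT else "stable"

# The assignment is deterministic: repeated calls agree.
assert route("user-42") == route("user-42")
```

Ramping the rollout is then just raising `CANARY_WEIGHT`; because the hash is stable, users already on the canary stay there as the percentage grows.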
💡 Key Takeaways
Sticky routing via a hash of the user identifier ensures each user sees a consistent version (canary or stable) for the entire rollout, avoiding confusing prediction or ranking changes mid-session
Shadow mode duplicates requests to the canary but discards responses, measuring inference latency and prediction distributions under live load with zero user impact; used by Google and Uber for pre-canary validation
Cohort-based routing splits by geography, device, or user segment (e.g., iOS in the US only), reducing blast radius for targeted changes and enabling segment-specific metric collection
Shadow mode cannot catch feedback-loop issues or engagement impact since users never see canary predictions; it is also unsafe for write-heavy services with side effects (duplicated writes corrupt state)
Header- or cookie-based routing enables dogfooding with internal users or beta cohorts before external exposure, useful for controlled A/B tests with specific user groups
📌 Examples
Uber shadows new ETA prediction models to validate that inference latency stays under 50 ms and prediction distributions match expected ranges under live QPS, then moves to a 5 percent live canary to measure actual trip acceptance rates
Netflix uses sticky routing keyed by subscriber identifier so each user sees consistent recommendations throughout a canary rollout, avoiding jarring changes if a user refreshes and hits a different model version