
Implementing Shadow Mode: Mirroring, Isolation, and Promotion Criteria

Shadow Implementation: A complete shadow deployment requires traffic mirroring infrastructure, isolation mechanisms to prevent shadow failures from affecting production, and clear promotion criteria that define when shadow validation is sufficient to proceed.

Traffic Mirroring Implementation

At the load balancer or service mesh level, duplicate incoming requests to the shadow endpoint. Implementation options include an Envoy proxy with a mirror policy (which duplicates a configurable percentage of traffic), custom middleware that forwards requests asynchronously, or Kafka-based replay, where requests are logged and the shadow consumes from the log. Key requirements: the shadow receives identical inputs (same features, same timestamps), mirroring adds no latency to the production path, and failed shadow requests are logged but never retried.
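The custom-middleware option can be sketched as follows. This is a minimal illustration, not a reference implementation: `ShadowMirror`, `primary`, and `shadow` are hypothetical names, and both models are assumed to be plain callables that take a request dict.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

logger = logging.getLogger("shadow")


class ShadowMirror:
    """Serve from the primary model and mirror each request to the shadow.

    Hypothetical helper: `primary` and `shadow` are assumed to be callables
    that accept a request dict and return a prediction.
    """

    def __init__(self, primary: Callable[[dict], Any],
                 shadow: Callable[[dict], Any], max_workers: int = 4) -> None:
        self._primary = primary
        self._shadow = shadow
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def handle(self, request: dict) -> Any:
        # Hand a shallow copy to the background pool so the shadow sees the
        # same features and timestamps without blocking the production call.
        self._pool.submit(self._call_shadow, dict(request))
        return self._primary(request)

    def _call_shadow(self, request: dict) -> None:
        try:
            result = self._shadow(request)
            logger.info("shadow ok result=%s", result)
        except Exception:
            # Shadow failures are logged and dropped; there is no retry.
            logger.exception("shadow request failed")

    def close(self) -> None:
        # Wait for in-flight shadow calls before shutting down.
        self._pool.shutdown(wait=True)
```

The production response depends only on `primary`; an exception in `shadow` is caught on the worker thread. A real deployment would likely also bound the pending queue and put a timeout on the shadow call.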

Isolation Mechanisms

Deploy the shadow on separate infrastructure with resource quotas. Container-level: separate pods with CPU and memory limits. Network-level: the shadow cannot write to production databases or call external services (use mocks or read replicas). Failure-level: a circuit breaker disables the shadow if its error rate exceeds a threshold, preventing cascading failures. Monitoring-level: separate dashboards and alerts for the shadow, so shadow issues do not pollute production monitoring. The goal: the shadow can fail completely without any production impact.
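The failure-level isolation above can be sketched as a small circuit breaker. The class name, the sliding-window design, and the default values are assumptions chosen for illustration; the text only specifies that the shadow is disabled once its error rate exceeds a threshold.

```python
from collections import deque


class ShadowCircuitBreaker:
    """Disable shadow traffic when the recent error rate exceeds a threshold.

    Hypothetical sketch: tracks the last `window` outcomes and opens once
    at least `min_samples` are recorded and the failure share is too high.
    """

    def __init__(self, window: int = 100, max_error_rate: float = 0.5,
                 min_samples: int = 20) -> None:
        self._outcomes = deque(maxlen=window)  # True means the call failed
        self._max_error_rate = max_error_rate
        self._min_samples = min_samples
        self.is_open = False

    def allow(self) -> bool:
        # Callers check this before mirroring a request to the shadow.
        return not self.is_open

    def record(self, failed: bool) -> None:
        self._outcomes.append(failed)
        if len(self._outcomes) >= self._min_samples:
            error_rate = sum(self._outcomes) / len(self._outcomes)
            if error_rate > self._max_error_rate:
                # Stays open until an operator calls reset().
                self.is_open = True

    def reset(self) -> None:
        self._outcomes.clear()
        self.is_open = False
```

In this sketch the breaker never closes on its own, which keeps a broken shadow switched off until someone has looked at it; an automatic half-open probe is a common alternative.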

Promotion Criteria

Define explicit criteria for promoting the shadow to production. Quantitative criteria: prediction quality within X% of production (or better), latency p99 under Y milliseconds, error rate under Z%. Duration criteria: metrics stable for N days. Coverage criteria: the shadow has processed representative samples of all user segments and request types. Document the criteria before starting the shadow run so the goalposts cannot move afterwards. If the shadow fails the criteria, investigate the root cause before re-running; do not simply extend the duration and hope the issues resolve themselves.

Automation: Build promotion as a pipeline in which the shadow deploys automatically, metrics collection runs automatically, and the promotion decision is made automatically from the pre-defined criteria. Require human approval only for exceptions.
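An automated promotion gate over these criteria might look like the sketch below. All names (`PromotionCriteria`, `ShadowMetrics`, `failed_criteria`) are hypothetical, the X, Y, Z, and N values are supplied by the caller, and prediction quality is assumed to be a positive, higher-is-better score.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class PromotionCriteria:
    max_quality_drop_pct: float  # X: allowed quality drop vs. production
    max_p99_latency_ms: float    # Y: p99 latency must be under this
    max_error_rate_pct: float    # Z: error rate must be under this
    min_stable_days: int         # N: days the metrics must have held
    required_segments: FrozenSet[str]  # segments that must be covered


@dataclass(frozen=True)
class ShadowMetrics:
    prod_quality: float    # assumed positive, higher is better
    shadow_quality: float
    p99_latency_ms: float
    error_rate_pct: float
    stable_days: int
    segments_seen: FrozenSet[str]


def failed_criteria(m: ShadowMetrics, c: PromotionCriteria) -> List[str]:
    """Return the names of the criteria the shadow run did not meet."""
    failures = []
    drop_pct = (m.prod_quality - m.shadow_quality) / m.prod_quality * 100
    if drop_pct > c.max_quality_drop_pct:
        failures.append("quality")
    if m.p99_latency_ms >= c.max_p99_latency_ms:
        failures.append("latency")
    if m.error_rate_pct >= c.max_error_rate_pct:
        failures.append("error_rate")
    if m.stable_days < c.min_stable_days:
        failures.append("duration")
    if c.required_segments - m.segments_seen:
        failures.append("coverage")
    return failures
```

An empty list means the pipeline can promote without a human in the loop; a non-empty list names what to investigate. Freezing the criteria object is one way to keep the thresholds fixed once the shadow run has started.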

💡 Key Takeaways
Traffic mirroring via service mesh, middleware, or Kafka-based replay
Isolation: separate pods, network restrictions, circuit breakers, separate monitoring
Define promotion criteria before starting: quality, latency, error rate, duration
📌 Interview Tips
1. An Envoy proxy mirror policy duplicates a percentage of traffic to the shadow
2. A circuit breaker disables the shadow if its error rate exceeds a threshold