Production Implementation and Scale Considerations
LATENCY BUDGET
Consider a search service with 100ms p50 latency and 150ms p99. Adding interleaving must not blow the latency budget. The merge algorithm itself is cheap: O(K), well under 1ms for typical K=10-50. The real cost is running two rankers. If your ranker takes 30ms, a second sequential inference adds another 30ms; running both in parallel keeps latency near 30ms but doubles compute. Mitigations: (1) Run rankers in parallel. (2) Cache shared features (user embeddings, item metadata) in Redis or local memory. (3) Use a fast candidate generator and interleave only the reranking stage.
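A minimal sketch of mitigation (1), running both rankers concurrently so wall-clock cost is the max of the two inferences rather than their sum. The ranker functions here are hypothetical placeholders, not a real ranking API:

```python
from concurrent.futures import ThreadPoolExecutor

def rank_a(candidates):
    # Placeholder for ranker A's (possibly network-bound) inference call.
    return sorted(candidates)

def rank_b(candidates):
    # Placeholder for ranker B's inference call.
    return sorted(candidates, reverse=True)

def dual_rank(candidates):
    # Submit both inferences to a two-worker pool; total latency is
    # roughly max(ranker_a, ranker_b) instead of their sum.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(rank_a, candidates)
        fut_b = pool.submit(rank_b, candidates)
        return fut_a.result(), fut_b.result()

ranking_a, ranking_b = dual_rank(["x", "y", "z"])
```

Threads suffice when the rankers are remote services (the calls are I/O-bound); for in-process model inference a process pool or async batching would be the analogous choice.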
TRAFFIC SAMPLING
Running interleaving on 100% of traffic doubles infrastructure cost. Instead, sample 10-20% of traffic. At 10,000 QPS, 10% sampling gives 1,000 QPS for interleaving, producing 400-2,000 competitive sessions per day, enough for statistical significance in 2-5 days. Use deterministic hashing (e.g., hash of user ID mod 100 < 10) for consistent user bucketing.
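The deterministic bucketing rule above can be sketched as follows; the function name and the use of MD5 are illustrative choices, not a prescribed implementation (any stable hash works, but Python's built-in hash() is randomized per process and must be avoided):

```python
import hashlib

def in_interleaving_bucket(user_id: str, sample_pct: int = 10) -> bool:
    # Stable hash of the user ID reduced mod 100: the same user always
    # lands in the same bucket across requests, processes, and restarts.
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < sample_pct
```

With sample_pct=10 roughly 10% of users enter the interleaving arm, and a given user never flips between arms mid-experiment.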
LOGGING AND MONITORING
Every interleaved request should log: query ID, item IDs with positions, team assignments, neutral flags, coin flip seeds, and all engagement events (clicks, time spent, conversions). Build a streaming pipeline to compute running preference margins every 5-10 minutes. Alert when: (1) Competitive coverage drops below 30%. (2) First position balance drifts beyond 2%. (3) Latency p99 regresses more than 5%.
PARALLEL EXPERIMENTS
Large teams run 10-20 interleaving experiments simultaneously. Each experiment targets a different query segment (e.g., navigational vs informational queries) or feature area (e.g., personalization vs query understanding). Use query-level randomization with deterministic hashing so the same query consistently enters the same experiment. Log experiment IDs for downstream filtering.
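Query-level routing can be sketched the same way as user bucketing, hashing the query string instead of the user ID. The experiment names below are hypothetical:

```python
import hashlib

# Hypothetical set of concurrently running experiments.
EXPERIMENTS = ["personalization", "query_understanding", "freshness"]

def assign_experiment(query: str) -> str:
    # Deterministic query-level routing: the same query string always
    # maps to the same experiment, so its sessions never mix treatments.
    # Log the returned experiment ID with each request for filtering.
    h = int(hashlib.sha1(query.encode("utf-8")).hexdigest(), 16)
    return EXPERIMENTS[h % len(EXPERIMENTS)]
```

In practice the hash would usually be salted per experiment layer so that routing decisions in different layers stay independent.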