
Team Draft Interleaving Algorithm

THE TEAM DRAFT PROCESS

Team Draft constructs the blended result list through a round-based selection process inspired by sports team drafts. In each round, both models propose their highest-ranked item that has not yet been added to the blended list. A coin flip determines which model places its item first in that round. This random alternation eliminates systematic position bias.

HANDLING DUPLICATES

When both models propose the same item (which happens frequently since top results often overlap), the item is included once and marked as neutral. Neutral items receive no team credit because both models agreed on them. This focuses attribution on the items where models actually disagreed, increasing statistical power.

For example, if both models rank the same item as their #1, it goes into slot 1 as neutral. The next round, both propose their #2 ranked items, and the coin flip decides which goes to slot 2 versus slot 3.
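The process described above, including neutral handling for agreed-upon items, can be sketched as follows (the function and item names are illustrative, not a reference implementation):

```python
import random

def team_draft_interleave(ranking_a, ranking_b, k, rng=None):
    """Merge two rankings via Team Draft interleaving.

    Returns a list of (item, team) pairs, where team is
    "A", "B", or "neutral" (item proposed by both models).
    """
    rng = rng or random.Random()
    blended = []
    used = set()

    def next_unused(ranking):
        # Each model's proposal: its top item not yet placed.
        return next((x for x in ranking if x not in used), None)

    while len(blended) < k:
        top_a = next_unused(ranking_a)
        top_b = next_unused(ranking_b)
        if top_a is None and top_b is None:
            break  # both rankings exhausted
        if top_a is not None and top_a == top_b:
            # Both models agree: include once, credit neither team.
            blended.append((top_a, "neutral"))
            used.add(top_a)
            continue
        # Coin flip decides which model places its item first this round.
        order = [("A", top_a), ("B", top_b)]
        if rng.random() < 0.5:
            order.reverse()
        for team, item in order:
            if item is not None and item not in used and len(blended) < k:
                blended.append((item, team))
                used.add(item)
    return blended

# The example from the text: both models rank X first, so X is neutral;
# the coin flip then orders Y and Z across slots 2 and 3.
result = team_draft_interleave(["X", "Y", "Z"], ["X", "Z", "W"], k=4)
```

Note that `Z` can only ever be credited to team B here: in the round where it is placed, model A is proposing `Y`, and by the next round `Z` is already used.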

ALGORITHM COMPLEXITY

The merge algorithm runs in O(K) time where K is the number of items in the result list (typically 10-50). It uses pointer operations rather than copying items, adding under 1 millisecond to request latency. The real computational cost is running both rankers, which can double inference time. This is mitigated by caching shared features like user embeddings and item metadata.

⚠️ Key Trade-off: Dual ranker inference typically adds 10-30ms latency. Feature caching and parallel execution can reduce this to under 10ms overhead.
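The caching and parallel-execution mitigation can be sketched like this (all function names, the feature stub, and the model stand-ins are hypothetical; real rankers would replace the `sorted` calls):

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

@lru_cache(maxsize=10_000)
def shared_features(user_id):
    # Hypothetical: expensive shared features (user embeddings, item
    # metadata) are computed once and reused by both rankers.
    return (user_id, 0.1, 0.2)  # stub embedding

def rank_a(features, items):
    return sorted(items)                 # stand-in for model A inference

def rank_b(features, items):
    return sorted(items, reverse=True)   # stand-in for model B inference

def dual_rank(user_id, items):
    feats = shared_features(user_id)     # cache hit on repeat requests
    # Run both rankers concurrently so wall-clock latency is roughly
    # max(A, B) rather than A + B.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(rank_a, feats, items)
        future_b = pool.submit(rank_b, feats, items)
        return future_a.result(), future_b.result()
```

The design point is that dual inference need not double latency: the two models share a feature cache and run in parallel, so the overhead is dominated by the slower of the two forward passes.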

REQUIRED LOGGING

Every interleaved request must log: query ID, item ID, slot position, team assignment (A, B, or neutral), coin flip seed (for reproducibility), and all subsequent engagement signals (clicks, time spent, conversions). This data powers the statistical analysis that determines the winner.
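The logged fields above could be captured in a record like the following (field names are illustrative; align them with your own logging schema):

```python
from dataclasses import dataclass, asdict

@dataclass
class InterleaveLogRecord:
    query_id: str
    item_id: str
    slot: int             # position in the blended result list
    team: str             # "A", "B", or "neutral"
    coin_flip_seed: int   # makes the exact merge replayable
    clicked: bool = False
    dwell_seconds: float = 0.0
    converted: bool = False

# One row per (request, slot); engagement fields are updated as
# signals arrive.
record = InterleaveLogRecord(
    query_id="q-123", item_id="item-42", slot=1,
    team="neutral", coin_flip_seed=20240601,
)
row = asdict(record)  # ready for a structured log sink
```

Logging the seed rather than each individual flip keeps the record compact while still allowing the merge to be replayed exactly during analysis.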

💡 Key Takeaways
Each round: both models propose their top unassigned item, coin flip decides placement order, eliminating systematic position bias
Items proposed by both models are marked neutral and excluded from attribution, focusing statistical power on actual disagreements
Merge algorithm is O(K) with under 1ms latency; the real cost is dual ranker inference (10-30ms added)
Logging must capture query ID, item ID, slot position, team label, neutral flag, coin flip seed, and all engagement signals
📌 Interview Tips
1. When asked about interleaving mechanics, walk through a concrete example: Model A ranks [X,Y,Z], Model B ranks [X,Z,W], explain how Team Draft merges them with neutral handling
2. Mention the latency trade-off: merge is cheap (<1ms) but dual inference is expensive, mitigated by feature caching
3. If discussing implementation, emphasize that reproducibility requires logging coin flip seeds so you can replay the exact merge