Model Evolution and Dual Indexing
THE MODEL UPDATE PROBLEM
Embedding models improve over time, and a newer model may produce better-quality embeddings that improve search relevance. But switching models invalidates every existing vector. Embeddings from different models live in different vector spaces, so a query vector from model v2 cannot be meaningfully compared against item vectors from model v1, even when the dimensions happen to match.
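One way to make this incompatibility explicit is to tag every vector with the model version that produced it and refuse cross-version comparisons. The sketch below assumes a hypothetical `Embedding` record with a `model_version` string. It is an illustration, not any particular vector database's API.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Embedding:
    """A vector tagged with the (hypothetical) version of the model that produced it."""
    model_version: str
    vector: Tuple[float, ...]


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    # Refuse cross-model comparisons: the two vector spaces are unrelated,
    # so a score would be meaningless even if the dimensions matched.
    if a.model_version != b.model_version:
        raise ValueError(
            f"cannot compare {a.model_version!r} and {b.model_version!r} embeddings"
        )
    if len(a.vector) != len(b.vector):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a.vector, b.vector))
    norm_a = math.sqrt(sum(x * x for x in a.vector))
    norm_b = math.sqrt(sum(y * y for y in b.vector))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Calling `cosine_similarity` with a v2 query and a v1 item raises `ValueError` even when the raw numbers are identical, which is the point: a version mismatch surfaces as an error instead of silently degraded results.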
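Full re-embedding is expensive. For 100M items at 10 ms per item, re-embedding takes 100M × 10 ms = 1,000,000 s, or about 11.6 days of continuous compute on a single sequential worker. Parallel workers shorten the wall-clock time but not the total compute. Throughout the rebuild, the old index must keep serving traffic while the new one is built.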
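The arithmetic above can be checked with a small back-of-envelope helper. It assumes a fixed per-item cost and perfectly linear scaling across workers, ignoring I/O, batching effects, and retries. The function name and the 16-worker figure are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def reembed_days(num_items: int, seconds_per_item: float, workers: int = 1) -> float:
    """Wall-clock days to re-embed a corpus, assuming a fixed per-item cost
    and perfectly linear scaling across workers (no overhead)."""
    return num_items * seconds_per_item / workers / SECONDS_PER_DAY


print(f"{reembed_days(100_000_000, 0.010):.1f} days on 1 worker")               # 11.6 days on 1 worker
print(f"{reembed_days(100_000_000, 0.010, workers=16):.2f} days on 16 workers")  # 0.72 days on 16 workers
```

Real pipelines rarely scale perfectly, so treat the multi-worker figure as a lower bound on wall-clock time.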
DUAL INDEX STRATEGY
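Shadow index: Build the new index with the new model in the background while the old index continues to serve traffic. Once it is complete, validate quality, either offline (comparing relevance metrics on a labeled query set) or by shadow scoring live queries against both indexes. If quality improves, switch traffic to the new index.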
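A minimal sketch of an offline promotion gate, assuming each index is exposed as a hypothetical search function `(query, k) -> ranked item IDs` and that a labeled set maps queries to their relevant item IDs. Recall@k and the `min_gain` margin are illustrative choices; any relevance metric you trust could take their place.

```python
from typing import Callable, Mapping, Sequence, Set

# Hypothetical search interface: (query text, k) -> ranked list of item IDs.
SearchFn = Callable[[str, int], Sequence[str]]


def recall_at_k(search: SearchFn, labeled: Mapping[str, Set[str]], k: int) -> float:
    """Mean fraction of each query's relevant items found in the top k results."""
    scores = [
        len(set(search(query, k)[:k]) & relevant) / len(relevant)
        for query, relevant in labeled.items()
        if relevant  # skip queries with no labeled relevant items
    ]
    return sum(scores) / len(scores) if scores else 0.0


def should_promote(old: SearchFn, new: SearchFn, labeled: Mapping[str, Set[str]],
                   k: int = 10, min_gain: float = 0.01) -> bool:
    """Promote the shadow index only if it beats the serving index by min_gain,
    so small differences from evaluation noise do not trigger a migration."""
    return recall_at_k(new, labeled, k) >= recall_at_k(old, labeled, k) + min_gain


# Toy example: two labeled queries and canned results from each index.
labeled = {"red shoes": {"a", "b"}, "blue hat": {"c"}}
old_results = {"red shoes": ["a", "x", "y"], "blue hat": ["z", "y", "x"]}
new_results = {"red shoes": ["a", "b", "x"], "blue hat": ["c", "x", "y"]}


def old_index(query: str, k: int) -> list:
    return old_results[query][:k]


def new_index(query: str, k: int) -> list:
    return new_results[query][:k]


print(recall_at_k(old_index, labeled, k=3))  # 0.25
print(recall_at_k(new_index, labeled, k=3))  # 1.0
print(should_promote(old_index, new_index, labeled, k=3))  # True
```

Requiring a margin rather than any improvement keeps a migration from being triggered by noise in a small evaluation set.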
Traffic cut-over: Start by routing 1% of traffic to the new index and monitor latency, error rates, and quality metrics. Increase gradually to 100% over hours or days. This catches regressions before they reach all users.
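A common way to implement a percentage ramp is deterministic hash bucketing on a stable key such as a user ID. The sketch below assumes a hypothetical ramp schedule and a simple policy of stopping the rollout (dropping to 0%) whenever metrics look unhealthy. Both are illustrative, not prescribed by the text above.

```python
import hashlib

# Hypothetical ramp schedule in percent of users; advance one step at a time.
RAMP_SCHEDULE = [1, 5, 25, 50, 100]


def routes_to_new_index(user_id: str, percent: int) -> bool:
    """Deterministically send roughly `percent`% of users to the new index.

    Hashing the user ID (instead of choosing randomly per request) keeps each
    user's results consistent, and because the cohort is "bucket < percent",
    everyone admitted at 1% stays admitted at 5%, 25%, and so on.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return bucket < percent


def next_percent(current: int, healthy: bool) -> int:
    """Move to the next ramp step if metrics look healthy; otherwise stop at 0%."""
    if not healthy:
        return 0
    later = [p for p in RAMP_SCHEDULE if p > current]
    return later[0] if later else current


users = [f"user-{i}" for i in range(10_000)]
share = sum(routes_to_new_index(u, 5) for u in users) / len(users)
print(f"{share:.1%} of users routed at the 5% step")
```

Bucketing by user rather than by request avoids showing one person results from both indexes in the same session, which would also muddy quality metrics.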
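Rollback plan: Keep the old index available for 1-2 weeks after the switch, and keep writing new and updated items to it so it does not go stale. If quality issues emerge, roll back instantly by routing traffic back to the old index, together with the old model for embedding queries.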
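Because a query vector is only valid against an index built with the same model, it helps to treat the query encoder and its index as one unit that switches and rolls back together. A minimal sketch, assuming hypothetical `ServingVersion` and `Router` types and stub encoders/indexes in place of real ones:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class ServingVersion:
    """Hypothetical bundle of a query encoder and the index built with the same model."""
    name: str
    embed_query: Callable[[str], Sequence[float]]
    search: Callable[[Sequence[float], int], List[str]]


@dataclass
class Router:
    active: ServingVersion
    previous: List[ServingVersion] = field(default_factory=list)

    def switch_to(self, version: ServingVersion) -> None:
        # Retain the outgoing version so its index stays available for rollback.
        self.previous.append(self.active)
        self.active = version

    def rollback(self) -> None:
        if not self.previous:
            raise RuntimeError("no previous version to roll back to")
        self.active = self.previous.pop()

    def query(self, text: str, k: int = 10) -> List[str]:
        # Encoder and index always come from the same version, so the query
        # vector is in the same space as the indexed item vectors.
        return self.active.search(self.active.embed_query(text), k)


# Stub versions: real ones would call the v1/v2 models and their indexes.
v1 = ServingVersion("v1", lambda text: [0.1, 0.2], lambda vec, k: ["item-from-v1"][:k])
v2 = ServingVersion("v2", lambda text: [0.3, 0.4, 0.5], lambda vec, k: ["item-from-v2"][:k])

router = Router(active=v1)
router.switch_to(v2)
print(router.active.name, router.query("shoes"))  # v2 ['item-from-v2']
router.rollback()
print(router.active.name, router.query("shoes"))  # v1 ['item-from-v1']
```

Bundling the two makes it impossible to roll back the index while still embedding queries with the new model.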
QUERY-TIME MODEL MIGRATION
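Alternative approach: keep both indexes running permanently. At query time, embed the query with both models, search each index with its matching query vector, and merge the results. Because similarity scores from two different models are not on a common scale, the merge typically uses rank-based fusion or per-index score normalization rather than raw scores. This avoids an atomic cutover but roughly doubles query-time compute and index storage.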
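Reciprocal rank fusion (RRF) is one widely used rank-based merge. Each item scores the sum of 1 / (k + rank) over the lists it appears in, so raw cosine scores from the two models never have to be compared. The sketch below uses k = 60, a commonly cited default. It is one reasonable merge strategy, not the only one.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(result_lists: Sequence[Sequence[str]],
                           k: int = 60, top_n: int = 10) -> List[str]:
    """Merge ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, item_id in enumerate(results, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _ in ranked[:top_n]]


v1_results = ["a", "b", "c"]  # ranked IDs from the v1 index
v2_results = ["b", "d", "a"]  # ranked IDs from the v2 index
print(reciprocal_rank_fusion([v1_results, v2_results]))  # ['b', 'a', 'd', 'c']
```

Items found by both indexes rise to the top, while items found by only one still appear, which matches the case where each model is better for a different subset of items.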
When to use: if embedding quality is highly item-dependent (some items are represented better by model v1, others by v2), merged results may outperform either model alone. Otherwise, the dual-index overhead is rarely worth it.