
Dense Retrieval Failure Modes and Mitigation Strategies

Out-of-Distribution Queries

Dense retrievers generalize poorly to query types unseen during training. A model trained on natural-language questions often fails on code snippets, product IDs, or specialized jargon. Symptoms: low recall on specific query segments, user complaints about obvious misses. Detection: segment queries by type and measure recall per segment. Mitigation: include diverse query types in training data, fall back to sparse retrieval for detected OOD queries, or use hybrid retrieval by default.
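The hybrid-by-default mitigation can be sketched as a simple score blend: exact-term overlap rescues queries (like product IDs) the encoder never learned to embed. This is a minimal illustration, not a production ranker; the toy `sparse_score` stands in for BM25, and `alpha`, the example documents, and the dense scores are all assumptions.

```python
def sparse_score(query_tokens, doc_tokens):
    # Toy lexical overlap score (stand-in for BM25).
    overlap = set(query_tokens) & set(doc_tokens)
    return len(overlap) / max(len(set(query_tokens)), 1)

def hybrid_rank(query, docs, dense_scores, alpha=0.5):
    # Blend dense similarity with lexical overlap; alpha weights the
    # dense score, (1 - alpha) the sparse score.
    q_toks = query.lower().split()
    scored = []
    for doc, dense in zip(docs, dense_scores):
        sparse = sparse_score(q_toks, doc.lower().split())
        scored.append((alpha * dense + (1 - alpha) * sparse, doc))
    return [doc for _, doc in sorted(scored, reverse=True)]

docs = [
    "error code E-4021 in payment service",
    "how to handle payment errors",
]
# Assume the dense model returns near-zero scores for an ID query it
# never saw in training; the lexical term rescues the exact-ID match.
ranked = hybrid_rank("E-4021", docs, dense_scores=[0.02, 0.03])
```

Here the dense retriever alone would rank the ID document last, while the blended score surfaces it first.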

Embedding Drift

When you update the encoder model, old document embeddings become incompatible with new query embeddings. Even small model changes shift the entire vector space. Symptoms: recall drops after model update despite better offline metrics. Prevention: always re-encode all documents when updating the encoder. This is expensive but absolutely necessary. Track embedding version with documents; reject queries against mismatched versions or maintain multiple index versions during transitions.
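The version-tracking idea can be sketched as an index that records which encoder produced its vectors and refuses queries encoded with anything else. This is an illustrative sketch, not a real vector-database API; the class and method names are assumptions.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class VersionedIndex:
    """Tag an index with its encoder version; reject mismatched queries."""

    def __init__(self, encoder_version):
        self.encoder_version = encoder_version
        self._vectors = {}  # doc_id -> embedding

    def add(self, doc_id, vector):
        self._vectors[doc_id] = vector

    def search(self, query_vector, query_encoder_version, k=10):
        # Fail fast instead of silently comparing incompatible spaces.
        if query_encoder_version != self.encoder_version:
            raise ValueError(
                "encoder version mismatch: index=%s query=%s; "
                "re-encode all documents before serving"
                % (self.encoder_version, query_encoder_version))
        scored = sorted(self._vectors.items(),
                        key=lambda kv: cosine(query_vector, kv[1]),
                        reverse=True)
        return [doc_id for doc_id, _ in scored[:k]]
```

During a migration you would run two such indexes (old and new version) side by side and route each query to the index matching its encoder.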

False Semantic Similarity

Embeddings place superficially similar but semantically different texts close together. "How to kill a process" and "how to kill a person" might have high similarity. "Apple iPhone" and "Apple fruit" might cluster together. The model learned surface patterns, not true meaning. Mitigation: include contrastive pairs in training that are lexically similar but semantically different; use domain-specific fine-tuning to separate false positives.
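Building those contrastive pairs can be sketched as mining "hard negatives": candidates that share many tokens with the anchor but carry a different semantic label. The Jaccard threshold and the label map are illustrative assumptions; in practice labels come from click data or human annotation.

```python
def mine_lexical_hard_negatives(anchor, candidates, label_of,
                                overlap_threshold=0.5):
    # Pick candidates that are lexically close to the anchor (high token
    # Jaccard overlap) but semantically different (different label).
    # These make strong contrastive negatives for fine-tuning.
    a_toks = set(anchor.lower().split())
    negatives = []
    for cand in candidates:
        c_toks = set(cand.lower().split())
        jaccard = len(a_toks & c_toks) / max(len(a_toks | c_toks), 1)
        if jaccard >= overlap_threshold and label_of[cand] != label_of[anchor]:
            negatives.append(cand)
    return negatives

anchor = "how to kill a process"
candidates = ["how to kill a person", "how to terminate a process"]
label_of = {
    "how to kill a process": "sysadmin",
    "how to kill a person": "violence",
    "how to terminate a process": "sysadmin",
}
negs = mine_lexical_hard_negatives(anchor, candidates, label_of)
```

The paraphrase with the same label is excluded; only the lexically-similar-but-semantically-different candidate survives as a training negative.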

💡 Monitoring: Track per-query-type recall weekly. Sudden drops indicate model degradation or distribution shift. Always A/B test model changes against production baseline before full rollout.
💡 Key Takeaways
- OOD queries: models fail on query types not in training (code, IDs, jargon); segment and measure recall
- Mitigation for OOD: include diverse training data, fall back to sparse, or use hybrid by default
- Embedding drift: model updates invalidate old embeddings; always re-encode all documents
- False similarity: lexically similar but semantically different texts cluster (Apple iPhone vs. Apple fruit)
- Monitor per-query-type recall weekly; A/B test model changes before full rollout
📌 Interview Tips
1. Explain the embedding drift problem when discussing model updates - shows production awareness
2. Describe OOD failure with specific examples (code snippets, product IDs)
3. Mention the false semantic similarity problem with the Apple example for nuanced understanding