Dense Retrieval Failure Modes and Mitigation Strategies
Out-of-Distribution Queries
Dense retrievers generalize poorly to query types unseen during training. A model trained on natural-language questions fails on code snippets, product IDs, or specialized jargon. Symptoms: low recall on specific query segments, user complaints about obvious misses. Detection: segment queries by type, measure recall per segment. Mitigation: include diverse query types in training data, fall back to sparse retrieval for detected OOD queries, or use hybrid retrieval by default.
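The detection step above can be sketched as a per-segment recall@k report. This is a minimal illustration, not a fixed API: the query dicts with `segment`, `retrieved`, and `relevant` keys are an assumed evaluation-log format, and the segment labels are placeholders.

```python
from collections import defaultdict

def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant doc ids that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def recall_by_segment(queries, k=10):
    """Average recall@k per query segment (e.g. 'natural', 'code', 'product_id').

    `queries` is an assumed log format: each entry has a 'segment' label,
    the 'retrieved' doc ids in rank order, and the labeled 'relevant' ids.
    """
    per_segment = defaultdict(list)
    for q in queries:
        per_segment[q["segment"]].append(
            recall_at_k(q["retrieved"], q["relevant"], k)
        )
    return {seg: sum(vals) / len(vals) for seg, vals in per_segment.items()}
```

A segment whose recall sits far below the others (say, `product_id` at 0.2 while `natural` sits at 0.9) is the signal to route that segment to sparse or hybrid retrieval.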
Embedding Drift
When you update the encoder model, old document embeddings become incompatible with new query embeddings. Even small model changes shift the entire vector space. Symptoms: recall drops after model update despite better offline metrics. Prevention: always re-encode all documents when updating the encoder. This is expensive but absolutely necessary. Track embedding version with documents; reject queries against mismatched versions or maintain multiple index versions during transitions.
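The version-tracking idea can be sketched as a toy index that stores the encoder version alongside its vectors and refuses mismatched queries. This is an illustration under assumptions, not a real vector-database API: `VersionedIndex`, the version strings, and the brute-force cosine search are all hypothetical stand-ins.

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

class VersionedIndex:
    """Toy index that pins every stored embedding to one encoder version."""

    def __init__(self, encoder_version):
        self.encoder_version = encoder_version
        self.docs = {}  # doc_id -> embedding

    def add(self, doc_id, embedding):
        self.docs[doc_id] = embedding

    def search(self, query_embedding, query_version, k=5):
        # Reject queries encoded with a different model version: their
        # vector space is incompatible with the stored documents.
        if query_version != self.encoder_version:
            raise ValueError(
                f"embedding version mismatch: index={self.encoder_version}, "
                f"query={query_version}; re-encode before querying"
            )
        ranked = sorted(self.docs.items(),
                        key=lambda kv: cosine(query_embedding, kv[1]),
                        reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]
```

During a migration you would run two such indexes side by side (old and new version) and route each query to the index matching its encoder.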
False Semantic Similarity
Embeddings place superficially similar but semantically different texts close together. "How to kill a process" and "how to kill a person" might have high similarity. "Apple iPhone" and "Apple fruit" might cluster together. The model learned surface patterns, not true meaning. Mitigation: include contrastive pairs in training that are lexically similar but semantically different; use domain-specific fine-tuning to separate false positives.
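One way to source such contrastive pairs is to mine candidates that share surface tokens with an anchor query but are not labeled relevant. A minimal sketch, assuming a Jaccard token-overlap heuristic; the threshold and the helper names are illustrative choices, not a standard recipe.

```python
def token_overlap(a, b):
    """Jaccard overlap between the whitespace token sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def mine_hard_negatives(anchor, candidates, positives, min_overlap=0.3):
    """Pick candidates that look like the anchor on the surface but are not
    labeled relevant -- the 'kill a process' vs 'kill a person' case."""
    return [c for c in candidates
            if c not in positives and token_overlap(anchor, c) >= min_overlap]
```

The selected texts become in-batch or explicit negatives for contrastive fine-tuning, pushing the encoder to separate lexical look-alikes in embedding space.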