
Failure Modes and Edge Cases in Production Semantic Search

Embedding Drift

When you update your embedding model, all existing vectors become incompatible. The new model produces vectors in a different semantic space - even identical text gets different coordinates. Documents embedded with model v1 will not match queries embedded with v2. The vectors speak different languages.

The fix is to re-embed your entire corpus whenever you change models. For millions of documents, this takes hours to days. Plan model updates carefully: verify that the quality improvement justifies the re-embedding cost.
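One common way to make migrations safe is to tag every stored vector with the version of the model that produced it, and only compare a query against vectors from the same version. The sketch below is a minimal in-memory illustration of that idea; the names (`VectorStore`, `model_version`) are hypothetical, and a production system would instead use index aliases or a second collection in its vector database.

```python
class VectorStore:
    """Toy in-memory store that tags every vector with the model
    version that produced it, so cross-version matches are impossible."""

    def __init__(self):
        self.records = []  # list of (doc_id, vector, model_version)

    def add(self, doc_id, vector, model_version):
        self.records.append((doc_id, vector, model_version))

    def search(self, query_vector, model_version, top_k=5):
        # Only compare against vectors from the same model version;
        # similarity across versions is meaningless.
        candidates = [
            (doc_id, dot(query_vector, vec))
            for doc_id, vec, ver in self.records
            if ver == model_version
        ]
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        return candidates[:top_k]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))
```

During a migration you can write v2 vectors alongside the existing v1 vectors, keep serving queries against v1, and flip query traffic to v2 only once the re-embedding job has covered the whole corpus.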

Query-Document Length Mismatch

Short queries and long documents may not align well. A 5-word query captures limited context, while a 2000-word document's embedding averages over many concepts. Models trained specifically for retrieval (e5, bge) use asymmetric training that optimizes short-query-to-long-document matching.

⚠️ Common Failure: Generic sentence embedding models trained on similar-length text pairs underperform on retrieval tasks. Use models designed for asymmetric retrieval.
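Asymmetric models often require you to mark which side of the pair each text is. The e5 family, for example, expects a `"query: "` prefix on search strings and a `"passage: "` prefix on document text; omitting them degrades retrieval quality. The helper below just shows the string convention (the actual embedding call, e.g. via sentence-transformers, is omitted):

```python
def format_for_e5(text, is_query):
    """Prepend the asymmetric prefix the e5 models expect.
    Queries and passages get different prefixes so the model can
    embed them into compatible but role-aware representations."""
    prefix = "query: " if is_query else "passage: "
    return prefix + text


query_text = format_for_e5("treatment for type 2 diabetes", is_query=True)
doc_text = format_for_e5("Metformin is a first-line therapy for...", is_query=False)
```

Check the model card for whichever model you deploy: bge models use a different instruction-style query prefix, and symmetric models need no prefix at all.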

Out-of-Domain Queries

Models trained on general text may fail on specialized domains. Medical terminology or legal jargon might not embed correctly. Test your model on representative domain queries before deployment. If generic models fail, fine-tune on domain data.
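A pre-deployment domain test can be as simple as measuring recall@k over a small hand-labeled set of representative queries. This is a minimal sketch: `search` stands in for your retrieval pipeline, and the labeled pairs are whatever your domain experts produce.

```python
def recall_at_k(search, labeled_queries, k=5):
    """Fraction of queries for which at least one relevant document
    appears in the top-k results.

    labeled_queries: list of (query_string, set_of_relevant_doc_ids)
    search: callable(query, top_k) -> list of doc_ids, your pipeline
    """
    hits = 0
    for query, relevant in labeled_queries:
        retrieved = set(search(query, top_k=k))
        if retrieved & relevant:  # any overlap counts as a hit
            hits += 1
    return hits / len(labeled_queries)
```

If recall@5 on medical or legal queries is far below what you see on general text, that is the signal to fine-tune on domain data before shipping.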

False Confidence

Semantic search always returns results, even for nonsense queries. There is no "no results found." A query about "quantum banana teleportation" returns the closest documents, even if none are relevant. Set minimum similarity thresholds. Monitor if users frequently click result 5+ (indicating top results were not helpful).
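The similarity-threshold idea above can be sketched in a few lines: score candidates by cosine similarity, then drop anything below a cutoff so a nonsense query can legitimately return nothing. The 0.75 threshold here is purely illustrative; calibrate it on labeled relevant/irrelevant pairs from your own corpus.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm


def filtered_search(query_vec, index, threshold=0.75, top_k=5):
    """Rank documents by cosine similarity, but discard results below
    the threshold so irrelevant 'closest' matches are not returned.
    index: list of (doc_id, vector) pairs."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(doc_id, score) for doc_id, score in scored[:top_k]
            if score >= threshold]
```

An empty result list from `filtered_search` is your "no results found" signal, which the raw nearest-neighbor lookup cannot provide.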

💡 Key Takeaways
Model updates make all existing vectors incompatible - re-embedding millions of documents takes hours to days. Plan updates carefully.
Short queries and long documents may not align well - use retrieval-specific models (e5, bge) designed for asymmetric matching.
Generic models fail on specialized domains (medical, legal) - test on representative queries before deployment.
Semantic search always returns results, even for nonsense queries - set minimum similarity thresholds and monitor click-through patterns.
📌 Interview Tips
1. Explain embedding drift: changing models is like changing languages. Documents in French (v1) do not match queries in Spanish (v2).
2. Recommend retrieval-specific models: e5, bge, and similar are trained for short-query-to-long-document asymmetry.
3. Describe the false confidence problem: 'quantum banana teleportation' always returns something. Monitor if users click beyond result 5.