Query Rewriting for Improved Recall and Precision
Expansion for Recall
Query expansion adds related terms so that a query matches more documents. For example, "ML tutorial" expands to "machine learning tutorial OR ML tutorial OR deep learning tutorial." Common sources of expansion terms are synonym dictionaries, word embeddings (for example, neighbors with cosine similarity > 0.8), and query logs (queries that led to the same clicks). The main risk is expansion drift, such as "Java" expanding to "coffee" in a programming context. Constrain expansions using domain signals or entity types.
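To make the drift constraint concrete, here is a minimal sketch of dictionary-based expansion with a domain filter and a similarity cutoff. The `SYNONYMS` table, its similarity scores, the domain labels, and the `expand_query` function are all hypothetical and exist only for illustration; a real system would draw them from embeddings or query logs.

```python
from typing import Dict, List, Tuple

# Hypothetical expansion table: term -> [(expansion, similarity, domain)].
# Scores and domain labels are made up for illustration.
SYNONYMS: Dict[str, List[Tuple[str, float, str]]] = {
    "ml": [("machine learning", 0.95, "programming")],
    "java": [("jvm", 0.85, "programming"), ("coffee", 0.90, "food")],
}

def expand_query(query: str, domain: str, min_similarity: float = 0.8) -> str:
    """Build an OR query from the original plus domain-compatible expansions."""
    tokens = query.lower().split()
    variants = [" ".join(tokens)]
    for i, token in enumerate(tokens):
        for expansion, similarity, exp_domain in SYNONYMS.get(token, []):
            # Drop weak neighbors and expansions from another domain (drift guard).
            if similarity <= min_similarity or exp_domain != domain:
                continue
            variant = " ".join(tokens[:i] + [expansion] + tokens[i + 1:])
            if variant not in variants:
                variants.append(variant)
    return " OR ".join(variants)

print(expand_query("ML tutorial", "programming"))
print(expand_query("Java tutorial", "programming"))
```

With the domain set to "programming", the "coffee" expansion is filtered out even though its toy similarity score clears the threshold, which is the point of constraining on domain rather than on similarity alone.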
Spell Correction
Roughly 5-10% of queries contain typos. A typical correction pipeline detects whether a word is out-of-vocabulary, generates candidates within edit distance ≤ 2, and ranks them by language model probability and query log frequency. "machien lerning" → "machine learning." Confidence thresholds matter: auto-correct high-confidence fixes, and show "did you mean?" for uncertain ones. Over-correction frustrates users who intentionally typed unusual terms.
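The pipeline can be sketched in a few lines. This is a simplified illustration, not a production design: the `VOCAB` table and its frequencies are invented, ranking uses query-log frequency only (a stand-in for the language model score), and the confidence measure (the best candidate's share of the candidates' total frequency) and the `auto_threshold` value are assumptions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical vocabulary with toy query-log frequencies.
VOCAB: Dict[str, int] = {
    "machine": 900, "learning": 800, "tutorial": 300, "matching": 50, "leaning": 40,
}

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def correct_word(word: str, max_distance: int = 2) -> Tuple[str, float]:
    """Return (best candidate, confidence) for one word."""
    if word in VOCAB:  # in-vocabulary words are left alone
        return word, 1.0
    candidates = {w: f for w, f in VOCAB.items()
                  if edit_distance(word, w) <= max_distance}
    if not candidates:  # nothing close enough: keep the user's word
        return word, 0.0
    best = max(candidates, key=lambda w: candidates[w])
    return best, candidates[best] / sum(candidates.values())

def correct_query(query: str, auto_threshold: float = 0.9) -> Tuple[str, Optional[str]]:
    """Return (query to run, optional 'did you mean' suggestion)."""
    run_tokens, suggested_tokens, uncertain = [], [], False
    for word in query.lower().split():
        candidate, confidence = correct_word(word)
        suggested_tokens.append(candidate)
        if confidence >= auto_threshold:
            run_tokens.append(candidate)   # high confidence: auto-correct
        else:
            run_tokens.append(word)        # low confidence: keep original
            uncertain = uncertain or candidate != word
    return " ".join(run_tokens), (" ".join(suggested_tokens) if uncertain else None)

print(correct_query("machien lerning"))
```

Words with no candidate within the distance limit pass through unchanged, which avoids over-correcting intentionally unusual terms.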
Relaxation for Zero-Result Queries
When a query returns no results, progressively relax its constraints. For example, "red Nike running shoes size 12 under $100" relaxes in steps: first remove the price filter, then the size, then the color. Each relaxation trades precision for recall. Tell users why the results differ from what they asked for: "No exact matches. Showing similar items." Alternatively, use semantic search to find approximate matches without explicit relaxation.
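A minimal sketch of stepwise relaxation follows, assuming an in-memory catalog and structured filters already parsed from the query. The `CATALOG` items, the filter keys, and the `search_with_relaxation` helper are hypothetical; `RELAX_ORDER` mirrors the price, size, color order described above.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical catalog used only for illustration.
CATALOG: List[Dict[str, Any]] = [
    {"brand": "Nike", "type": "running shoes", "color": "red", "size": 12, "price": 130},
    {"brand": "Nike", "type": "running shoes", "color": "blue", "size": 11, "price": 90},
]

# Filters are dropped in this order, least important first.
RELAX_ORDER = ["max_price", "size", "color"]

def matches(item: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Check one item against every active filter."""
    for key, value in filters.items():
        if key == "max_price":
            if item["price"] > value:
                return False
        elif item.get(key) != value:
            return False
    return True

def search_with_relaxation(filters: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Return (results, dropped filters), relaxing only while there are no results."""
    active = dict(filters)
    dropped: List[str] = []
    results = [item for item in CATALOG if matches(item, active)]
    for key in RELAX_ORDER:
        if results:
            break
        if key in active:
            del active[key]
            dropped.append(key)
            results = [item for item in CATALOG if matches(item, active)]
    return results, dropped

query = {"brand": "Nike", "type": "running shoes", "color": "red",
         "size": 12, "max_price": 100}
results, dropped = search_with_relaxation(query)
if dropped:
    print("No exact matches. Showing similar items (ignored: " + ", ".join(dropped) + ").")
print(results)
```

Returning the list of dropped filters alongside the results gives the interface what it needs to explain why the results differ from the original request.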