Query Rewriting for Improved Recall and Precision

Definition
Query rewriting transforms the original query into one or more alternative queries that retrieve better results. Types include: expansion (add synonyms), correction (fix typos), relaxation (remove constraints), and reformulation (rephrase entirely).

Expansion for Recall

Query expansion adds terms so the query matches more documents. "ML tutorial" expands to "machine learning tutorial OR ML tutorial OR deep learning tutorial." Sources: synonym dictionaries, word embeddings (neighbors with cosine similarity > 0.8), query logs (queries that led to the same clicks). Risk: expansion drift, e.g. "Java" expanding to "coffee" in a programming context. Constrain expansions using domain signals or entity types.
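
The embedding-based expansion and the drift guard described above can be sketched as follows. This is a minimal illustration, not a production design: the three-dimensional vectors, the domain tags, and the names `TOY_EMBEDDINGS`, `TOY_DOMAINS`, `expand_term`, and `expand_query` are all made up for the example, and a real system would take vectors from a trained model.

```python
import math

# Toy 3-dimensional "embeddings" (assumed values, not from a real model).
TOY_EMBEDDINGS = {
    "ml": [0.9, 0.1, 0.0],
    "machine learning": [0.88, 0.15, 0.02],
    "deep learning": [0.8, 0.3, 0.05],
    "java": [0.3, 0.2, 0.6],
    "coffee": [0.0, 0.1, 0.95],
}

# Hypothetical domain tags used as the "domain signal" that limits drift.
TOY_DOMAINS = {
    "ml": "tech",
    "machine learning": "tech",
    "deep learning": "tech",
    "java": "tech",
    "coffee": "food",
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def expand_term(term, domain=None, threshold=0.8):
    """Return neighbors of `term` above the similarity threshold.

    If `domain` is given, candidates from other domains are dropped.
    """
    base = TOY_EMBEDDINGS[term]
    expansions = []
    for candidate, vector in TOY_EMBEDDINGS.items():
        if candidate == term:
            continue
        if domain is not None and TOY_DOMAINS[candidate] != domain:
            continue  # domain constraint blocks drift such as java -> coffee
        if cosine(base, vector) > threshold:
            expansions.append(candidate)
    return sorted(expansions)


def expand_query(term, rest, domain=None):
    """Build an OR query from the original term plus its expansions."""
    variants = [term] + expand_term(term, domain)
    return " OR ".join(f"{variant} {rest}" for variant in variants)


print(expand_query("ml", "tutorial", domain="tech"))
print(expand_term("java"))                 # unconstrained: drifts to coffee
print(expand_term("java", domain="tech"))  # constrained: no expansion
```

In this toy data "java" sits close to "coffee" in embedding space, so the unconstrained call drifts while the domain-constrained call returns nothing.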

Spell Correction

Roughly 5-10% of queries contain typos. Correction pipeline: detect whether a word is out-of-vocabulary, generate candidates (edit distance ≤ 2), rank them by language model probability and query log frequency. "machien lerning" → "machine learning." Confidence thresholds matter: auto-correct high-confidence fixes, show "did you mean?" for uncertain ones. Over-correction frustrates users who intentionally typed unusual terms.
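
The pipeline above (OOV detection → candidates → ranking → threshold) might look like the sketch below. Several things here are assumptions for illustration: `TERM_COUNTS` is a made-up query-log frequency table that stands in for both the vocabulary and the ranking signal (no language model is used), adjacent transpositions count as a single edit, and the 0.9 auto-correct threshold is arbitrary.

```python
import string

# Hypothetical query-log term counts; membership doubles as the vocabulary.
TERM_COUNTS = {"machine": 900, "learning": 1200, "leaning": 400, "matching": 50}


def edits1(word):
    """All strings one edit away (delete, transpose, replace, insert)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def candidates(word):
    """Known terms within two edits of `word`."""
    one_edit = edits1(word)
    two_edits = {w2 for w1 in one_edit for w2 in edits1(w1)}
    return {w for w in one_edit | two_edits if w in TERM_COUNTS}


def correct(word, auto_threshold=0.9):
    """Return (suggestion, action) for a single query term."""
    if word in TERM_COUNTS:
        return word, "keep"  # in vocabulary: nothing to fix
    found = candidates(word)
    if not found:
        return word, "keep"  # unknown but no plausible fix: leave as typed
    best = max(found, key=TERM_COUNTS.get)
    # Confidence = share of candidate frequency mass held by the best one.
    confidence = TERM_COUNTS[best] / sum(TERM_COUNTS[c] for c in found)
    action = "auto_correct" if confidence >= auto_threshold else "did_you_mean"
    return best, action


for term in "machien lerning".split():
    print(term, "->", correct(term))
```

With these toy counts "machien" has a single candidate and is auto-corrected, while "lerning" is close to both "learning" and "leaning", so the suggestion is surfaced as "did you mean?" instead.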

Relaxation for Zero-Result Queries

When a query returns no results, progressively relax its constraints. "red Nike running shoes size 12 under $100" is relaxed one step at a time: first drop the price filter, then size, then color. Each relaxation trades precision for recall. Show users why results differ: "No exact matches. Showing similar items." Alternatively, use semantic search to find approximate matches without explicit relaxation.
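
A minimal sketch of progressive relaxation, assuming the query has already been parsed into structured filters. The two-item `CATALOG`, the filter keys, and the `RELAX_ORDER` list are hypothetical; the order simply mirrors the price → size → color example above.

```python
# Toy catalog (made-up items) standing in for a real search index.
CATALOG = [
    {"brand": "Nike", "type": "running shoes", "color": "red", "size": 11, "price": 120},
    {"brand": "Nike", "type": "running shoes", "color": "blue", "size": 12, "price": 95},
]

# Constraints are dropped in this order; brand and type are never relaxed here.
RELAX_ORDER = ["max_price", "size", "color"]


def matches(item, filters):
    """True if the item satisfies every active filter."""
    for key, value in filters.items():
        if key == "max_price":
            if item["price"] > value:
                return False
        elif item.get(key) != value:
            return False
    return True


def search_with_relaxation(filters, catalog=CATALOG):
    """Drop constraints one at a time until something matches."""
    active = dict(filters)
    pending = [key for key in RELAX_ORDER if key in active]
    dropped = []
    while True:
        results = [item for item in catalog if matches(item, active)]
        if results or not pending:
            break
        key = pending.pop(0)
        del active[key]
        dropped.append(key)
    # Tell the user when the results no longer match the original query.
    message = "No exact matches. Showing similar items." if results and dropped else None
    return results, dropped, message


results, dropped, message = search_with_relaxation(
    {"brand": "Nike", "type": "running shoes", "color": "red", "size": 12, "max_price": 100}
)
print(dropped, message)
print(results)
```

Returning the list of dropped constraints alongside the results makes it straightforward to explain to the user which parts of the query were ignored.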

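💡 Production Pattern: Run the original query and the rewritten queries in parallel. Merge results, de-duplicate, re-rank. Because the queries run concurrently, retrieval latency is bounded by the slowest query rather than the sum, so this captures both exact and expanded matches with little added latency.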
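
The parallel pattern can be sketched with a thread pool, as below. The `TOY_INDEX` lookup is a stand-in for a real search backend, and the "re-rank" step is deliberately simplistic (keep each document's best score, then sort); in practice scores from different queries may not be comparable, and a dedicated re-ranker would typically re-score the merged set against the original query.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical backend: query string -> list of (doc_id, score).
TOY_INDEX = {
    "ml tutorial": [("doc1", 0.9), ("doc2", 0.6)],
    "machine learning tutorial": [("doc2", 0.8), ("doc3", 0.7)],
}


def search(query):
    """Stand-in for a network call to the retrieval service."""
    return TOY_INDEX.get(query, [])


def search_all(original, rewrites):
    """Run original and rewritten queries concurrently, then merge."""
    queries = [original] + [q for q in rewrites if q != original]
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        result_lists = list(pool.map(search, queries))

    # De-duplicate by doc_id, keeping the highest score seen for each document.
    best = {}
    for hits in result_lists:
        for doc_id, score in hits:
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score

    # Placeholder re-rank: sort by score descending, doc_id as tie-breaker.
    return sorted(best.items(), key=lambda pair: (-pair[1], pair[0]))


print(search_all("ml tutorial", ["machine learning tutorial"]))
```
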
💡 Key Takeaways
Rewriting types: expansion (synonyms), correction (typos), relaxation (remove constraints), reformulation (rephrase entirely)
Expansion sources: synonym dictionaries, embeddings (cosine > 0.8), query logs (same-click queries)
Roughly 5-10% of queries have typos; generate candidates within edit distance ≤ 2 and rank them by language model probability and query log frequency
Relaxation trades precision for recall; show users why results differ from original query
Run original and rewritten queries in parallel, then merge, de-duplicate, and re-rank; latency is bounded by the slowest query, not the sum
📌 Interview Tips
1. List the four rewriting types (expansion, correction, relaxation, reformulation) as a framework
2. Describe the spell correction pipeline (OOV detection → candidates → ranking) with confidence thresholds
3. Mention the parallel query execution pattern for production systems