
Query Rewriting for Improved Recall and Precision

Definition
Query rewriting transforms the original query into one or more alternative queries that retrieve better results. Types include: expansion (add synonyms), correction (fix typos), relaxation (remove constraints), and reformulation (rephrase entirely).
Expansion for Recall
Query expansion adds terms so that the query matches more documents. For example, "ML tutorial" expands to "machine learning tutorial OR ML tutorial OR deep learning tutorial." Common sources of expansion terms are synonym dictionaries, word embeddings (for example, neighbors with cosine similarity > 0.8), and query logs (queries that led to the same clicks). The main risk is expansion drift, where an added term changes the meaning of the query: "Java" expanding to "coffee" in a programming context. Constrain expansions using domain signals or entity types.
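As a minimal sketch of embedding-based expansion with a domain constraint: the 3-dimensional vectors, the `ENTITY_TYPE` labels, and the `expand` helper below are all made up for illustration and do not come from a real model or library. The vectors are chosen so that "coffee" clears the 0.8 cosine threshold for "java", which is exactly the drift case the entity-type filter is meant to catch.

```python
import math

# Toy embeddings, hand-picked for illustration only (not from a real model).
EMBEDDINGS = {
    "java":   [0.90, 0.10, 0.30],
    "jvm":    [0.85, 0.15, 0.25],
    "kotlin": [0.80, 0.20, 0.20],
    "coffee": [0.70, 0.10, 0.65],
    "tea":    [0.10, 0.20, 0.90],
}

# Hypothetical entity-type labels used as the domain signal.
ENTITY_TYPE = {
    "java": "programming",
    "jvm": "programming",
    "kotlin": "programming",
    "coffee": "beverage",
    "tea": "beverage",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand(term, entity_type=None, threshold=0.8):
    """Return expansion terms with similarity above threshold, best first.

    If entity_type is given, candidates of any other type are dropped.
    """
    base = EMBEDDINGS[term]
    scored = []
    for other, vec in EMBEDDINGS.items():
        if other == term:
            continue
        sim = cosine(base, vec)
        if sim <= threshold:
            continue
        if entity_type is not None and ENTITY_TYPE[other] != entity_type:
            continue
        scored.append((sim, other))
    return [word for _, word in sorted(scored, reverse=True)]
```

With these toy vectors, `expand("java")` includes "coffee" alongside "jvm" and "kotlin", while `expand("java", entity_type="programming")` keeps only the programming terms.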
Spell Correction
Roughly 5-10% of queries contain typos. A typical correction pipeline detects out-of-vocabulary words, generates candidates within edit distance ≤ 2, and ranks them by language model probability and query log frequency: "machien lerning" → "machine learning." Confidence thresholds matter: auto-correct high-confidence fixes and show "did you mean?" for uncertain ones. Over-correction frustrates users who intentionally typed unusual terms.
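The pipeline above can be sketched roughly as follows. Everything here is an assumption made for illustration: the tiny `VOCAB` frequency table stands in for a language model and query logs, the 0.5-per-edit penalty and the 0.9 auto-correct threshold are arbitrary, and plain Levenshtein distance is used (so a transposition such as "ie" → "ei" counts as two edits).

```python
# Hypothetical word frequencies standing in for LM / query-log statistics.
VOCAB = {
    "machine": 900, "learning": 800, "from": 350, "form": 300,
    "matching": 50, "leaning": 40, "earning": 30, "marine": 20,
}
AUTO_CORRECT_THRESHOLD = 0.9  # assumed cutoff, tune on real data


def levenshtein(a, b):
    """Edit distance with insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def correct_word(word, max_distance=2):
    """Return (best_word, action), action in {'keep', 'auto', 'suggest'}."""
    if word in VOCAB:
        return word, "keep"  # in vocabulary: nothing to correct
    scores = {}
    for candidate, count in VOCAB.items():
        distance = levenshtein(word, candidate)
        if distance <= max_distance:
            # Frequent words score higher; each edit halves the score.
            scores[candidate] = count * 0.5 ** distance
    if not scores:
        return word, "keep"  # unknown word, possibly intentional
    best = max(scores, key=scores.get)
    confidence = scores[best] / sum(scores.values())
    action = "auto" if confidence >= AUTO_CORRECT_THRESHOLD else "suggest"
    return best, action


def correct_query(query):
    """Apply only high-confidence ('auto') corrections to a query."""
    fixed = []
    for word in query.split():
        best, action = correct_word(word)
        fixed.append(best if action == "auto" else word)
    return " ".join(fixed)
```

In this toy setup, `correct_query("machien lerning")` yields "machine learning", while `correct_word("frm")` returns a "suggest" action because "from" and "form" are both plausible. Scanning the whole vocabulary is only workable for a toy table; production systems typically generate candidates from an index instead.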
Relaxation for Zero-Result Queries
When a query returns no results, progressively relax its constraints. "red Nike running shoes size 12 under $100" is relaxed one step at a time: first drop the price filter, then size, then color. Each relaxation trades precision for recall. Show users why results differ: "No exact matches. Showing similar items." Alternatively, use semantic search to find approximate matches without explicit relaxation.
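A small sketch of progressive relaxation, assuming the query has already been parsed into structured filters. The `CATALOG` items, the filter names, and the `RELAXATION_ORDER` list are hypothetical; the order simply mirrors the price, size, color sequence described above.

```python
# Hypothetical in-memory catalog standing in for a search index.
CATALOG = [
    {"brand": "Nike", "category": "running shoes", "color": "red",
     "size": 12, "price": 129.0},
    {"brand": "Nike", "category": "running shoes", "color": "blue",
     "size": 11, "price": 89.0},
    {"brand": "Adidas", "category": "running shoes", "color": "red",
     "size": 12, "price": 95.0},
]

# Constraints to drop, least important first (an assumed ordering).
RELAXATION_ORDER = ["max_price", "size", "color"]


def matches(item, filters):
    """True if the item satisfies every active filter."""
    for key, wanted in filters.items():
        if key == "max_price":
            if item["price"] > wanted:
                return False
        elif item[key] != wanted:
            return False
    return True


def search(filters):
    return [item for item in CATALOG if matches(item, filters)]


def search_with_relaxation(filters):
    """Drop constraints one at a time until something matches.

    Returns (results, dropped) so the UI can explain what changed.
    """
    active = dict(filters)
    dropped = []
    results = search(active)
    for key in RELAXATION_ORDER:
        if results:
            break
        if key in active:
            del active[key]
            dropped.append(key)
            results = search(active)
    return results, dropped


query = {"brand": "Nike", "category": "running shoes",
         "color": "red", "size": 12, "max_price": 100}
results, dropped = search_with_relaxation(query)
```

Here the exact query matches nothing, so the price filter is dropped and the red Nike in size 12 is returned. The `dropped` list is what would drive a message such as "No exact matches. Showing similar items."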
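💡 Production Pattern: Run the original query and the rewritten queries in parallel, then merge the results, de-duplicate, and re-rank. This captures both exact matches and expanded matches without adding the latency of sequential round trips; total retrieval time is bounded by the slowest query plus the merge step.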
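One way to sketch this fan-out and merge is with a thread pool from Python's standard library. The `FAKE_INDEX` dictionary and `search_backend` function are placeholders for a real retrieval call, and keeping the highest score per document is just one simple merge policy. Scores from different queries are often not directly comparable, so a real system would more likely pass the merged candidates to a separate re-ranking model.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder for a real search backend: query -> [(doc_id, score), ...].
FAKE_INDEX = {
    "ml tutorial": [("d1", 0.9), ("d2", 0.6)],
    "machine learning tutorial": [("d2", 0.8), ("d3", 0.7)],
}


def search_backend(query):
    """Hypothetical retrieval call; in practice this would be network I/O."""
    return FAKE_INDEX.get(query, [])


def search_all(original, rewrites):
    """Run the original and rewritten queries concurrently, then merge."""
    queries = [original] + [q for q in rewrites if q != original]
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        result_lists = list(pool.map(search_backend, queries))

    # De-duplicate by document id, keeping the best score seen.
    best = {}
    for hits in result_lists:
        for doc_id, score in hits:
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score

    # Re-rank: here simply by merged score, highest first.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


merged = search_all("ml tutorial", ["machine learning tutorial"])
```

Document "d2" appears in both result lists but only once in `merged`, with the higher of its two scores.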

💡 Key Takeaways

✓ Rewriting types: expansion (synonyms), correction (typos), relaxation (remove constraints), reformulation

✓ Expansion sources: synonym dictionaries, embeddings (cosine > 0.8), query logs (same-click queries)

✓ 5-10% of queries have typos; use edit distance ≤ 2 candidates ranked by language model

✓ Relaxation trades precision for recall; show users why results differ from original query

✓ Run original and rewritten queries in parallel, then merge and re-rank, to avoid the latency of sequential round trips

📌 Interview Tips

1. List the four rewriting types (expansion, correction, relaxation, reformulation) as a framework

2. Describe the spell correction pipeline (OOV detection → candidates → ranking) with confidence thresholds

3. Mention the parallel query execution pattern for production systems
