ML-Powered Search & Ranking • Query Understanding (Intent, Parsing, Rewriting)
Implementation Architecture and Evaluation Strategy
Query understanding is architected as a low latency, stateless service with a clear contract. Inputs include raw query text, optional session context for conversational continuity, and user metadata for permission enforcement. Outputs are a canonical text form, structured attributes with confidence scores, a routing decision, and a list of applied transformations with reasons for transparency. The service remains stateless for horizontal scaling, using a separate fast key value store like Redis or Memcached for hot caches with a Time To Live (TTL) of 1 to 5 minutes. Cache hit rates reach 30 to 60 percent for head queries and near zero for the long tail, absorbing burst traffic and stabilizing results during high Queries Per Second (QPS) events.
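A minimal sketch of the stateless contract and cache layer, assuming Redis via redis-py; the function names, key format, and the 120 second TTL are illustrative placeholders rather than a prescribed interface.

```python
import json
from typing import Optional

import redis  # assumes redis-py is installed; Memcached would serve the same role

# Short TTL keeps the hot cache fresh while the service itself stays stateless.
CACHE_TTL_SECONDS = 120  # within the 1 to 5 minute band described above
cache = redis.Redis(host="localhost", port=6379)


def understand_query(raw_query: str, session_context: Optional[dict], user_metadata: dict) -> dict:
    """Hypothetical entry point: returns canonical form, attributes, routing, and transformations."""
    # Cache is keyed on normalized text only; session and user context are ignored
    # here for simplicity, which is why head queries hit it 30 to 60 percent of the time.
    cache_key = f"qu:{raw_query.strip().lower()}"
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)

    result = run_stages(raw_query, session_context, user_metadata)  # staged pipeline, sketched below
    cache.setex(cache_key, CACHE_TTL_SECONDS, json.dumps(result))
    return result


def run_stages(raw_query, session_context, user_metadata):
    # Placeholder for the staged pipeline described in the next paragraph.
    return {
        "canonical_text": raw_query.strip().lower(),
        "attributes": [],       # structured attributes with confidence scores
        "routing": "default",   # routing decision
        "transformations": [],  # applied transformations with reasons
    }
```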
Implement a staged pipeline with strict time budgets. Stage 1 runs in 1 to 3 milliseconds and includes normalization, language detection, and a fast intent classifier. Stage 2 runs the attribute tagger and entity linker in 3 to 8 milliseconds. Stage 3 performs rewriting, filter extraction, and routing in 2 to 5 milliseconds. Each stage emits confidence scores. If confidence for any required decision falls below a threshold, typically 0.6 to 0.7, either abstain or trigger a fallback that is still bound by a strict time budget. This cascade pattern lets a fast rule-based model handle 90 percent of traffic while a heavier learned model runs only when confidence is low.
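One way to express the staged budgets and the low confidence cascade, using only the standard library; the stage bodies are stubs and the 0.65 threshold simply sits inside the 0.6 to 0.7 band mentioned above.

```python
import time
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.65  # illustrative value within the 0.6 to 0.7 band
TOTAL_BUDGET_MS = 15.0       # overall p50 target for all three stages


@dataclass
class QueryState:
    text: str
    intent: str = "unknown"
    confidence: float = 1.0
    attributes: list = field(default_factory=list)
    routing: str = "default"
    transformations: list = field(default_factory=list)


def stage1_normalize_and_intent(state: QueryState) -> QueryState:  # ~1-3 ms budget
    state.text = state.text.strip().lower()
    # A fast rule or dictionary based intent classifier would set these for real.
    state.intent, state.confidence = "product_search", 0.9
    return state


def stage2_tag_and_link(state: QueryState) -> QueryState:  # ~3-8 ms budget
    # Attribute tagger and entity linker; per-attribute confidences attached here.
    return state


def stage3_rewrite_and_route(state: QueryState) -> QueryState:  # ~2-5 ms budget
    # Rewriting, filter extraction, and the routing decision.
    return state


def heavy_model_fallback(state: QueryState) -> QueryState:
    # Heavier learned model, still bounded by the remaining time budget.
    return state


def run_pipeline(raw_query: str) -> QueryState:
    start = time.monotonic()
    state = QueryState(text=raw_query)
    for stage in (stage1_normalize_and_intent, stage2_tag_and_link, stage3_rewrite_and_route):
        state = stage(state)
        elapsed_ms = (time.monotonic() - start) * 1000
        # Cascade: the fast path handles most traffic; escalate to the heavy model
        # only when confidence is low and budget remains, otherwise abstain.
        if state.confidence < CONFIDENCE_THRESHOLD:
            if elapsed_ms < TOTAL_BUDGET_MS:
                state = heavy_model_fallback(state)
            else:
                state.transformations.append("abstained: low confidence, budget exhausted")
    return state
```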
Evaluation is continuous and multilayered. Add distributed tracing to record the original and rewritten queries, extracted attributes, confidence scores, and routing decisions for every request. Run online A/B tests with 5 to 20 percent traffic splits, monitoring zero result rate, Click Through Rate (CTR), add to cart rate or task success proxies, abandonment rate, and p95 latency. For safety, deploy canaries that limit new behavior to 1 percent of traffic with automatic rollback triggered by metric regressions exceeding 2 to 5 percent or error spikes above 0.5 to 1 percent. Maintain offline test suites with thousands of labeled queries covering head, mid, and tail distributions, plus adversarial cases like misspellings, mixed language, and edge case entities. Airbnb runs offline regression tests on 50,000 labeled queries before every deployment, with pass thresholds of 95 percent accuracy on head queries and 85 percent on tail queries.
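The automatic rollback rule for canaries can be reduced to a guard over relative metric deltas. The sketch below assumes aggregated metrics for the control and canary arms are already available as dictionaries; the metric names and the 3 percent and 1 point limits are illustrative values within the ranges above.

```python
# Regression and error limits mirror the 2 to 5 percent and 0.5 to 1 percent bands above.
REGRESSION_LIMIT = 0.03   # 3 percent relative regression on a guarded metric
ERROR_SPIKE_LIMIT = 0.01  # 1 percentage point of extra errors

GUARDED_METRICS = ["ctr", "add_to_cart_rate"]                                  # lower is worse
INVERSE_METRICS = ["zero_result_rate", "p95_latency_ms", "abandonment_rate"]   # higher is worse


def should_rollback(control: dict, canary: dict) -> bool:
    """Return True if the 1 percent canary shows a regression that warrants automatic rollback."""
    for metric in GUARDED_METRICS:
        if control[metric] > 0 and (control[metric] - canary[metric]) / control[metric] > REGRESSION_LIMIT:
            return True
    for metric in INVERSE_METRICS:
        if control[metric] > 0 and (canary[metric] - control[metric]) / control[metric] > REGRESSION_LIMIT:
            return True
    # Error rates are compared as absolute percentage point deltas.
    return canary["error_rate"] - control["error_rate"] > ERROR_SPIKE_LIMIT


# Example: a roughly 7 percent relative CTR drop on the canary trips the guard.
print(should_rollback(
    {"ctr": 0.30, "add_to_cart_rate": 0.08, "zero_result_rate": 0.05,
     "p95_latency_ms": 27.0, "abandonment_rate": 0.12, "error_rate": 0.002},
    {"ctr": 0.28, "add_to_cart_rate": 0.08, "zero_result_rate": 0.05,
     "p95_latency_ms": 27.5, "abandonment_rate": 0.12, "error_rate": 0.002},
))  # True
```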
💡 Key Takeaways
• Staged pipeline with strict time budgets: Stage 1 normalization and intent in 1 to 3 milliseconds, Stage 2 parsing and linking in 3 to 8 milliseconds, Stage 3 rewriting and routing in 2 to 5 milliseconds, totaling 10 to 15 milliseconds at p50.
• Stateless service design with horizontal scaling and a separate Redis or Memcached cache layer. Cache hit rates of 30 to 60 percent for head queries with a Time To Live (TTL) of 1 to 5 minutes absorb burst traffic and stabilize results.
• Cascade pattern where a fast rule-based model handles 90 percent of traffic and a heavier learned model only runs when confidence falls below 0.6 to 0.7, maintaining the overall latency budget while improving quality for ambiguous queries.
• Online A/B tests with 5 to 20 percent traffic splits monitor zero result rate, Click Through Rate (CTR), add to cart rate, abandonment, and p95 latency. Canaries limit new behavior to 1 percent with automatic rollback on 2 to 5 percent metric regression or 0.5 to 1 percent error spike.
• Offline test suites with thousands of labeled queries cover head, mid, and tail distributions plus adversarial cases. Airbnb runs regression tests on 50,000 queries with a 95 percent accuracy threshold on head queries and 85 percent on tail before deployment (a deployment gate sketch follows this list).
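A deployment gate over an offline labeled suite, in the spirit of the head and tail thresholds above, could look like the following; the bucket names, the 90 percent mid threshold, and the predict callable are assumptions for illustration.

```python
from collections import defaultdict

# Pass thresholds per traffic bucket; head and tail follow the 95 / 85 percent
# pattern above, while the mid value is an assumed placeholder.
PASS_THRESHOLDS = {"head": 0.95, "mid": 0.90, "tail": 0.85}


def regression_gate(labeled_cases: list[dict], predict) -> bool:
    """labeled_cases: [{"query": ..., "bucket": "head"|"mid"|"tail", "expected": ...}, ...]
    predict: callable mapping a query to the pipeline's output.
    Returns True if every bucket clears its accuracy threshold and deployment may proceed."""
    correct: dict = defaultdict(int)
    total: dict = defaultdict(int)
    for case in labeled_cases:
        total[case["bucket"]] += 1
        if predict(case["query"]) == case["expected"]:
            correct[case["bucket"]] += 1

    for bucket, threshold in PASS_THRESHOLDS.items():
        if total[bucket] == 0:
            continue  # no labeled cases in this bucket, nothing to gate on
        accuracy = correct[bucket] / total[bucket]
        if accuracy < threshold:
            print(f"FAIL {bucket}: accuracy {accuracy:.3f} below threshold {threshold}")
            return False
    return True
```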
📌 Examples
Amazon: Staged pipeline processes 60,000 requests per second at peak with p50 latency of 11 milliseconds and p95 latency of 27 milliseconds. Cache hit rate of 42 percent, autoscaling to 300 instances across 3 availability zones during the Black Friday traffic spike.
Google: Cascade pattern uses fast regex and dictionary lookup for 92 percent of queries in under 3 milliseconds, heavy neural sequence tagger for remaining 8 percent in 12 milliseconds, maintaining overall p95 latency under 20 milliseconds.
Airbnb: A/B test of location entity linking improvement with 10 percent traffic split showed 7 percent reduction in zero results, 4 percent increase in Click Through Rate (CTR), and no latency regression. Rolled out to 100 percent traffic over 2 weeks with canary stages at 1 percent, 5 percent, 25 percent.
Meta Marketplace: Distributed tracing records 100 percent of requests with sampled detailed logging at 0.1 percent. Offline regression suite covers 35,000 labeled queries including 5,000 adversarial cases with misspellings and mixed language, executed in continuous integration pipeline.
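Tracing every request while logging full detail for only a small sample, as in the Meta Marketplace example, is commonly done with deterministic hash based sampling on the request ID. The sketch below is illustrative; the field names and the 0.1 percent rate are not tied to any specific tracing library.

```python
import hashlib

DETAILED_SAMPLE_RATE = 0.001  # 0.1 percent of requests carry the full debug payload


def trace_request(request_id: str, original: str, rewritten: str, attributes: list, routing: str) -> dict:
    """Every request gets a lightweight trace record; a deterministic hash of the
    request ID decides whether to attach the heavier debug detail."""
    record = {
        "request_id": request_id,
        "original_query": original,
        "rewritten_query": rewritten,
        "routing": routing,
    }
    # Deterministic sampling: the same request ID always makes the same decision,
    # so traces stay consistent across services without coordination.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100_000
    if bucket < DETAILED_SAMPLE_RATE * 100_000:
        record["detail"] = {"attributes": attributes}  # confidence scores, transformations, etc.
    return record
```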