ML-Powered Search & Ranking: Query Understanding (Intent, Parsing, Rewriting)

Implementation Architecture and Evaluation Strategy

Architecture Overview

Query understanding runs as a preprocessing pipeline before retrieval. Components execute in order: tokenization → spell correction → entity extraction → intent classification → query rewriting. Total latency budget: 10-30ms for interactive search. Each component has fallback behavior: if entity linking fails, proceed with raw tokens; if intent is uncertain, route to default backend. Failures should degrade gracefully, not block search.
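The pipeline order and graceful degradation above can be sketched as follows. This is a minimal illustration, not a production implementation: the component functions (`tokenize`, `correct_spelling`, `extract_entities`, `classify_intent`, `rewrite_query`) are illustrative stubs standing in for real models and knowledge bases.

```python
# Minimal sketch of the preprocessing pipeline with graceful degradation.
# All component implementations are illustrative stubs, not real models.

def tokenize(query: str) -> list[str]:
    return query.lower().split()

def correct_spelling(tokens: list[str]) -> list[str]:
    fixes = {"sheos": "shoes"}          # stand-in for a real spell model
    return [fixes.get(t, t) for t in tokens]

def extract_entities(tokens: list[str]) -> list[str]:
    known = {"nike", "adidas"}          # stand-in for an entity linker / KB
    return [t for t in tokens if t in known]

def classify_intent(tokens: list[str], entities: list[str]) -> str:
    return "shopping" if entities else "default"

def rewrite_query(tokens: list[str], intent: str, entities: list[str]) -> str:
    return " ".join(tokens)

def understand(query: str) -> dict:
    """Run components in order; on any failure, keep partial signals."""
    signals = {"tokens": query.split(), "entities": [],
               "intent": "default", "rewrite": query}
    try:
        signals["tokens"] = correct_spelling(tokenize(query))
    except Exception:
        pass                            # fall back to raw tokens
    try:
        signals["entities"] = extract_entities(signals["tokens"])
    except Exception:
        pass                            # proceed without entities
    try:
        signals["intent"] = classify_intent(signals["tokens"], signals["entities"])
    except Exception:
        pass                            # uncertain intent -> default backend
    try:
        signals["rewrite"] = rewrite_query(signals["tokens"],
                                           signals["intent"], signals["entities"])
    except Exception:
        pass                            # serve the original query unmodified
    return signals
```

Each stage only overwrites its signal on success, so a failing component leaves the defaults in place and search proceeds with whatever was extracted so far.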

Caching Strategies

Query understanding results are highly cacheable. Cache parsed queries with all signals (intent, entities, rewrites). Cache hit rates of 60-80% are typical since popular queries repeat frequently. Use query normalization (lowercase, whitespace collapse) as cache key. TTL depends on knowledge base update frequency: static KB can cache for hours; rapidly updating catalogs need minutes. Invalidate cache when underlying models or KB change.
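A cache along these lines, with normalization as the key and a TTL tied to KB freshness, might look like the sketch below. The class and method names are illustrative assumptions, not a specific library's API.

```python
# Sketch of a parsed-query cache: normalized cache keys, per-entry TTL,
# and full invalidation when models or the knowledge base change.
import re
import time

def normalize(query: str) -> str:
    # Lowercase and collapse whitespace so trivially different
    # spellings of popular queries share one cache entry.
    return re.sub(r"\s+", " ", query.strip().lower())

class QueryCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds          # minutes for fast-moving catalogs,
        self._store = {}                # hours for a static KB

    def get(self, query: str):
        key = normalize(query)
        entry = self._store.get(key)
        if entry is None:
            return None
        signals, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]        # expired: force a recompute
            return None
        return signals

    def put(self, query: str, signals: dict) -> None:
        self._store[normalize(query)] = (signals, time.monotonic() + self.ttl)

    def invalidate_all(self) -> None:
        # Call when the underlying models or KB are updated.
        self._store.clear()
```

Normalizing before lookup is what makes the 60-80% hit rates achievable: "Red  Shoes" and "red shoes" resolve to the same entry.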

Evaluation Metrics

Intent accuracy: measure on a held-out labeled set. Target: 90%+ for well-defined intents.
Entity precision/recall: precision (linked entities are correct) and recall (all entities are found). Target: 85%+ for both.
Rewriting quality: measure indirectly through downstream search metrics. Good rewrites improve click-through rate 5-15% and reduce the zero-result rate.
End-to-end: A/B test query understanding changes against the baseline; measure NDCG, CTR, and abandonment rate.
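Entity precision and recall reduce to simple set arithmetic over predicted versus gold entities per query. A small sketch (the function name and edge-case conventions are assumptions for illustration):

```python
# Entity precision/recall for one query: compare the set of entities
# the linker predicted against a human-labeled gold set.

def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    tp = len(predicted & gold)                       # correctly linked entities
    precision = tp / len(predicted) if predicted else 1.0  # linked entities that are correct
    recall = tp / len(gold) if gold else 1.0               # gold entities that were found
    return precision, recall
```

Averaging these per-query scores (or pooling counts across the eval set for micro-averaged figures) gives the numbers checked against the 85%+ target.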

✅ Deployment Pattern: Shadow mode first. Run new query understanding in parallel without affecting live traffic. Compare outputs against production. Validate metrics match or improve before switching.
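Shadow mode can be as simple as running both pipelines per request, logging disagreements, and always serving the production output. A sketch, with function and field names assumed for illustration:

```python
# Shadow-mode harness: the candidate pipeline runs on live queries but its
# output is only logged for comparison, never served to the user.

def shadow_compare(query: str, production_pipeline, candidate_pipeline, log: list):
    prod = production_pipeline(query)       # this result serves live traffic
    try:
        cand = candidate_pipeline(query)    # shadow run, result never served
        if cand != prod:
            log.append({"query": query, "prod": prod, "candidate": cand})
    except Exception:
        # A crashing candidate must not affect the live request.
        log.append({"query": query, "candidate_error": True})
    return prod
```

Reviewing the disagreement log (and aggregate metrics over it) against production validates the candidate before any traffic is switched.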
💡 Key Takeaways
Pipeline order: tokenization → spell correction → entity extraction → intent → rewriting; 10-30ms budget
Graceful degradation: if component fails, proceed with partial signals rather than blocking
Cache hit rates 60-80% for parsed queries; normalize queries as cache keys
Targets: intent 90%+, entity precision/recall 85%+, rewrites improve CTR 5-15%
Shadow mode deployment: run in parallel, compare outputs before switching traffic
📌 Interview Tips
1. Describe the pipeline order with its latency budget (10-30ms) for architecture questions
2. Mention caching with 60-80% hit rates and normalization as the cache key
3. Recommend shadow mode deployment before switching live traffic