Implementation Architecture and Evaluation Strategy
Architecture Overview
Query understanding runs as a preprocessing pipeline before retrieval. Components execute in order: tokenization → spell correction → entity extraction → intent classification → query rewriting. Total latency budget: 10-30ms for interactive search. Each component has fallback behavior: if entity linking fails, proceed with raw tokens; if intent is uncertain, route to default backend. Failures should degrade gracefully, not block search.
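The pipeline order and fallback behavior above can be sketched as follows. This is a minimal illustration, not a specific library's API: the correction table, entity gazetteer, and intent rules are toy stand-ins, and the 0.5 confidence threshold is an assumed cutoff.

```python
from typing import List, Tuple

SPELL_FIXES = {"laptp": "laptop"}       # toy correction table (assumption)
KNOWN_ENTITIES = {"laptop", "macbook"}  # toy entity gazetteer (assumption)

def spell_correct(tokens: List[str]) -> List[str]:
    return [SPELL_FIXES.get(t, t) for t in tokens]

def extract_entities(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t in KNOWN_ENTITIES]

def classify_intent(tokens: List[str]) -> Tuple[str, float]:
    # Toy classifier: "buy"/"cheap" signals shopping intent.
    if any(t in {"buy", "cheap"} for t in tokens):
        return "shopping", 0.9
    return "informational", 0.4

def understand_query(raw_query: str) -> dict:
    tokens = raw_query.lower().split()   # tokenization
    try:
        tokens = spell_correct(tokens)
    except Exception:
        pass                             # fallback: keep raw tokens
    try:
        entities = extract_entities(tokens)
    except Exception:
        entities = []                    # fallback: proceed without entities
    intent, confidence = classify_intent(tokens)
    if confidence < 0.5:
        intent = "default"               # uncertain intent -> default backend
    return {"tokens": tokens, "entities": entities, "intent": intent}
```

Each stage either succeeds or degrades to a safe default, so a component failure never blocks the search request itself.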
Caching Strategies
Query understanding results are highly cacheable. Cache the fully parsed query with all of its signals (intent, entities, rewrites). Cache hit rates of 60-80% are typical because popular queries repeat frequently. Use the normalized query (lowercased, whitespace collapsed) as the cache key. TTL depends on how often the knowledge base updates: a static KB can be cached for hours, while rapidly updating catalogs need TTLs of minutes. Invalidate the cache whenever the underlying models or KB change.
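A minimal sketch of this caching scheme, assuming a single-process in-memory store; the `QueryCache` class and its methods are illustrative, not a standard library interface.

```python
import time

def normalize(query: str) -> str:
    # Normalization as cache key: lowercase + whitespace collapse.
    return " ".join(query.lower().split())

class QueryCache:
    """TTL cache for parsed-query results (illustrative sketch)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, query: str):
        entry = self._store.get(normalize(query))
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[normalize(query)]  # expired entry
            return None
        return value

    def put(self, query: str, value: dict) -> None:
        self._store[normalize(query)] = (time.monotonic() + self.ttl, value)

    def invalidate_all(self) -> None:
        # Call when the underlying models or KB change.
        self._store.clear()
```

The TTL passed to the constructor encodes the KB update frequency: hours for a static KB, minutes for a fast-moving catalog.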
Evaluation Metrics
Intent accuracy: Measured on a held-out labeled set. Target: 90%+ for well-defined intents.
Entity precision/recall: Precision (are the linked entities correct?) and recall (were all entities found?). Target: 85%+ for both.
Rewriting quality: Measured indirectly through downstream search metrics. Good rewrites improve click-through rate by 5-15% and reduce the zero-result rate.
End-to-end: A/B test query understanding changes against a baseline; measure NDCG, CTR, and abandonment rate.
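Entity precision and recall for a single query can be computed by comparing predicted (span, entity-id) pairs against a gold annotation. A minimal sketch; the pair representation is an assumption, and per-query scores would normally be averaged over the evaluation set.

```python
from typing import Set, Tuple

Entity = Tuple[str, str]  # (surface span, linked entity id) - assumed format

def precision_recall(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float]:
    # True positives: predicted pairs that exactly match a gold pair.
    tp = len(predicted & gold)
    # Convention: empty predicted/gold sets score 1.0 (nothing to get wrong/miss).
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(gold) if gold else 1.0
    return precision, recall
```

Precision penalizes spurious links; recall penalizes missed entities. Both should clear the 85% target on the held-out set before a linker ships.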