ML-Powered Search & Ranking • Query Understanding (Intent, Parsing, Rewriting)
What is Query Understanding in Search Systems?
Query understanding converts raw user text into a structured representation that retrieval and ranking systems can act on. When a user types something like "levi black jeans for men under 50," the system must decode intent, extract entities like brand and price, normalize variations, and route to the right index. This happens in milliseconds before any document retrieval begins.
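To make that concrete, here is a minimal sketch of the kind of structured representation this stage might emit for the query above. The field names (intent, entities, filters, rewritten, target_index) are illustrative assumptions, not a specific production schema.

```python
# A minimal sketch of a structured query representation (illustrative schema).
from dataclasses import dataclass, field

@dataclass
class ParsedQuery:
    raw: str                                       # original user text
    intent: str                                    # e.g. "product_search", "navigational", "help"
    entities: dict = field(default_factory=dict)   # extracted attributes (brand, color, ...)
    filters: dict = field(default_factory=dict)    # hard constraints handed to retrieval
    rewritten: str = ""                            # normalized / expanded query text
    target_index: str = ""                         # index or vertical to query

# One possible parse of the example query above.
parsed = ParsedQuery(
    raw="levi black jeans for men under 50",
    intent="product_search",
    entities={"brand": "Levi's", "color": "black", "product_type": "jeans", "gender": "men"},
    filters={"price_max": 50.0},
    rewritten="levi's black jeans men",
    target_index="apparel_catalog",
)
```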
The process includes four core steps. Intent classification determines query type, such as navigational versus informational or product versus help. Parsing identifies entities and attributes like product type, brand, color, size, price range, dates, and locations. Rewriting reformulates the query to resolve ambiguity, normalize brand names, and expand or relax terms to improve recall. Routing chooses the right index or vertical, such as catalog versus help center versus forum posts, and enforces permission or compliance filters.
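A compact sketch of how these four stages might compose is shown below. The stage functions are hypothetical stand-ins kept deliberately simple; real systems back them with trained intent classifiers, sequence taggers plus lexicons, learned or rule-based rewriters, and routing configuration.

```python
# Sketch of the four-stage flow: classify -> parse -> rewrite -> route.
# All stage logic here is a stand-in for illustration, not a production model.
import re

def classify_intent(query: str) -> str:
    # Stand-in heuristic; real systems use a lightweight trained classifier.
    return "help" if query.startswith(("how to", "why ")) else "product_search"

def parse_entities(query: str) -> dict:
    # Stand-in pattern matching; real systems use sequence taggers plus lexicons.
    entities = {}
    price = re.search(r"under \$?(\d+)", query)
    if price:
        entities["price_max"] = float(price.group(1))
    if "levi" in query:
        entities["brand"] = "Levi's"
    return entities

def rewrite(query: str, entities: dict) -> str:
    # Normalize brand spelling, drop stopwords and price tokens to improve recall.
    text = query.replace("levi ", "levi's ")
    return " ".join(t for t in text.split()
                    if t not in {"for", "under", "the"} and not t.isdigit())

def route(intent: str) -> str:
    # Choose the index or vertical; permission/compliance filters would attach here too.
    return {"product_search": "catalog", "help": "help_center"}.get(intent, "catalog")

def understand(query: str) -> dict:
    intent = classify_intent(query)
    entities = parse_entities(query)
    return {"intent": intent,
            "entities": entities,
            "rewritten": rewrite(query, entities),
            "index": route(intent)}

print(understand("levi black jeans for men under 50"))
# -> {'intent': 'product_search', 'entities': {'price_max': 50.0, 'brand': "Levi's"},
#     'rewritten': "levi's black jeans men", 'index': 'catalog'}
```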
Practical budgets are extremely tight. Teams often target 5 to 15 milliseconds at the 50th percentile (p50) and under 30 milliseconds at the 95th percentile (p95) on CPU for the entire query understanding stage. The end-to-end search budget is commonly 150 to 300 milliseconds at p50 and 300 to 600 milliseconds at p95. Google Search and Amazon product search both run query understanding at tens of thousands of requests per second at peak, with autoscaling to hundreds of instances. Peak events like Black Friday can push traffic 3 to 5 times higher, so systems maintain 30 percent or more headroom and use in-memory caches for high-frequency queries.
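The sketch below shows one minimal way to keep such an in-memory, TTL-bounded cache of query understanding results for head queries. The class, TTL value, and eviction policy are illustrative assumptions; production systems typically put a local LRU in front of a shared cache and key it on the normalized query plus locale or marketplace.

```python
# Illustrative in-memory TTL cache for head-query results (not a production design).
import time

class TtlCache:
    """A tiny in-memory cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float = 120.0, max_entries: int = 100_000):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired entry counts as a miss
            return None
        return value

    def put(self, key, value):
        if len(self._store) >= self.max_entries:
            # Crude eviction of the oldest insertion; real systems use LRU/LFU.
            self._store.pop(next(iter(self._store)))
        self._store[key] = (time.monotonic() + self.ttl, value)

def understand(query: str) -> dict:
    # Placeholder for the query understanding pipeline sketched earlier.
    return {"intent": "product_search", "rewritten": query.lower()}

cache = TtlCache(ttl_seconds=120.0)  # 2-minute TTL, within the 1-5 minute range cited below

def cached_understand(query: str) -> dict:
    key = query.strip().lower()
    result = cache.get(key)
    if result is None:        # cache miss: run the full pipeline and store the result
        result = understand(query)
        cache.put(key, result)
    return result
```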
💡 Key Takeaways
• Query understanding runs in 5 to 15 milliseconds at p50 and under 30 milliseconds at p95, consuming roughly 10 percent or less of the total search latency budget of 150 to 300 milliseconds at p50.
• The four core steps are intent classification to determine query type, parsing to extract entities like brand and price, rewriting to normalize and expand terms, and routing to select the right index and apply filters.
• Large-scale systems at Amazon and Google handle tens of thousands of requests per second at peak, with 30 percent or more capacity headroom to absorb traffic spikes during events like holiday sales.
• Cache hit rates for head queries reach 30 to 60 percent with a time to live (TTL) of 1 to 5 minutes, stabilizing results and absorbing burst traffic, while tail queries see near-zero cache hits.
• Suggest-as-you-type experiences require even tighter budgets: under 50 milliseconds end to end, with query understanding staying within 5 to 10 milliseconds per keystroke (see the budget-guard sketch after this list).
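To make the per-keystroke budget concrete, here is a hedged sketch of a latency guard that times each stage and skips the optional ones once the budget is nearly spent, degrading gracefully rather than blowing the suggest-as-you-type budget. The budget value and stage stand-ins are assumptions for illustration.

```python
# Illustrative per-request latency guard for query understanding.
import time

STAGE_BUDGET_MS = 10.0  # assumed per-keystroke budget for query understanding

def understand_with_budget(query: str, budget_ms: float = STAGE_BUDGET_MS) -> dict:
    start = time.perf_counter()
    result = {"raw": query, "intent": "unknown", "entities": {}, "rewritten": query}

    def remaining_ms() -> float:
        return budget_ms - (time.perf_counter() - start) * 1000.0

    # Cheap, required stage: always run.
    result["intent"] = "help" if query.startswith("how ") else "product_search"

    # Optional stages: run only while budget remains, otherwise fall back to the raw query.
    if remaining_ms() > 2.0:
        result["entities"] = {"tokens": query.split()}          # stand-in for parsing
    if remaining_ms() > 2.0:
        result["rewritten"] = " ".join(query.lower().split())   # stand-in for rewriting

    result["latency_ms"] = round((time.perf_counter() - start) * 1000.0, 3)
    return result

print(understand_with_budget("levi black jeans for men under 50"))
```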
📌 Examples
Amazon product search: a user types "sony noise cancelling headphones"; the system classifies the query as Electronics, extracts the brand Sony and the feature noise cancelling, routes it to the electronics catalog with price and rating facets, and completes in 12 milliseconds at p50.
Airbnb location search: a user types "cabin lake tahoe 2 bedrooms"; the system extracts the property type cabin, the location Lake Tahoe, and the filter bedrooms = 2, normalizes the location to a canonical identifier, and routes to the lodging index with date and occupancy filters (a possible routed request is sketched below).
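As a rough illustration of the routing step in the second example, the sketch below turns that parse into a retrieval request. The index name, field names, and Elasticsearch-style query syntax are assumptions about a downstream retrieval system, not Airbnb's actual stack.

```python
# Illustrative mapping from a parsed query to a routed retrieval request.
parsed = {
    "intent": "lodging_search",
    "entities": {"property_type": "cabin", "location_id": "lake-tahoe-ca-us"},
    "filters": {"bedrooms": 2},
    "target_index": "lodging",
}

retrieval_request = {
    "index": parsed["target_index"],
    "query": {
        "bool": {
            # Soft relevance match on the property type.
            "must": [{"match": {"property_type": parsed["entities"]["property_type"]}}],
            # Hard constraints from parsing: canonical location and bedroom count.
            "filter": [
                {"term": {"location_id": parsed["entities"]["location_id"]}},
                {"range": {"bedrooms": {"gte": parsed["filters"]["bedrooms"]}}},
            ],
        }
    },
}

print(retrieval_request)
```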