Entity Parsing and Linking in Query Understanding
The Extraction Pipeline
Step 1: Detect entity spans with named entity recognition (NER). A BERT-based NER model identifies "New York" as a location and "iPhone 15" as a product.
Step 2: Generate candidate entities from a knowledge base. "NYC" matches New York City, NYC FC (the soccer team), and the NYC subway.
Step 3: Rank candidates using context. "flights to NYC" disambiguates to the city; "NYC game tonight" suggests the team.
Accuracy depends heavily on knowledge base coverage and context modeling.
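The three steps above can be sketched end to end. This is a minimal illustration, not a production pipeline: span detection is stubbed with a hand-written alias table (a real system would use the BERT-based NER model mentioned above), and the alias table, candidate lists, and context-cue sets are all assumed for the example.

```python
# Illustrative alias table (assumed): mention surface form -> candidate entities.
ALIASES = {
    "nyc": ["New York City", "NYC FC", "NYC subway"],
    "new york": ["New York City", "New York (state)"],
}

# Context keywords that vote for each candidate (assumed for this sketch).
CONTEXT_CUES = {
    "New York City": {"flights", "hotel", "weather"},
    "NYC FC": {"game", "match", "score"},
    "NYC subway": {"train", "fare", "line"},
}

def detect_spans(query: str) -> list[str]:
    """Step 1 (stubbed): return alias-table surface forms found in the query."""
    q = query.lower()
    return [alias for alias in ALIASES if alias in q]

def link(query: str) -> dict[str, str]:
    """Steps 2-3: generate candidates from the KB, rank by context-cue overlap."""
    tokens = set(query.lower().split())
    linked = {}
    for span in detect_spans(query):
        candidates = ALIASES[span]          # Step 2: candidate generation
        linked[span] = max(                 # Step 3: context ranking
            candidates,
            key=lambda c: len(CONTEXT_CUES.get(c, set()) & tokens),
        )
    return linked

print(link("flights to NYC"))   # -> {'nyc': 'New York City'}
print(link("NYC game tonight")) # -> {'nyc': 'NYC FC'}
```

In practice the overlap count would be replaced by a learned scorer over query, user, and entity features, but the candidate-then-rank shape stays the same.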
Knowledge Base Design
The knowledge base stores each entity with a canonical name, aliases ("NYC", "New York", "Big Apple"), a type (city, person, product), attributes (population, coordinates, price), and relationships (NYC is-in USA). Coverage is critical: queries mentioning entities missing from the KB fail silently. Typical sizes vary by domain: an e-commerce KB might hold 10M products, while web search needs billions of entities. Update frequency also matters: new products, people, and events need fast ingestion.
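The record layout described above can be sketched as a small data structure with an alias index. The class and field names are assumptions for illustration; the lookup also shows the silent-miss behavior that incomplete coverage produces.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    canonical_name: str
    entity_type: str                                  # e.g. "city", "person", "product"
    aliases: list[str] = field(default_factory=list)
    attributes: dict = field(default_factory=dict)    # population, coordinates, price, ...
    relations: dict = field(default_factory=dict)     # e.g. {"is-in": "USA"}

class KnowledgeBase:
    def __init__(self) -> None:
        self._by_alias: dict[str, list[Entity]] = {}

    def add(self, entity: Entity) -> None:
        # Index every surface form, including the canonical name,
        # so lookups cover "NYC", "Big Apple", etc.
        for name in [entity.canonical_name, *entity.aliases]:
            self._by_alias.setdefault(name.lower(), []).append(entity)

    def candidates(self, mention: str) -> list[Entity]:
        # A miss returns [] -- the "fails silently" case when an
        # entity is absent from the KB.
        return self._by_alias.get(mention.lower(), [])

kb = KnowledgeBase()
kb.add(Entity("New York City", "city",
              aliases=["NYC", "New York", "Big Apple"],
              attributes={"population": 8_300_000},
              relations={"is-in": "USA"}))

print([e.canonical_name for e in kb.candidates("big apple")])  # -> ['New York City']
```

At web-search scale this alias dictionary becomes a sharded index, and fast ingestion means new entities must flow into that index within minutes rather than via batch rebuilds.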
Linking Challenges
Ambiguity: "Apple" has 10+ meanings; use query context and user history to disambiguate.
Partial matches: "iPhone" should match "iPhone 15 Pro Max"; use hierarchical entities.
Novel entities: new products or people not yet in the KB; fall back to embedding similarity or treat the mention as unlinked.
Entity linking accuracy typically ranges from 75% to 90%, depending on domain specificity.
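The hierarchical-entity idea for partial matches can be sketched as a parent chain: each product points to its family, so the mention "iPhone" matches "iPhone 15 Pro Max" by walking up the chain, and an unknown mention simply fails to match. The catalog below is an illustrative assumption.

```python
# Assumed product hierarchy: child entity -> parent entity (None = family root).
HIERARCHY = {
    "iPhone 15 Pro Max": "iPhone 15",
    "iPhone 15": "iPhone",
    "iPhone": None,
}

def ancestors(entity: str) -> list[str]:
    """The entity plus its chain of parents, most specific first."""
    chain = []
    while entity is not None:
        chain.append(entity)
        entity = HIERARCHY.get(entity)
    return chain

def matches(mention: str, entity: str) -> bool:
    """A mention matches an entity if it names the entity or any ancestor."""
    return mention in ancestors(entity)

print(matches("iPhone", "iPhone 15 Pro Max"))  # -> True
print(matches("Pixel", "iPhone 15 Pro Max"))   # -> False
```

The same walk-up gives a natural fallback for novel entities: a mention that matches nothing in the hierarchy can be routed to embedding similarity or left unlinked rather than forced onto a wrong candidate.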