
Entity Parsing and Linking in Query Understanding

Definition
Entity extraction identifies mentions of real-world things in queries: people, places, products, dates. Entity linking maps these mentions to canonical entries in a knowledge base, resolving ambiguity ("Jordan" → Michael Jordan vs Jordan the country).

The Extraction Pipeline

Step 1: Detect entity spans using NER (Named Entity Recognition). A BERT-based NER model identifies "New York" as a location, "iPhone 15" as a product.

Step 2: Generate candidate entities from a knowledge base. "NYC" matches New York City, NYC FC (soccer team), and the NYC subway.

Step 3: Rank candidates using context. "flights to NYC" disambiguates to the city; "NYC game tonight" suggests the team.

Accuracy depends heavily on knowledge base coverage and context modeling.
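The three steps can be sketched end-to-end over a toy knowledge base. This is a minimal illustration, not a production system: the KB entries, the `CONTEXT_HINTS` table, and the function names are all invented for the example, and step 3 uses simple context-word overlap in place of a learned ranker.

```python
# Toy KB: canonical name → type and aliases (illustrative entries only).
TOY_KB = {
    "new york city": {"type": "city", "aliases": ["nyc", "new york", "big apple"]},
    "nyc fc":        {"type": "team", "aliases": ["nyc", "new york city fc"]},
}

# Query words that hint at each entity type (stand-in for a learned context model).
CONTEXT_HINTS = {
    "city": {"flights", "hotels", "weather"},
    "team": {"game", "score", "roster"},
}

def candidates(mention):
    """Step 2: every KB entity whose canonical name or alias matches the mention."""
    m = mention.lower()
    return [name for name, ent in TOY_KB.items()
            if m == name or m in ent["aliases"]]

def link(mention, query):
    """Step 3: rank candidates by how many type-specific hints appear in the query."""
    words = set(query.lower().split())
    cands = candidates(mention)
    if not cands:
        return None  # mention not in KB: leave unlinked
    return max(cands, key=lambda name: len(words & CONTEXT_HINTS[TOY_KB[name]["type"]]))

print(link("NYC", "flights to NYC"))    # → new york city
print(link("NYC", "NYC game tonight"))  # → nyc fc
```

A real system would replace the hint sets with a context encoder (e.g. a BERT cross-encoder scoring mention context against candidate descriptions), but the candidate-then-rank shape stays the same.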

Knowledge Base Design

The knowledge base stores entities with: canonical name, aliases ("NYC", "New York", "Big Apple"), type (city, person, product), attributes (population, coordinates, price), and relationships (NYC is-in USA). Coverage is critical; queries mentioning entities not in your KB fail silently. Typical sizes: e-commerce might have 10M products; web search needs billions of entities. Update frequency matters: new products, people, events need fast ingestion.
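A minimal sketch of one KB record with those fields, plus the alias → entity reverse index that candidate generation needs. The field names and the example attribute values are illustrative assumptions, not a real schema.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """One hypothetical KB entry; field names are illustrative."""
    canonical_name: str
    entity_type: str                                  # "city", "person", "product", ...
    aliases: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)    # e.g. population, coordinates, price
    relations: list = field(default_factory=list)     # e.g. ("is-in", "USA")

nyc = Entity(
    canonical_name="New York City",
    entity_type="city",
    aliases=["NYC", "New York", "Big Apple"],
    attributes={"population": 8_300_000},             # illustrative figure
    relations=[("is-in", "USA")],
)

# Candidate generation is a lookup in a reverse index from alias to entities.
alias_index = {}
for ent in [nyc]:
    for name in [ent.canonical_name, *ent.aliases]:
        alias_index.setdefault(name.lower(), []).append(ent)

print(alias_index["big apple"][0].canonical_name)  # → New York City
```

At web scale this index lives in a dedicated store rather than a dict, and fast ingestion of new entities means rebuilding or incrementally updating it.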

Linking Challenges

Ambiguity: "Apple" has 10+ meanings. Use query context and user history.

Partial matches: "iPhone" should match "iPhone 15 Pro Max." Use hierarchical entities.

Novel entities: new products or people not yet in the KB. Fall back to embedding similarity or treat the mention as unlinked.

Entity linking accuracy typically ranges 75-90% depending on domain specificity.
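The partial-match and novel-entity cases can be sketched together: score a mention against KB names by token overlap and leave it unlinked below a threshold. The KB names and the 0.3 threshold are illustrative assumptions; a production system would use learned embeddings rather than Jaccard similarity.

```python
KB_NAMES = ["iPhone 15 Pro Max", "iPhone 15", "Galaxy S24"]  # toy KB

def token_overlap(a, b):
    """Jaccard similarity over lowercased tokens (stand-in for embedding similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def partial_link(mention, threshold=0.3):
    """Best partial match, or None (unlinked) when nothing scores above threshold."""
    best = max(KB_NAMES, key=lambda name: token_overlap(mention, name))
    return best if token_overlap(mention, best) >= threshold else None

print(partial_link("iPhone 15"))   # → iPhone 15 (exact match wins)
print(partial_link("iPhone"))      # → iPhone 15 (partial match)
print(partial_link("Pixelbook"))   # → None (novel entity, left unlinked)
```

Note that "iPhone" resolves to the closest sub-entity rather than failing, which is the behavior hierarchical entities formalize.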

💡 Key Takeaways
Entity extraction identifies mentions; entity linking maps to canonical KB entries
Pipeline: NER detects spans → generate candidates from KB → rank by context
Knowledge base stores: canonical name, aliases, type, attributes, relationships
KB coverage is critical; missing entities fail silently; sizes range 10M (e-commerce) to billions (web)
Entity linking accuracy: 75-90% depending on domain; ambiguity is the main challenge
📌 Interview Tips
1. Describe the three-step pipeline (NER → candidates → ranking) for a systematic explanation
2. Explain knowledge base structure (name, aliases, type, attributes) for practical depth
3. Use the NYC disambiguation example (city vs soccer team) to illustrate context-based linking