
What is Named Entity Recognition (NER)?

Named Entity Recognition (NER) identifies spans of text that refer to real-world entities and assigns them types such as person, organization, location, date, money, or product. Unlike simple keyword matching, NER must determine both the entity type and the exact token boundaries. A span like "New York State Department of Health" contains both a location ("New York State") and an organization ("Department of Health"), so the system has to resolve boundaries precisely rather than just spot familiar words.

Production NER systems frame the task as sequence labeling over tokens, using tagging schemes such as BIO (Begin, Inside, Outside) or BILOU (Begin, Inside, Last, Outside, Unit). Each token receives a label that marks whether it starts an entity, continues one, or sits outside any entity. In "Apple CEO Tim Cook visited California", the tokens might be labeled: Apple (B-ORG), CEO (O), Tim (B-PER), Cook (I-PER), visited (O), California (B-LOC). Small boundary errors cause major downstream failures: dropping "Inc." from "Microsoft Inc." breaks entity linking to knowledge bases.

NER typically operates as one stage in a larger extraction pipeline. After entity spans are identified, downstream systems perform entity normalization (standardizing formats such as dates) and entity linking (mapping surface mentions to canonical knowledge base identifiers). This pipeline powers search understanding, content moderation, knowledge graph construction, and privacy redaction. Google uses NER to connect queries to the Knowledge Graph for rich snippets, Amazon extracts product attributes from titles for faceted search, and Meta identifies entities in posts to build content graphs for integrity systems.

The core challenge is domain adaptation. General-purpose models trained on newswire achieve entity-level F1 scores of roughly 90 to 93 percent on similar text, but drop by 10 to 30 points on domains such as e-commerce titles, clinical notes, or social media without fine-tuning. A model trained on news may recognize "President Biden" yet fail on "iPhone 15 Pro Max" or drug names that never appeared in its training data.
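To make the tagging scheme concrete, here is a minimal, self-contained sketch of decoding per-token BIO labels back into typed spans. It is not tied to any particular NER library; the decode_bio helper name and the hard-coded tokens and tags are illustrative only.

```python
# Sketch: turning BIO labels into (type, text) entity spans.
# Tokens and tags mirror the "Apple CEO Tim Cook visited California" example above.

def decode_bio(tokens, tags):
    """Collect typed entity spans from per-token BIO labels."""
    spans = []                       # list of (entity_type, start, end) token offsets
    start, ent_type = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):     # a new entity begins here
            if start is not None:
                spans.append((ent_type, start, i))
            start, ent_type = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == ent_type:
            continue                 # the current entity continues
        else:                        # "O" or an inconsistent I- tag closes any open span
            if start is not None:
                spans.append((ent_type, start, i))
            start, ent_type = None, None
    if start is not None:            # close a span that runs to the end of the sentence
        spans.append((ent_type, start, len(tags)))
    return [(etype, " ".join(tokens[s:e])) for etype, s, e in spans]

tokens = ["Apple", "CEO", "Tim", "Cook", "visited", "California"]
tags   = ["B-ORG", "O", "B-PER", "I-PER", "O", "B-LOC"]
print(decode_bio(tokens, tags))
# [('ORG', 'Apple'), ('PER', 'Tim Cook'), ('LOC', 'California')]
```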
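The pipeline stages after span detection, normalization and linking, can be sketched in a few lines. The toy knowledge base, its identifiers, and the exact-match lookup below are stand-ins for illustration; production systems use candidate generation plus a context-aware ranking model.

```python
# Sketch: a toy normalize-then-link step downstream of NER.
# The KB dict and IDs are hypothetical placeholders, not a real knowledge base API.

KB = {
    "microsoft inc.": "KB:0001",
    "tim cook": "KB:0002",
}

def normalize(span):
    # Trivial normalization: lowercase and collapse whitespace.
    return " ".join(span.lower().split())

def link(span):
    # Exact-match linking against the toy KB; returns None when no entry matches.
    return KB.get(normalize(span))

print(link("Microsoft Inc."))   # 'KB:0001'
print(link("Microsoft"))        # None: a boundary error that drops "Inc." breaks the lookup
```

The second call shows the failure mode described above: a small boundary mistake in the NER stage leaves the linker with a surface form it cannot resolve.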
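For an off-the-shelf baseline of the kind discussed above, a pretrained newswire model can be run through the Hugging Face transformers pipeline API. This assumes the transformers package and a default token-classification model are available; the exact model downloaded and its predictions are not guaranteed here.

```python
# Sketch: running a pretrained (newswire-trained) NER model via Hugging Face transformers.
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")  # loads a default CoNLL-style model

for ent in ner("Apple CEO Tim Cook visited California"):
    # Each entry carries the aggregated span text, predicted type, and character offsets.
    print(ent["entity_group"], ent["word"], ent["start"], ent["end"])

# Models like this do well on news-like text but typically need fine-tuning on in-domain
# labeled data (e-commerce titles, clinical notes, social media) to close the F1 gap.
```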
💡 Key Takeaways
NER identifies entity spans and assigns types (person, organization, location, date, money, product) using sequence labeling schemes like BIO tagging
Boundary detection is critical: missing "Inc." or "Jr." breaks downstream entity linking to knowledge bases
Production systems chain NER with entity normalization and linking to map surface mentions to canonical identifiers in knowledge graphs
Domain shift reduces F1 scores by 10 to 30 points: newswire models fail on e-commerce, clinical, or social media text without fine-tuning
Real deployments at Google, Amazon, Meta, and Microsoft use NER for search understanding, product attribute extraction, content moderation, and PII redaction
📌 Examples
Google Search: NER extracts entities from queries and documents, linking them to the Knowledge Graph to generate rich snippets and improve result clustering
Amazon product search: NER extracts attributes like brand, size, and color from unstructured product titles and descriptions, enabling faceted filters and ad targeting
Meta content moderation: Entity extraction from posts builds content graphs connecting users, topics, and organizations for integrity tooling