Embeddings & Similarity Search • Embedding Generation (BERT, Sentence-BERT, Graph Embeddings)
What is Embedding Generation and Why It Matters
Embedding generation transforms text or graph entities into dense numerical vectors that capture semantic meaning and structural relationships. Instead of treating words as discrete symbols, embeddings map them into continuous vector spaces where similar concepts cluster together. A sentence about dogs and a sentence about puppies will have vectors pointing in similar directions, measurable through cosine similarity.
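To make the geometry concrete, here is a minimal sketch of cosine similarity over toy vectors. The four-dimensional values are illustrative only, not real model outputs; production embeddings have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" standing in for real model outputs.
dogs    = np.array([0.8, 0.1, 0.6, 0.2])
puppies = np.array([0.7, 0.2, 0.5, 0.3])
weather = np.array([0.1, 0.9, 0.1, 0.8])

print(cosine_similarity(dogs, puppies))  # ~0.98: related concepts point the same way
print(cosine_similarity(dogs, weather))  # ~0.31: unrelated concepts diverge
```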
This representation unlocks efficient semantic search and recommendation systems. At Google and Pinterest scale, you cannot compare every query against millions of documents using slow pairwise text comparisons. Instead, systems precompute embeddings for all documents once, store them in specialized vector indices, and compute only the query embedding at request time. A single dot product operation then scores similarity in microseconds.
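The pattern looks roughly like the sketch below. Here `embed` is a hypothetical placeholder that returns deterministic random vectors with no real semantics; a production system would call an encoder model instead. Only the offline/online split and the matrix-vector scoring step reflect what real systems do.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for a real encoder (BERT, SBERT, etc.).
    Returns a deterministic random vector; carries no actual semantics."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)  # unit-normalize so dot product equals cosine

# Offline: embed every document once and stack into a matrix.
documents = ["how to train a puppy", "weather forecast today", "dog obedience tips"]
doc_matrix = np.stack([embed(d) for d in documents])  # shape (n_docs, 384)

# Online: embed only the query, then score all docs in one matrix-vector product.
query_vec = embed("puppy training guide")
scores = doc_matrix @ query_vec           # n_docs dot products in a single call
ranked = np.argsort(scores)[::-1]         # document indices, best first
print([documents[i] for i in ranked])
```

The key property is that `doc_matrix` is built once offline, so each request costs one encoder call plus one matrix product rather than a pairwise text comparison per document.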
Production systems use embeddings at massive scale with concrete performance requirements. Pinterest serves recommendations over billions of nodes with 128-dimensional embeddings, returning candidates in under 100 milliseconds. Spotify matches tens of millions of tracks to user preferences using collaborative embeddings for real-time playlist generation. The efficiency gain is dramatic: comparing raw text might take 50 to 200 milliseconds per pair, while vector dot products complete in single-digit microseconds, enabling a 10,000x throughput improvement.
Three major embedding types serve different needs. Bidirectional Encoder Representations from Transformers (BERT) produces contextual token embeddings, where each word's vector depends on its surrounding context. Sentence-BERT (SBERT) optimizes entire-sentence representations for semantic similarity tasks. Graph embeddings capture network structure and collaborative signals for recommendation systems, where relationships matter as much as content.
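As a concrete example of the sentence-level case, the open-source sentence-transformers library exposes SBERT-style models in a few lines. This sketch assumes the package is installed and uses the all-MiniLM-L6-v2 checkpoint, a commonly used 384-dimensional model.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# all-MiniLM-L6-v2 is a widely used 384-dimensional SBERT-style checkpoint.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "A dog is playing fetch in the park.",
    "A puppy chases a ball across the grass.",
    "The stock market closed higher today.",
]
# Unit-normalized sentence vectors, one row per sentence.
embeddings = model.encode(sentences, normalize_embeddings=True)

# Pairwise cosine similarities; the two dog sentences should score highest.
print(util.cos_sim(embeddings, embeddings))
```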
💡 Key Takeaways
• Embeddings map text or entities into dense vectors where semantic similarity translates to geometric proximity, measured by cosine similarity or dot product
• BERT base outputs 768-dimensional vectors per token with bidirectional context, while SBERT produces 384- or 768-dimensional sentence vectors optimized for semantic matching
• Graph embeddings use 64 to 256 dimensions to represent nodes and preserve network structure, essential for collaborative filtering in recommendations
• Production systems achieve sub-20-millisecond retrieval over 100 million items by precomputing document embeddings and searching vector indices instead of comparing raw text
• At 100 million documents with 768 dimensions in float32, raw storage comes to roughly 307 GB, driving quantization to float16 or product quantization at 5 to 20 bytes per vector (see the arithmetic sketch after this list)
• Vector indices enable a 10,000x speedup over text comparison: dot products complete in microseconds while pairwise text scoring takes 50 to 200 milliseconds
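The storage takeaway is simple arithmetic. A quick sketch makes the numbers explicit; the 16-byte product-quantization code size is an assumed value within the 5-to-20-byte range above.

```python
# Back-of-envelope storage for 100 million document embeddings.
n_docs, dims = 100_000_000, 768

float32_gb = n_docs * dims * 4 / 1e9   # 4 bytes per float32 component
float16_gb = n_docs * dims * 2 / 1e9   # halved by float16 quantization
pq_gb      = n_docs * 16 / 1e9         # product quantization, assumed 16-byte codes

print(f"float32: {float32_gb:.0f} GB")  # ~307 GB
print(f"float16: {float16_gb:.0f} GB")  # ~154 GB
print(f"PQ-16B:  {pq_gb:.0f} GB")       # ~2 GB
```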
📌 Examples
Pinterest uses 128-dimensional graph embeddings over billions of nodes, serving recommendation candidates in under 100 milliseconds of tail latency
Spotify generates track and user embeddings for tens of millions of items, powering personalized Home and Discover feeds with real-time retrieval
Google Search uses two-tower retrieval where query and document towers produce embeddings independently, enabling precomputation over billions of web pages
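A minimal sketch of the two-tower idea follows (an illustration of the pattern, not Google's actual architecture). The towers share no parameters and meet only at a dot product, which is exactly what allows document embeddings to be precomputed offline; the layer choices and sizes here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TwoTower(nn.Module):
    """Two independent encoders that interact only through a dot product,
    so document vectors can be computed once offline and indexed."""
    def __init__(self, vocab_size: int = 10_000, dim: int = 128):
        super().__init__()
        # EmbeddingBag averages token embeddings into one vector per input.
        self.query_tower = nn.Sequential(nn.EmbeddingBag(vocab_size, dim), nn.Linear(dim, dim))
        self.doc_tower   = nn.Sequential(nn.EmbeddingBag(vocab_size, dim), nn.Linear(dim, dim))

    def score(self, query_ids: torch.Tensor, doc_ids: torch.Tensor) -> torch.Tensor:
        q = self.query_tower(query_ids)   # (batch, dim), computed at request time
        d = self.doc_tower(doc_ids)       # (batch, dim), precomputed offline in practice
        return (q * d).sum(dim=-1)        # dot-product relevance score per pair

model = TwoTower()
query = torch.randint(0, 10_000, (2, 5))   # two queries of 5 token ids each
docs  = torch.randint(0, 10_000, (2, 20))  # two documents of 20 token ids each
print(model.score(query, docs))            # one score per (query, document) pair
```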