Embeddings & Similarity Search — Embedding Generation (BERT, Sentence-BERT, Graph Embeddings)

Graph Embeddings for Collaborative and Structural Signals

WHAT GRAPH EMBEDDINGS CAPTURE

Graph embeddings encode structural relationships from interaction graphs: users connected to items they clicked, items connected to categories, users connected to each other through shared behaviors. The embedding captures collaborative signals—patterns of who interacts with what—that pure text or image embeddings completely miss.

Example: User A and User B never searched the same keywords, but both clicked the same 50 products. Their text queries have zero overlap, but their graph embeddings are similar because they have demonstrated similar tastes through behavior rather than words.

HOW GRAPH EMBEDDINGS WORK

Random walk methods (Node2Vec, DeepWalk): Sample random paths through the graph, treat each path as a "sentence" of node IDs, apply word2vec-style training. Nodes that appear in similar path contexts get similar embeddings. Fast to train, scales to billions of edges.
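A minimal sketch of the random-walk step, using a toy user-item graph with made-up IDs. This shows only the walk sampling (DeepWalk-style, unbiased); in practice the resulting "sentences" of node IDs are fed to a skip-gram model such as gensim's Word2Vec to learn the embeddings.

```python
import random

# Toy interaction graph: users u1-u3, items i1-i4 (hypothetical IDs).
# Edges connect users to the items they clicked.
graph = {
    "u1": ["i1", "i2"], "u2": ["i2", "i3"], "u3": ["i3", "i4"],
    "i1": ["u1"], "i2": ["u1", "u2"], "i3": ["u2", "u3"], "i4": ["u3"],
}

def random_walk(graph, start, length, rng):
    """Sample one unbiased random walk (Node2Vec adds p/q biases here)."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

rng = random.Random(42)
# Each walk is a "sentence" of node IDs; a corpus of such walks is what
# gets passed to word2vec-style training.
walks = [random_walk(graph, node, 5, rng) for node in graph for _ in range(10)]
```

Nodes that co-occur within the same walk window end up predicting each other during training, which is what pulls their embeddings together.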

Graph Neural Networks (GNN): Each node aggregates features from its neighbors iteratively. After K layers, each node embedding incorporates information from its K-hop neighborhood. More expressive than random walks but more expensive to train—typically used for smaller graphs or when node features matter.
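The neighbor-aggregation idea can be sketched in a few lines of NumPy. This is a bare mean-aggregation layer on a toy graph with made-up features; real GNNs (GCN, GraphSAGE) add learned weight matrices and nonlinearities between layers.

```python
import numpy as np

# Toy graph as an adjacency list (hypothetical), plus initial 3-dim
# features per node (made-up values).
neighbors = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])

def gnn_layer(H, neighbors):
    """One message-passing layer: each node averages its own feature
    vector with its neighbors' vectors."""
    out = np.zeros_like(H)
    for v, nbrs in neighbors.items():
        out[v] = H[[v] + nbrs].mean(axis=0)
    return out

# After K applications, node v's vector mixes its K-hop neighborhood.
H1 = gnn_layer(H, neighbors)
H2 = gnn_layer(H1, neighbors)
```

After two layers, node 3's embedding already carries information from node 2, which is two hops away, illustrating the K-layer / K-hop correspondence.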

COMBINING WITH CONTENT EMBEDDINGS

Graph embeddings capture who interacts with what. Content embeddings (text, image) capture what items look like. The best recommendation systems combine both: concatenate the vectors, or train a joint model that consumes both signal types.
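The simpler of the two options, concatenation, can be sketched as follows. L2-normalizing each signal before concatenating (an assumption here, not the only choice) keeps one signal from dominating cosine similarity purely through vector magnitude.

```python
import numpy as np

def combine(graph_emb, content_emb):
    """Concatenate L2-normalized graph and content embeddings.
    A learned fusion layer is the heavier-weight alternative."""
    g = graph_emb / np.linalg.norm(graph_emb)
    c = content_emb / np.linalg.norm(content_emb)
    return np.concatenate([g, c])

# Made-up vectors of different widths; the combined vector has both.
item_vec = combine(np.array([0.3, 2.1, -0.5]), np.array([1.2, 0.0, 0.4, -0.8]))
```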

Cold start problem: new items have no graph edges, so their graph embeddings carry no signal. Solution: fall back to content embeddings until sufficient interactions accumulate. Typical threshold: 10-50 interactions before the graph embedding becomes reliable.
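The fallback can be a one-line routing decision at serving time. A minimal sketch, assuming hypothetical lookup dicts for embeddings and interaction counts; the threshold of 25 is an arbitrary pick from the 10-50 range above and would be tuned per dataset.

```python
COLD_START_THRESHOLD = 25  # assumed value within the 10-50 range; tune per dataset

def item_embedding(item_id, interaction_counts, graph_embs, content_embs):
    """Serve the graph embedding once the item has enough interactions;
    otherwise fall back to its content embedding (hypothetical lookups)."""
    if interaction_counts.get(item_id, 0) >= COLD_START_THRESHOLD:
        return graph_embs[item_id]
    return content_embs[item_id]

# A brand-new item routes to content; an established item routes to graph.
counts = {"new_item": 2, "old_item": 400}
graph_embs = {"old_item": [0.9, 0.1]}
content_embs = {"new_item": [0.2, 0.8], "old_item": [0.5, 0.5]}
```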

✅ Best Practice: Use graph embeddings for behavioral similarity, content embeddings for semantic similarity. Combine both for robust recommendations that work for both new and established items.
💡 Key Takeaways
Graph embeddings capture collaborative signals from user-item interactions
Random walks treat graph paths as sentences for word2vec-style training
Cold start: new items need 10-50 interactions before graph embedding is reliable
📌 Interview Tips
1. Explain why graph embeddings complement text—users with different queries but the same clicks have similar graph embeddings.
2. Describe the cold start solution—fall back to content embeddings until sufficient interactions accumulate.