Recommendation Systems › Collaborative Filtering (Matrix Factorization) · Medium · ⏱️ ~3 min

How Collaborative Filtering Works

Core Concept
The rating matrix has users as rows and items as columns. Most cells are empty (unrated). Collaborative filtering fills in these empty cells by finding patterns: similar users rate items similarly, and similar items get rated similarly by the same users.
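A minimal sketch of such a matrix (toy data, purely illustrative), using `np.nan` to mark unrated cells:

```python
import numpy as np

# Toy rating matrix: rows = users, columns = items, np.nan = unrated.
# Real matrices are vastly larger and far sparser than this example.
ratings = np.array([
    [5.0,    3.0,    np.nan, 1.0],
    [4.0,    np.nan, np.nan, 1.0],
    [1.0,    1.0,    np.nan, 5.0],
    [np.nan, 1.0,    5.0,    4.0],
])

observed = ~np.isnan(ratings)
density = observed.sum() / ratings.size
print(f"{observed.sum()} of {ratings.size} cells observed ({density:.0%})")
```

Collaborative filtering's job is to estimate values for the `np.nan` cells from the patterns in the observed ones.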

Computing User Similarity

Take two users and look at items they both rated. If both rated 20 items in common, compute how correlated their ratings are. Common measures: Pearson correlation (how much ratings move together), cosine similarity (angle between rating vectors), or Jaccard similarity (overlap in items rated positively).

Pearson correlation is popular because it handles rating bias. If user A rates everything 1 point higher than user B on average, Pearson still detects they have similar tastes because it measures correlation, not absolute agreement. Cosine similarity is faster to compute and works well when ratings are normalized.
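Both measures can be computed over just the co-rated items; a sketch (function names are my own) showing why Pearson is robust to a constant rating offset:

```python
import numpy as np

def corated(u, v):
    """Boolean mask of items both users rated (np.nan = unrated)."""
    return ~np.isnan(u) & ~np.isnan(v)

def pearson_sim(u, v):
    """Pearson correlation over co-rated items. Mean-centering each user
    removes a constant rating bias before comparing."""
    mask = corated(u, v)
    a, b = u[mask] - u[mask].mean(), v[mask] - v[mask].mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

def cosine_sim(u, v):
    """Cosine of the angle between the raw co-rated rating vectors."""
    mask = corated(u, v)
    a, b = u[mask], v[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

# User B rates everything exactly 1 point higher than user A:
a = np.array([2.0, 3.0, 4.0, np.nan])
b = np.array([3.0, 4.0, 5.0, 1.0])
print(pearson_sim(a, b))  # 1.0: identical taste despite the offset
```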

Making Predictions

To predict user U's rating for item I: (1) Find the K most similar users to U who have rated item I. (2) Take a weighted average of their ratings, weighted by similarity. If the 5 most similar users rated the item 4, 5, 4, 5, 3 and their similarity scores are 0.9, 0.8, 0.85, 0.75, 0.7, the prediction is (4*0.9 + 5*0.8 + 4*0.85 + 5*0.75 + 3*0.7) / (0.9+0.8+0.85+0.75+0.7) ≈ 4.21.
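The weighted-average step can be checked directly on those numbers:

```python
# Weighted average of the K most similar neighbors' ratings,
# using the example values from the paragraph above.
neighbor_ratings = [4, 5, 4, 5, 3]
similarities = [0.9, 0.8, 0.85, 0.75, 0.7]

num = sum(r * s for r, s in zip(neighbor_ratings, similarities))
den = sum(similarities)
print(round(num / den, 2))  # 4.21
```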

Choosing K matters. Too small (K=5) makes predictions noisy because a single unusual neighbor dominates. Too large (K=100) dilutes signal with weakly similar users. Most systems use K=20-50. Production systems tune K on held-out validation data.
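Tuning K on held-out data can be sketched end to end; everything below (the synthetic low-rank ratings, the holdout scheme, the cosine-weighted predictor) is illustrative, not a production recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic low-rank "tastes" so genuinely similar users exist.
n_users, n_items, rank = 60, 40, 3
tastes = rng.normal(size=(n_users, rank)) @ rng.normal(size=(rank, n_items))
ratings = np.clip(np.round(2.5 + tastes), 1, 5)

# Hold out one rated cell per user for validation.
val = [(u, rng.integers(n_items)) for u in range(n_users)]
train = ratings.copy()
for u, i in val:
    train[u, i] = np.nan

def predict(u, i, k):
    """User-based kNN: weighted average of the k most similar users' ratings of i."""
    candidates = [v for v in range(n_users) if v != u and not np.isnan(train[v, i])]
    sims = []
    for v in candidates:
        mask = ~np.isnan(train[u]) & ~np.isnan(train[v])
        a, b = train[u, mask], train[v, mask]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims.append(a @ b / denom if denom else 0.0)
    top = sorted(zip(sims, candidates), reverse=True)[:k]
    den = sum(s for s, _ in top)
    return sum(s * train[v, i] for s, v in top) / den if den else np.nan

for k in (5, 20, 50):
    errs = [(predict(u, i, k) - ratings[u, i]) ** 2 for u, i in val]
    print(f"K={k:3d}  RMSE={np.sqrt(np.mean(errs)):.3f}")
```

In a real system the same loop runs over a proper validation split, and K is chosen to minimize the held-out error.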

Item-Based Alternative

Instead of finding similar users, find similar items. To predict user U's rating for item I: (1) Find items similar to I that U has already rated. (2) Take a weighted average of those ratings, weighted by item similarity. If U rated items J, K, L with ratings 5, 4, 5, and their similarities to I are 0.85, 0.6, 0.9, prediction = (5*0.85 + 4*0.6 + 5*0.9) / (0.85+0.6+0.9) ≈ 4.74.
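The same weighted average, written out for the item-based example (item names J, K, L as in the paragraph above):

```python
# Item-based prediction for user U on item I: U's own ratings of items
# similar to I, weighted by precomputed item-item similarity.
rated = {"J": 5, "K": 4, "L": 5}            # items U already rated
sim_to_i = {"J": 0.85, "K": 0.6, "L": 0.9}  # each item's similarity to I

num = sum(rated[j] * sim_to_i[j] for j in rated)
den = sum(sim_to_i.values())
print(round(num / den, 2))  # 4.74
```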

Item similarities are precomputed offline. This makes prediction fast: just look up precomputed similarities and do weighted averaging. User-based requires computing similarities at request time or maintaining a constantly updating cache. For catalogs with millions of items, item-based is typically more practical.
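The offline step can be as simple as one matrix product; a sketch computing item-item cosine similarities from a small interaction matrix (toy data, with 0 meaning "no interaction"):

```python
import numpy as np

# Rows = users, columns = items; each column is an item's rating vector.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

norms = np.linalg.norm(R, axis=0)
norms[norms == 0] = 1.0             # guard against never-rated items
item_sim = (R.T @ R) / np.outer(norms, norms)
np.fill_diagonal(item_sim, 0.0)     # an item should not recommend itself

# At request time, prediction is just a lookup into item_sim
# plus the weighted average shown earlier.
print(np.round(item_sim, 2))
```

Because this matrix changes slowly, it can be rebuilt in a nightly batch job and served from a cache.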

⚠️ Interview Question: "Why use item-based over user-based?" Answer: Item similarities are more stable (items do not change preferences), can be precomputed offline, and scale better. User preferences evolve with every new rating, making user-based neighborhoods unstable. Item-based was the breakthrough that made collaborative filtering practical at scale.
💡 Key Takeaways
Starts with a user-item matrix: rows are users, columns are items, cells are interactions (ratings, clicks, views). Most cells empty: 10M users × 1M items = 10T cells, but only ~0.00001% filled
User-based: find similar users by comparing overlapping ratings (cosine similarity or Pearson correlation), then recommend what similar users liked
Prediction formula: weighted average of similar users' ratings. Bob (sim 0.9) rates 5, Carol (sim 0.6) rates 4 → predict (0.9×5 + 0.6×4)/(0.9+0.6) = 4.6
Item-based: find similar items (items rated similarly by same users), recommend items similar to what you liked. Can precompute since item similarities are stable
Cosine similarity: treat ratings as vectors, compute angle. Pearson correlation: how ratings move together after adjusting for each user's average
Item-based preferred in production: item similarities computed once and cached, user similarities must be recomputed as users add ratings
📌 Interview Tips
1. When asked about data sources: explain that implicit signals (views, clicks, plays) are 100-1000x more abundant than explicit ratings, making them essential for production systems.
2. For confidence weighting: mention the formula c = 1 + α×interactions (α typically 40), which weights repeated behaviors higher without ignoring single interactions.
3. When discussing label quality: explain that explicit ratings are cleaner signals but suffer from selection bias (users only rate items they choose to engage with).
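The confidence-weighting formula from tip 2 (the c = 1 + α·r scheme popularized for implicit-feedback matrix factorization) is one line of code:

```python
# Confidence weighting for implicit feedback: a single interaction still
# counts (c >= 1), but repeated interactions count much more.
ALPHA = 40  # typical value cited in the tip above

def confidence(interactions: int) -> float:
    return 1 + ALPHA * interactions

print(confidence(0), confidence(1), confidence(5))  # 1 41 201
```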