
What is Matrix Factorization for Collaborative Filtering?

Matrix Factorization (MF) compresses a sparse user-item interaction matrix into two sets of dense, low-dimensional vectors called embeddings. Every user gets one vector and every item gets one vector. To predict how much a user will like an item, you compute the dot product between their vectors and add bias terms (global average, user offset, item offset). This reduces storage from O(|U|×|I|) to O((|U|+|I|)×k), where k is the embedding dimension.

The Netflix Prize made this approach famous: 480,189 users and 17,770 movies created a matrix with about 100 million ratings but over 8 billion empty cells (roughly 99% sparse). Instead of storing or computing all 8 billion values, Matrix Factorization represented each user and movie with vectors of dimension 50 to 200. The dot product between a user vector and a movie vector approximated that user's rating, reducing RMSE from 0.9525 to 0.8567.

Production systems operate at much larger scales. Spotify's catalog exceeds 100 million tracks with hundreds of millions of users. Using 64-dimensional embeddings for 100 million items requires approximately 25.6 GB of memory (100M items × 64 dimensions × 4 bytes per float). This fits in memory across a small cluster, enabling millisecond-level retrieval.

The key insight is that user preferences live in a much lower-dimensional space than the raw number of items suggests. If 100 million songs can be described by combinations of 64 latent factors (like genre, tempo, mood, and era), and users have preferences over those same factors, then a 64-dimensional dot product captures most of the signal without storing billions of individual scores.
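The prediction rule and the storage savings can be sketched in a few lines of NumPy. This is a minimal illustration with toy sizes, not a production implementation; the matrix names (P, Q) and bias arrays are conventional MF notation, and the global mean value is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 1000, 500, 64  # toy sizes; Netflix used k of 50 to 200

# Factor matrices: one k-dimensional embedding per user and per item
P = rng.normal(0, 0.1, (n_users, k))   # user embeddings
Q = rng.normal(0, 0.1, (n_items, k))   # item embeddings
mu = 3.6                               # global average rating (illustrative)
b_u = np.zeros(n_users)                # per-user offsets
b_i = np.zeros(n_items)                # per-item offsets

def predict(u, i):
    """Predicted rating = global mean + user bias + item bias + dot product."""
    return mu + b_u[u] + b_i[i] + P[u] @ Q[i]

# Parameter count is O((|U| + |I|) * k), versus O(|U| * |I|) dense cells
params = (n_users + n_items) * k + n_users + n_items + 1
dense_cells = n_users * n_items
print(params, dense_cells)
```

At Netflix Prize scale the same arithmetic gives (480,189 + 17,770) × 100 ≈ 49.8 million parameters standing in for 8.5 billion matrix cells.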
💡 Key Takeaways
Reduces memory from billions of cells to millions of embeddings: 100M items at 64 dims uses only 25.6 GB instead of storing all pairwise scores
Each user and item gets a dense vector of dimension k (typically 20 to 256 in production). Prediction is dot product plus bias terms for global average, user offset, and item offset
Netflix Prize demonstrated 10% RMSE improvement (0.9525 to 0.8567) on 100 million ratings using 50 to 200 dimensional factors with temporal dynamics
Scales to Spotify's 100M+ track catalog with sub-10 ms retrieval using approximate nearest neighbor (ANN) search over item embeddings
Works best on warm users and items with many interactions. Cold start (new users or items with zero history) requires fallback strategies like popularity priors or content features
📌 Examples
Netflix Prize: 480,189 users × 17,770 movies with 100,480,507 ratings. Using k=100 dimensions reduced storage from 8.5 billion potential entries to (480K + 17.7K) × 100 = 49.8 million parameters
Spotify production pattern: Store 100M item embeddings (64 dims each) in ANN index. At request time, compute user embedding from recent 50 listens and retrieve top 500 candidates in 5ms for downstream ranking
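The retrieval pattern in the second example can be sketched as follows. For clarity this uses brute-force top-k scoring over a toy catalog; a production system at 100M items would replace the scoring step with an ANN index (e.g. FAISS, Annoy, or ScaNN) to stay in the millisecond range. The "mean of recent listens" user embedding is one common heuristic, assumed here for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, k = 10_000, 64            # toy catalog; Spotify scale would be ~100M
Q = rng.normal(size=(n_items, k)).astype(np.float32)
Q /= np.linalg.norm(Q, axis=1, keepdims=True)  # unit-normalize item vectors

# Build a user embedding from the item vectors of 50 recent listens
recent = rng.integers(0, n_items, size=50)
user_vec = Q[recent].mean(axis=0)

# Score every item by dot product, then take the top 500 candidates.
# An ANN index would approximate this step without touching all items.
scores = Q @ user_vec
top500 = np.argpartition(-scores, 500)[:500]
top500 = top500[np.argsort(-scores[top500])]  # order candidates by score
print(top500[:5])
```

The 500 candidates would then be handed to a heavier downstream ranking model, as the example describes.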