What is Collaborative Filtering?
The Core Intuition
Collaborative filtering exploits a simple observation: people who agreed in the past tend to agree in the future. If you and I both gave 5 stars to The Shawshank Redemption, Pulp Fiction, and The Dark Knight, we probably have similar taste in movies. So if I loved Inception and you have not seen it, the system predicts you will like it too.
This is fundamentally different from content-based filtering, which would need to know that all these movies share features like "acclaimed directors" or "complex plots". Collaborative filtering does not need to understand why we have similar taste. It just needs to observe that we do.
User-Based vs Item-Based
User-based: Find users similar to you, then see what they liked that you have not tried. The algorithm computes similarity between users based on their rating patterns. The most similar users become your "neighborhood". Predictions come from what your neighbors rated highly. Problem: a user's profile changes every time that user rates another item, so neighborhoods shift constantly, and recomputing similarities across user pairs is expensive.
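As a rough illustration of the user-based steps above, the sketch below uses a toy ratings dictionary (hypothetical data), Pearson correlation over co-rated items as the similarity measure, and a similarity-weighted average over the top k positively correlated neighbors. The names pearson and predict, the choice of Pearson correlation, and the default k are assumptions made for this example; other similarity measures and neighborhood rules are equally valid.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"Shawshank": 5, "Pulp Fiction": 4, "Dark Knight": 5, "Inception": 5},
    "bob":   {"Shawshank": 5, "Pulp Fiction": 4, "Dark Knight": 5},
    "carol": {"Shawshank": 2, "Pulp Fiction": 5, "Dark Knight": 1, "Inception": 2},
}

def pearson(a, b):
    """Pearson correlation over the items both users rated.

    Returns 0.0 when there are fewer than two co-rated items or when
    either user gave the same rating to every co-rated item.
    """
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    dev_a = [a[i] - mean_a for i in common]
    dev_b = [b[i] - mean_b for i in common]
    denom = sqrt(sum(x * x for x in dev_a)) * sqrt(sum(y * y for y in dev_b))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom

def predict(user, item, k=2):
    """Similarity-weighted average of the item's ratings from the user's
    k most similar neighbors; None if no positively correlated neighbor
    has rated the item."""
    candidates = []
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = pearson(ratings[user], their_ratings)
        if sim > 0:  # assumption: ignore dissimilar users entirely
            candidates.append((sim, their_ratings[item]))
    neighbors = sorted(candidates, reverse=True)[:k]
    if not neighbors:
        return None
    total_sim = sum(sim for sim, _ in neighbors)
    return sum(sim * rating for sim, rating in neighbors) / total_sim

print(predict("bob", "Inception"))
```

In this toy data bob's ratings match alice's on every co-rated movie and run opposite to carol's, so only alice ends up in bob's neighborhood and the prediction for Inception follows her rating. Note that every call to predict recomputes similarities against all other users, which is the cost described above.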
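Item-based: Find items similar to ones you liked. If you rated item A highly, find items that other users rated similarly to A. Item similarities tend to be more stable than user similarities: a popular item has already collected ratings from many users, so one more rating barely moves its profile, whereas a typical user has rated only a handful of items, so each new rating can shift that user's profile noticeably. Because of this stability, item similarities can be precomputed and refreshed only occasionally, which makes item-based collaborative filtering more practical for production.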
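The item-based variant can be sketched in the same style. The example below precomputes an item-to-item similarity table once and then predicts from the user's own ratings by lookup. It uses adjusted cosine similarity (each rating is centered on its user's mean before comparing items), which is one common choice rather than the only one. The data, the item labels A, B, C, and the names build_item_similarities and predict are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob":   {"A": 4, "B": 5, "C": 2},
    "carol": {"A": 1, "B": 2, "C": 5},
    "dave":  {"A": 5, "C": 1},  # dave has not rated B
}

def build_item_similarities(ratings):
    """Adjusted cosine similarity for every pair of items."""
    # Center each rating on its user's mean, then index by item.
    centered = {}
    for user, items in ratings.items():
        mean = sum(items.values()) / len(items)
        for item, rating in items.items():
            centered.setdefault(item, {})[user] = rating - mean

    sims = {}
    for i, j in combinations(sorted(centered), 2):
        # Only users who rated both items contribute to the similarity.
        common = set(centered[i]) & set(centered[j])
        dot = sum(centered[i][u] * centered[j][u] for u in common)
        norm_i = sqrt(sum(centered[i][u] ** 2 for u in common))
        norm_j = sqrt(sum(centered[j][u] ** 2 for u in common))
        sim = dot / (norm_i * norm_j) if norm_i and norm_j else 0.0
        sims[(i, j)] = sims[(j, i)] = sim
    return sims

# Computed once, offline; predictions below only read from this table.
item_sims = build_item_similarities(ratings)

def predict(user, item):
    """Similarity-weighted average of the user's own ratings on items
    that are positively similar to the target item; None if there are none."""
    pairs = [(item_sims.get((item, other), 0.0), rating)
             for other, rating in ratings[user].items()]
    pairs = [(sim, rating) for sim, rating in pairs if sim > 0]
    if not pairs:
        return None
    return sum(s * r for s, r in pairs) / sum(s for s, _ in pairs)

print(predict("dave", "B"))
```

Here B comes out positively similar to A and negatively similar to C, so dave's predicted rating for B is driven by his rating of A. The design point is that item_sims is built ahead of time, so serving a prediction is a handful of table lookups rather than a scan over all users.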
The Sparsity Problem
With millions of users and millions of items, the rating matrix is enormous but almost entirely empty. A typical user rates perhaps 0.01% of available items. Most user pairs have no rated items in common, so the similarity between them is either undefined or rests on too little evidence to trust. This sparsity is a major reason simple nearest-neighbor collaborative filtering struggles at scale and matrix factorization became the dominant approach.
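A back-of-the-envelope calculation shows how quickly overlap vanishes. The sketch below assumes, purely for illustration, that every user rates the same fraction of the catalog and picks items independently and uniformly at random. Real ratings are skewed toward popular items, so actual overlap is higher than this estimate. The function name overlap_stats and the catalog size of one million are assumptions.

```python
import math

def overlap_stats(n_items, fraction_rated):
    """Expected overlap between two users under a uniform-random model.

    Returns (ratings per user, expected co-rated items,
    probability that the pair shares at least one item).
    """
    per_user = n_items * fraction_rated
    # Each item is co-rated with probability fraction_rated squared.
    p_both = fraction_rated ** 2
    expected_common = n_items * p_both
    # 1 - (1 - p_both) ** n_items, computed in a numerically stable way.
    p_any_common = -math.expm1(n_items * math.log1p(-p_both))
    return per_user, expected_common, p_any_common

per_user, expected_common, p_any = overlap_stats(1_000_000, 0.0001)
print(f"ratings per user: {per_user:.0f}")
print(f"expected co-rated items per pair: {expected_common:.4f}")
print(f"chance a pair shares any item: {p_any:.2%}")
```

With one million items and a 0.01% rating rate, each user has about 100 ratings, a pair of users is expected to share about 0.01 items, and only around 1% of pairs share any item at all under this model.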