What is Content-Based Filtering?

Definition
Content-based filtering recommends items similar to what a user has liked, based on item features. If you watched action movies with high-speed chases, the system recommends other action movies with high-speed chases. It relies on item attributes and your own history, not on the behavior patterns of other users.

How It Differs From Collaborative Filtering

Collaborative filtering finds users similar to you and recommends what they liked. It does not need to know anything about item content. Content-based filtering is the opposite: it does not need to know about other users. It only needs to understand item attributes and match them to your expressed preferences.

This distinction matters for cold start. A new item with zero interactions is invisible to collaborative filtering but fully visible to content-based systems. If the new item is an action movie, a content-based system can recommend it to action movie fans immediately.

Building A User Profile

The system learns what features you prefer by analyzing items you interacted with. If you rated 20 movies, extract their features: genres, directors, actors, keywords, production years. Aggregate these into a user profile vector. Simple approach: average the feature vectors of items you liked. More sophisticated: weight by rating strength or recency.
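The aggregation step described above can be sketched as a rating-weighted average. This is a minimal illustration, not a reference implementation: the feature layout, the `ITEM_FEATURES` table, and the `build_profile` helper are all hypothetical names chosen for the example.

```python
from typing import Dict, List

# Hypothetical feature order: [action, comedy, sci_fi, car_chase]
ITEM_FEATURES: Dict[str, List[float]] = {
    "movie_a": [1.0, 0.0, 0.0, 1.0],
    "movie_b": [1.0, 0.0, 1.0, 0.0],
    "movie_c": [0.0, 1.0, 0.0, 0.0],
}


def build_profile(ratings: Dict[str, float],
                  features: Dict[str, List[float]]) -> List[float]:
    """Return the rating-weighted average of the rated items' feature vectors."""
    total_weight = sum(ratings.values())
    if total_weight <= 0:
        raise ValueError("need at least one positively weighted rating")
    dim = len(next(iter(features.values())))
    profile = [0.0] * dim
    for item_id, weight in ratings.items():
        for k, value in enumerate(features[item_id]):
            profile[k] += weight * value  # higher-rated items contribute more
    return [p / total_weight for p in profile]


# Assumed 1-5 star ratings from a single user.
ratings = {"movie_a": 5.0, "movie_b": 4.0, "movie_c": 1.0}
profile = build_profile(ratings, ITEM_FEATURES)
```

Setting every weight to 1.0 gives the simple average; a recency-weighted variant would only change how the weights are computed.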

This profile becomes your preference fingerprint. To recommend, compare all candidate items to your profile. Items with high similarity get recommended. Cosine similarity is a common choice: (user_profile · item_features) / (‖user_profile‖ × ‖item_features‖).
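As a rough sketch of the matching step, the snippet below scores a few candidates against a profile with cosine similarity and keeps the top results. The profile values, the candidate items, and the `cosine` and `recommend` helpers are assumptions made up for illustration. Note that the candidates have no interaction data at all, only features.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_u * norm_v)


def recommend(profile: List[float],
              candidates: Dict[str, List[float]],
              top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank candidates by similarity to the profile, most similar first."""
    scored = [(item_id, cosine(profile, vec)) for item_id, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical feature order: [action, comedy, sci_fi, car_chase]
profile = [0.9, 0.1, 0.4, 0.5]
candidates = {
    "new_action_chase": [1.0, 0.0, 0.0, 1.0],
    "new_romcom": [0.0, 1.0, 0.0, 0.0],
    "new_scifi_action": [1.0, 0.0, 1.0, 0.0],
}
top = recommend(profile, candidates)
```

With these made-up numbers the action movie with a car chase ranks first and the romantic comedy is dropped, which is the behavior the prose describes.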

💡 Key Insight: Content-based filtering solves item cold start but creates filter bubbles. Users only see items similar to what they already liked. No mechanism exposes them to diverse content unless you explicitly inject exploration.
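One possible way to inject exploration is an epsilon-greedy slate: most slots go to the most similar items, and with a small probability a slot is given to a less similar one. The source does not prescribe a specific method, so treat this as one assumption among several options; `recommend_with_exploration` and its parameters are hypothetical.

```python
import random
from typing import List


def recommend_with_exploration(ranked: List[str], k: int, epsilon: float,
                               rng: random.Random) -> List[str]:
    """Fill up to k slots from `ranked` (all candidates, most similar first).

    Each slot is an exploration pick with probability epsilon, drawn
    uniformly from the items outside the top k.
    """
    head = list(ranked[:k])   # exploitation pool: the most similar items
    tail = list(ranked[k:])   # exploration pool: everything else
    slate: List[str] = []
    for _ in range(min(k, len(ranked))):
        if tail and (not head or rng.random() < epsilon):
            slate.append(tail.pop(rng.randrange(len(tail))))
        else:
            slate.append(head.pop(0))
    return slate


ranked = ["action_1", "action_2", "action_3", "drama_1", "documentary_1", "comedy_1"]
slate = recommend_with_exploration(ranked, k=3, epsilon=0.2, rng=random.Random(0))
```

With `epsilon=0.0` this reduces to plain top-k; raising it trades short-term relevance for more diverse exposure.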
💡 Key Takeaways
Recommends items similar to what you liked based on item features (genre, director, category, keywords), not on other users' behavior
Only needs item features and your own history. Works for new systems with few users, but a brand-new user with no history still has no profile to match against
New items can be recommended immediately based on content, even with zero interactions. CF cannot recommend items with no data
Explainable: "Recommended because you liked sci-fi movies" is clear. CF can only say "users like you liked this"
Works in sparse domains where few users overlap. If items can be described with features, CBF works
Limitation: can only recommend items similar to what you already liked. Cannot discover cross-category surprises like CF can
📌 Interview Tips
1. When asked about CBF vs CF: explain that content-based handles item cold start instantly (new items have features) while CF needs interaction history, but CF captures preference patterns CBF misses.
2. For feature extraction: mention multi-modal approaches, such as text embeddings from transformers (768-1024 dims) and image embeddings from CNNs (512-2048 dims), combined into unified representations.
3. When discussing limitations: explain the serendipity problem. CBF only recommends similar items, creating filter bubbles; hybrid approaches add exploration.
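For the multi-modal tip, one simple way to combine embeddings is to L2-normalize each modality and concatenate the results, so that neither modality dominates because of its scale. This is only a sketch under that assumption (learned fusion layers are another option); the tiny vectors stand in for real 768- or 512-dimensional embeddings, and `l2_normalize` and `fuse` are hypothetical helpers.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_emb: List[float], image_emb: List[float]) -> List[float]:
    """Concatenate the normalized text and image embeddings into one item vector."""
    return l2_normalize(text_emb) + l2_normalize(image_emb)


# Toy stand-ins for model outputs; real embeddings would come from trained encoders.
text_emb = [3.0, 4.0]
image_emb = [0.0, 2.0, 0.0]
item_vector = fuse(text_emb, image_emb)
```

The fused vector can then be used anywhere a single item feature vector is expected, for example in a cosine similarity comparison against a user profile built the same way.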