Hierarchical Feature Backoff and Cold Start Handling

Definition
Hierarchical backoff handles entities with sparse data by falling back through progressively broader aggregations: user → segment → global, or item → category → global.
The Cold Start Problem
New users and new items lack historical data. Features like user CTR or item conversion rate are either undefined or based on tiny samples with high variance. A user with 3 impressions and 1 click has a "33% CTR" that means almost nothing. Over-relying on historical features creates a death spiral: entities with no history get poor scores, never get exposure, never accumulate data, and stay poorly scored forever.
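To make the "means almost nothing" point concrete, here is a minimal sketch that computes the binomial standard error of an observed CTR. The function name is illustrative and not part of any library, and the second example (1,000 clicks over 3,000 impressions) is an assumed comparison point, not a figure from the text.

```python
import math


def ctr_standard_error(clicks: int, impressions: int) -> float:
    """Binomial standard error of an observed click-through rate."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    p = clicks / impressions
    return math.sqrt(p * (1 - p) / impressions)


# Same observed rate (about 33%), very different reliability.
small = ctr_standard_error(1, 3)        # roughly 0.27: the estimate is mostly noise
large = ctr_standard_error(1000, 3000)  # roughly 0.009: the estimate is stable
```

With only 3 impressions the uncertainty is nearly as large as the estimate itself, which is why the raw rate should not be fed to the ranker unadjusted.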
User Feature Backoff
If a user has fewer than 50 impressions, don't trust their personal CTR on its own. Fall back to their demographic segment (same age range, region). If the segment also has insufficient data, fall back to the global average. Rather than switching abruptly at the threshold, blend smoothly based on sample size: a user with 10 impressions gets about 20% weight on personal CTR and 80% on segment CTR, while a user with 1,000 impressions gets about 95% personal and 5% segment. This is Bayesian shrinkage toward a prior: the weight on the personal estimate grows with the number of observations.
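The blending described above can be sketched with a weight of n / (n + k), where n is the number of impressions and k is a prior strength. The value k = 40 is an assumption chosen so that 10 impressions yields exactly the 20% personal weight quoted above; it gives about 96% at 1,000 impressions, close to the quoted 95%. The function names and the example counts are hypothetical.

```python
def shrink(observed_rate: float, n: float, prior_rate: float,
           prior_strength: float = 40.0) -> float:
    """Blend an observed rate with a prior; weight on observed is n / (n + prior_strength)."""
    weight = n / (n + prior_strength)
    return weight * observed_rate + (1.0 - weight) * prior_rate


def user_ctr_with_backoff(user_clicks: int, user_impressions: int,
                          segment_clicks: int, segment_impressions: int,
                          global_ctr: float, prior_strength: float = 40.0) -> float:
    """Hierarchical backoff: user -> segment -> global."""
    # The segment estimate is itself shrunk toward the global CTR,
    # so a sparse segment falls back to the global average.
    segment_rate = segment_clicks / segment_impressions if segment_impressions else 0.0
    segment_est = shrink(segment_rate, segment_impressions, global_ctr, prior_strength)

    # The user estimate is shrunk toward the (already smoothed) segment estimate.
    user_rate = user_clicks / user_impressions if user_impressions else 0.0
    return shrink(user_rate, user_impressions, segment_est, prior_strength)


# A user with 1 click in 10 impressions, in a segment with 5% CTR, global CTR 4%.
estimate = user_ctr_with_backoff(1, 10, 500, 10_000, 0.04)
```

A brand-new user (0 impressions) receives the segment estimate unchanged, and a user in an empty segment receives the global CTR, so the same function covers every level of the hierarchy.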
Item Feature Backoff
New products with zero sales use the category conversion rate. Products with only a handful of sales (say, 5) blend their observed rate with the category prior. Also rely heavily on content features that don't require behavioral history: compute item embeddings (numerical vectors capturing meaning) from titles, descriptions, and images. New items immediately have embeddings that capture their similarity to existing items, allowing the ranker to generalize.
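As a rough illustration of how embeddings give a new item an immediate similarity signal, the sketch below ranks existing catalog items by cosine similarity to a new item. The 3-dimensional vectors and item names are toy values invented for this example; real embeddings would come from a text or image encoder, which is not shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_items(new_embedding, catalog_embeddings, top_k=2):
    """Return the top_k (item_id, similarity) pairs, most similar first."""
    scored = [(item_id, cosine_similarity(new_embedding, emb))
              for item_id, emb in catalog_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical values).
catalog = {
    "running_shoes": [0.9, 0.1, 0.0],
    "trail_shoes": [0.8, 0.3, 0.1],
    "coffee_maker": [0.0, 0.2, 0.9],
}
new_item = [0.85, 0.2, 0.05]  # a new shoe listing with no sales yet
neighbors = most_similar_items(new_item, catalog)
```

Here the new listing lands next to the two shoe items and far from the coffee maker, so a ranker that consumes the embedding can treat it like its neighbors before any behavioral data exists.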
Exploration for Cold Start
Backoff alone isn't enough. Allocate 10-15% of impressions specifically to new or underexplored entities, even if their predicted scores are lower. This ensures new items collect initial engagement data within days. The cost is roughly a 2-4% drop in immediate engagement, but it prevents the catalog from ossifying around incumbents and enables long-term diversity and supply growth.
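One simple way to reserve an exploration budget is to set aside a fixed share of slate positions for new items. The sketch below is a minimal version under assumed names (build_slate, explore_fraction) and an assumed placement policy (spreading exploration items evenly through the slate); production systems often use bandit-style allocation instead.

```python
import random


def build_slate(ranked_items, new_items, slate_size=20, explore_fraction=0.10, rng=None):
    """Fill a slate from the ranker, reserving a share of slots for new items."""
    rng = rng or random.Random()
    n_explore = min(len(new_items), round(slate_size * explore_fraction))
    explore = rng.sample(new_items, n_explore)
    explore_set = set(explore)

    # Top-ranked items fill the remaining slots, skipping any chosen for exploration.
    slate = [item for item in ranked_items if item not in explore_set]
    slate = slate[: slate_size - n_explore]

    # Spread exploration items through the slate instead of burying them at the bottom.
    for k, item in enumerate(explore):
        position = min(len(slate), (k + 1) * slate_size // (n_explore + 1))
        slate.insert(position, item)
    return slate


ranked = [f"item_{i}" for i in range(30)]            # ranker output, best first
fresh = ["new_a", "new_b", "new_c", "new_d"]         # items with little or no history
slate = build_slate(ranked, fresh, slate_size=20, explore_fraction=0.10,
                    rng=random.Random(0))
```

With a 20-slot slate and a 10% budget, two positions go to new items and the other 18 go to the highest-scoring incumbents, so the top of the slate still reflects the ranker.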

💡 Key Takeaways

✓ Cold start creates a death spiral: no history → poor scores → no exposure → no data → stays poorly scored

✓ User backoff: personal CTR (if 50+ impressions) → segment CTR → global CTR, with Bayesian shrinkage blending

✓ Item backoff: item conversion rate → category rate → global rate, plus content-based embeddings for new items

✓ Smooth blending based on sample size: 10 impressions = 20% personal / 80% prior; 1,000 impressions = 95% personal / 5% prior

✓ Exploration (10-15% of impressions for new items) is essential to escape cold start, at a cost of 2-4% of immediate engagement

📌 Interview Tips

1. Explain the cold start death spiral: no data → poor ranking → no exposure → no data

2. Give specific blending weights: 10 impressions = 20% personal, 1,000 impressions = 95% personal

3. Mention that content embeddings (from titles, images) provide immediate similarity signals for new items
