
Progressive Profiling and Identity Resolution for User Cold Start

Core Concept
Progressive profiling collects user preferences through explicit signals during onboarding and implicit signals during usage. The goal is to move each new user from cold to warm as quickly as possible.

Explicit Onboarding

Ask users for their preferences directly: "What genres do you like?" or "Select 5 items you have enjoyed." This gives an immediate signal. The danger is that onboarding friction causes drop-off, so keep the flow under 30 seconds. 3-5 selections is the sweet spot; asking for more reduces completion rates without a proportional gain in recommendation quality.
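As a minimal sketch, assuming each catalog item already has a learned embedding, the onboarding picks can seed a starting profile by averaging the embeddings of the selected items. The `seed_profile` helper, the item names, and the vectors are hypothetical; the cap of five picks mirrors the 3-5 guideline above.

```python
from typing import Dict, List, Sequence

MAX_PICKS = 5  # assumed cap, matching the 3-5 selection guideline


def seed_profile(picks: Sequence[str], item_embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the embeddings of the items a new user picked during onboarding."""
    # Ignore picks beyond the cap and any item we have no embedding for.
    vectors = [item_embeddings[i] for i in picks[:MAX_PICKS] if i in item_embeddings]
    if not vectors:
        raise ValueError("no usable onboarding picks; fall back to segment defaults")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Hypothetical 2-d item embeddings.
embeddings = {
    "jazz": [1.0, 0.0],
    "blues": [0.5, 0.5],
    "metal": [0.0, 1.0],
}
print(seed_profile(["jazz", "blues", "metal"], embeddings))  # [0.5, 0.5]
```

The seeded vector can then be used as the user embedding for retrieval until implicit signals accumulate.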

Implicit Signal Extraction

Every interaction is a signal: clicks, scroll depth, time on page, search queries. Weight each by its strength: a purchase beats a click, and a long dwell beats a short one. Then build the user profile as a weighted combination of the feature vectors of the items the user interacted with.
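A minimal sketch of that weighting, assuming hypothetical event names, strength values, and a 30-second threshold for "long" dwell (none of these numbers come from the text; they would be tuned per product). Each interaction contributes its item's feature vector scaled by signal strength, and the sum is normalized by the total weight.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Illustrative strengths (assumed): a purchase outweighs a click.
EVENT_WEIGHTS: Dict[str, float] = {"purchase": 5.0, "add_to_cart": 3.0, "click": 1.0}
LONG_DWELL_SECONDS = 30.0  # assumed cut-off between short and long dwell
DWELL_BOOST = 2.0          # assumed multiplier for long dwell


@dataclass
class Interaction:
    item_id: str
    event: str
    dwell_seconds: float = 0.0


def signal_strength(x: Interaction) -> float:
    """Base weight for the event type, boosted when the user lingered."""
    base = EVENT_WEIGHTS.get(x.event, 0.0)
    return base * (DWELL_BOOST if x.dwell_seconds >= LONG_DWELL_SECONDS else 1.0)


def build_profile(interactions: Iterable[Interaction],
                  item_features: Dict[str, List[float]], dim: int) -> List[float]:
    """Weighted average of item feature vectors; zero vector if there is no usable signal."""
    total = [0.0] * dim
    weight_sum = 0.0
    for x in interactions:
        feats = item_features.get(x.item_id)
        w = signal_strength(x)
        if feats is None or w == 0.0:
            continue  # unknown item or unrecognized event type
        for d in range(dim):
            total[d] += w * feats[d]
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [t / weight_sum for t in total]


features = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
history = [Interaction("a", "purchase"), Interaction("b", "click", dwell_seconds=45.0)]
print(build_profile(history, features, dim=2))
```

Here the purchase of `a` (weight 5) still outweighs the long-dwell click on `b` (weight 2), so the profile leans toward `a`'s features.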

Identity Resolution

Users often browse before signing up. If you can link that pre-signup browsing to the post-signup account, the user is no longer cold. Cookies, device fingerprinting, or IP address (subject to privacy constraints) can connect these sessions. This can reduce the effective cold start rate by 30-50%.
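One way to implement the deterministic part of this linking is a union-find (disjoint-set) structure over identifiers: whenever two identifiers are observed together (a session carrying a cookie, a login on that cookie), they are merged into one cluster. The `IdentityGraph` class and the `session:`/`cookie:`/`account:` prefixes are hypothetical. Probabilistic signals such as fingerprints would add edges only above some confidence threshold, which this sketch leaves out, and union by rank is omitted for brevity.

```python
from typing import Dict


class IdentityGraph:
    """Union-find over identifiers such as session, cookie, and account ids."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, x: str) -> str:
        self._parent.setdefault(x, x)
        root = x
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while x != root:
            nxt = self._parent[x]
            self._parent[x] = root
            x = nxt
        return root

    def link(self, a: str, b: str) -> None:
        """Record that identifiers a and b were seen together."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def same_person(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)


graph = IdentityGraph()
# Anonymous browsing: session s1 ran under cookie c42.
graph.link("cookie:c42", "session:s1")
# Later the user signs up and logs in on the same browser.
graph.link("cookie:c42", "account:u7")
print(graph.same_person("session:s1", "account:u7"))  # True
```

Once linked, events recorded under `session:s1` can be attributed to `account:u7`, so the new account starts with history instead of cold.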

⚠️ Interview Question: "How would you handle new user cold start?" Walk through: (1) identity resolution to check whether the user has prior anonymous sessions, (2) lightweight onboarding to collect explicit preferences, (3) segment-based defaults, (4) rapid profile building from early interactions. Show that you think about the full funnel.
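The first three steps of that answer can be sketched as a fallback chain: prefer signal from resolved anonymous history, then onboarding picks, then a segment default, then global popularity. Step (4) is the ongoing profile update as interactions arrive, so it is not shown here. The function names, the item-to-similar-items map, and the segment keys are hypothetical stand-ins for real retrieval components.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def expand_similar(seeds: Sequence[str], similar: Dict[str, List[str]], k: int) -> List[str]:
    """Collect up to k items similar to the seeds, skipping the seeds and duplicates."""
    seen = set(seeds)
    out: List[str] = []
    for seed in seeds:
        for item in similar.get(seed, []):
            if item not in seen:
                seen.add(item)
                out.append(item)
                if len(out) == k:
                    return out
    return out


def cold_start_recommend(
    prior_anon_items: Sequence[str],  # items found via identity resolution
    onboarding_picks: Sequence[str],  # explicit onboarding selections
    segment: Optional[str],           # hypothetical segment key, if known
    similar: Dict[str, List[str]],
    segment_top: Dict[str, List[str]],
    global_top: List[str],
    k: int = 3,
) -> Tuple[str, List[str]]:
    """Return (strategy used, recommended items), falling back stage by stage."""
    if prior_anon_items:
        recs = expand_similar(prior_anon_items, similar, k)
        if recs:
            return "identity_resolution", recs
    if onboarding_picks:
        recs = expand_similar(onboarding_picks, similar, k)
        if recs:
            return "onboarding", recs
    if segment is not None and segment_top.get(segment):
        return "segment_default", segment_top[segment][:k]
    return "global_popular", global_top[:k]


similar = {"i1": ["i2", "i3"], "p1": ["p2"]}
print(cold_start_recommend([], ["p1"], "new_mobile", similar,
                           {"new_mobile": ["t1", "t2"]}, ["g1", "g2"]))
# ('onboarding', ['p2'])
```

Returning the strategy name alongside the items makes it easy to log which stage served each new user, which helps when measuring how often identity resolution actually fires.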
💡 Key Takeaways
Progressive profiling uses lightweight onboarding interactions (selecting 3 to 5 artists, liking 5 to 10 titles) to generate initial embeddings, reducing the time to the first good recommendation from days to under 60 seconds
Onboarding friction trade-off: asking for 10 preferences can drop signup completion by 10 to 20% but dramatically improves early session quality; optimal designs use one-tap choices and adaptive questioning
Identity resolution links devices, browsers, and sessions into unified user profiles using deterministic keys (login, email) and privacy-safe probabilistic signals (device fingerprints, behavioral consistency)
Session-based models personalize in real time from as few as 2 to 3 interactions, which is critical for anonymous users; they are typically implemented as RNNs or transformers over the last 10 to 20 actions, with sub-50 ms inference
Unified identity graphs enable deduplicated exposures across devices (never showing the same item twice), correct conversion attribution, and recency-weighted feature aggregation for personalization
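Two of those uses can be sketched over the merged event stream of one resolved user: an exponentially decayed, recency-weighted profile, and a filter that drops items already shown on any linked device. The 24-hour half-life, the `(timestamp_hours, device_id, item_id)` event tuple, and the function names are assumptions for illustration; exponential decay is one common choice, not the only one.

```python
from typing import Dict, Iterable, List, Tuple

HALF_LIFE_HOURS = 24.0  # assumed: an event loses half its weight per day

# (timestamp in hours, device id, item id), already merged across devices
Event = Tuple[float, str, str]


def recency_weight(age_hours: float, half_life: float = HALF_LIFE_HOURS) -> float:
    return 0.5 ** (age_hours / half_life)


def recency_weighted_profile(events: Iterable[Event], item_features: Dict[str, List[float]],
                             now_hours: float, dim: int) -> List[float]:
    """Average item features, weighting recent events more heavily regardless of device."""
    total = [0.0] * dim
    weight_sum = 0.0
    for ts, _device, item in events:
        feats = item_features.get(item)
        if feats is None:
            continue
        w = recency_weight(now_hours - ts)
        for d in range(dim):
            total[d] += w * feats[d]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total


def filter_already_shown(candidates: Iterable[str], exposures: Iterable[Event]) -> List[str]:
    """Drop candidates the user has already been shown on any linked device."""
    shown = {item for _ts, _device, item in exposures}
    return [c for c in candidates if c not in shown]


events = [(0.0, "phone", "a"), (24.0, "laptop", "b")]
features = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
print(recency_weighted_profile(events, features, now_hours=24.0, dim=2))
print(filter_already_shown(["a", "c", "b", "d"], events))  # ['c', 'd']
```

Because both functions take the unified event list, an item seen on the phone is filtered out on the laptop, and yesterday's phone activity still contributes to the profile at half weight.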
📌 Interview Tips
1. When asked about onboarding: explain preference collection (select 5-10 interests/artists/genres) that seeds initial embeddings, reducing the cold start period from days to minutes.
2. For cross-device identity: mention identity graphs that link sessions via deterministic keys (login, email) and probabilistic signals (device fingerprints); unified profiles accelerate personalization.
3. When discussing ROI: explain that good onboarding can improve new-user retention by 10-20% by delivering relevant content in the first session instead of generic recommendations.