Core Concept
Progressive profiling collects user preferences through explicit signals during onboarding and implicit signals during usage. The goal is to move a user from cold to warm as quickly as possible.
Explicit Onboarding
Ask users for their preferences directly: "What genres do you like?" or "Select 5 items you have enjoyed." This gives immediate signal. The danger is that onboarding friction causes drop-off, so keep it under 30 seconds. 3-5 selections is the sweet spot; asking for more reduces completion rates without a proportional gain in signal.
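One way to turn a handful of onboarding picks into a starting profile is to average the feature vectors of the selected items. The sketch below assumes items are described by sparse feature dictionaries; the catalog `ITEM_FEATURES`, its values, and the function name `profile_from_selections` are hypothetical and only illustrate the idea.

```python
from typing import Dict, List

# Hypothetical item features (illustrative values, not real data).
ITEM_FEATURES: Dict[str, Dict[str, float]] = {
    "item_a": {"sci_fi": 1.0, "action": 0.5},
    "item_b": {"sci_fi": 0.5, "drama": 1.0},
    "item_c": {"comedy": 1.0},
}


def profile_from_selections(
    selected: List[str],
    features: Dict[str, Dict[str, float]] = ITEM_FEATURES,
) -> Dict[str, float]:
    """Average the feature vectors of the items picked during onboarding."""
    known = [features[item] for item in selected if item in features]
    if not known:
        return {}  # nothing usable: caller should fall back to defaults
    totals: Dict[str, float] = {}
    for vector in known:
        for name, value in vector.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / len(known) for name, total in totals.items()}


# A user who picks two items gets a profile blended from both.
initial_profile = profile_from_selections(["item_a", "item_b"])
```

An unweighted mean treats every pick equally, which suits a 3-5 item onboarding step where there is no reason to rank one selection above another.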
Implicit Signal Extraction
Every interaction is signal: clicks, scroll depth, time on page, search queries. Weight these by strength: a purchase beats a click, and a long dwell beats a short dwell. Build the user profile as a weighted combination of the features of the items the user has interacted with.
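The weighted combination can be sketched as a weighted average of item features, where each interaction contributes in proportion to its signal strength. The weights in `SIGNAL_WEIGHTS` are assumed values chosen only to respect the ordering above (purchase over click, long dwell over short dwell); in practice they would be tuned.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed signal strengths; only the relative ordering comes from the text.
SIGNAL_WEIGHTS: Dict[str, float] = {
    "purchase": 3.0,
    "long_dwell": 1.5,
    "click": 1.0,
    "short_dwell": 0.5,
}


def build_profile(
    events: Iterable[Tuple[str, str]],
    item_features: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = SIGNAL_WEIGHTS,
) -> Dict[str, float]:
    """Weighted average of item features over (item_id, signal) events."""
    totals: Dict[str, float] = defaultdict(float)
    weight_sum = 0.0
    for item_id, signal in events:
        weight = weights.get(signal, 0.0)
        features = item_features.get(item_id)
        if weight == 0.0 or features is None:
            continue  # skip unknown signals and unknown items
        for name, value in features.items():
            totals[name] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return {}
    return {name: total / weight_sum for name, total in totals.items()}


# Hypothetical catalog and event log for illustration.
catalog = {"shoes": {"footwear": 1.0}, "hat": {"accessories": 1.0}}
profile = build_profile([("shoes", "purchase"), ("hat", "click")], catalog)
```

Here the purchased item dominates the profile even though both items were touched once, which is the intended effect of weighting by signal strength.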
Identity Resolution
Users often visit before signing up. If you can link pre-signup browsing to the post-signup account, the user is no longer cold. Use cookies, device fingerprinting, or IP address (each with privacy considerations) to connect sessions. This can reduce the effective cold start rate by 30-50%.
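A minimal sketch of the cookie-based case: events are logged against an anonymous ID before signup, then moved onto the account when the user registers. The class `EventStore` and its method names are hypothetical, and the in-memory dictionaries stand in for whatever event storage a real system uses.

```python
from collections import defaultdict
from typing import Dict, List


class EventStore:
    """Keeps pre-signup events by anonymous ID and merges them at signup."""

    def __init__(self) -> None:
        self._by_anon: Dict[str, List[str]] = defaultdict(list)
        self._by_user: Dict[str, List[str]] = defaultdict(list)

    def log_anonymous(self, anon_id: str, event: str) -> None:
        self._by_anon[anon_id].append(event)

    def link_on_signup(self, anon_id: str, user_id: str) -> int:
        """Move anonymous events onto the account; return how many moved."""
        events = self._by_anon.pop(anon_id, [])
        self._by_user[user_id].extend(events)
        return len(events)

    def history(self, user_id: str) -> List[str]:
        return list(self._by_user.get(user_id, []))

    def is_cold(self, user_id: str) -> bool:
        return len(self._by_user.get(user_id, [])) == 0


store = EventStore()
store.log_anonymous("cookie-123", "view:item_a")
store.log_anonymous("cookie-123", "view:item_b")
moved = store.link_on_signup("cookie-123", "user-42")
```

After the link, `user-42` starts with two events of history instead of none, so the first logged-in session can already be personalized.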
⚠️ Interview Question: "How would you handle new user cold start?" Walk through: (1) identity resolution to check whether the user has prior anonymous sessions, (2) lightweight onboarding to collect explicit preferences, (3) segment-based defaults, (4) rapid profile building from early interactions. Show that you think about the full funnel.
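The four steps can be read as a fallback cascade: use the richest signal available and drop to a coarser one when it is missing. The sketch below is one possible ordering, not a prescribed design; the strategy names, the `min_events` threshold, and the final `"global_popular"` fallback are assumptions added for illustration.

```python
from typing import List, Optional


def choose_strategy(
    prior_events: List[str],
    onboarding_picks: List[str],
    segment: Optional[str],
    min_events: int = 3,  # assumed threshold for a usable behavioral profile
) -> str:
    """Pick the recommendation source for a new user, richest signal first."""
    if len(prior_events) >= min_events:
        # Linked anonymous sessions or early interactions are sufficient.
        return "behavioral_profile"
    if onboarding_picks:
        # Explicit preferences collected at signup.
        return "onboarding_profile"
    if segment is not None:
        # Coarse defaults for the user's segment (e.g. country or device).
        return "segment_defaults"
    # Assumed last resort when nothing is known about the user.
    return "global_popular"


strategy = choose_strategy(prior_events=[], onboarding_picks=["item_a"], segment="us_mobile")
```

As the user accumulates interactions, the same function naturally shifts them from onboarding or segment defaults to the behavioral profile.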
✓ Progressive profiling uses lightweight onboarding interactions (selecting 3 to 5 artists, liking 5 to 10 titles) to generate initial embeddings, reducing time to first good recommendation from days to under 60 seconds
✓ Onboarding friction trade-off: asking for 10 preferences can drop signup completion by 10 to 20%, but dramatically improves early session quality; optimal designs use one-tap choices and adaptive questioning
✓ Identity resolution links devices, browsers, and sessions into unified user profiles using deterministic keys (login, email) and privacy-safe probabilistic signals (device fingerprints, behavioral consistency)
✓ Session-based models personalize in real time from as few as the last 2 to 3 interactions, which is critical for anonymous users; they are typically implemented with RNNs or transformers over a window of the last 10 to 20 actions, with sub-50ms inference
✓ Unified identity graphs enable deduplicated exposures across devices (don't show the same item twice), correct attribution for conversions, and recency-weighted feature aggregation for personalization
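A unified identity graph built from deterministic keys can be sketched with a union-find structure: every device or login is a node, and observing a login on a device merges the two into one user. The example below covers only the deterministic case and the exposure deduplication it enables; the class `IdentityGraph`, the helper `unseen_items`, and all identifiers are hypothetical.

```python
from typing import Dict, List, Set


class IdentityGraph:
    """Union-find over identifiers (devices, logins) for one-user grouping."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, node: str) -> str:
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            # Path halving keeps lookups cheap as the graph grows.
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def link(self, a: str, b: str) -> None:
        """Record that identifiers a and b belong to the same user."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_user(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)


def unseen_items(
    graph: IdentityGraph,
    exposures: Dict[str, Set[str]],
    device: str,
    candidates: List[str],
) -> List[str]:
    """Drop candidates already shown on any device linked to this one."""
    seen: Set[str] = set()
    for other_device, items in exposures.items():
        if graph.same_user(other_device, device):
            seen |= items
    return [item for item in candidates if item not in seen]


graph = IdentityGraph()
graph.link("phone", "email:a@example.com")   # login seen on phone
graph.link("laptop", "email:a@example.com")  # same login seen on laptop
exposures = {"phone": {"i1"}, "tablet": {"i2"}}
fresh = unseen_items(graph, exposures, "laptop", ["i1", "i2", "i3"])
```

Item `i1` is filtered on the laptop because it was already shown on the linked phone, while `i2` stays because the tablet has not been tied to this user.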