Session-Based Context and Feature Engineering
Session-based bandits emphasize short-term behavioral signals captured within the current user session rather than relying only on long-term user profiles. The hypothesis is that recent interactions (last N clicks, scrolls, dwell times, referrer source) plus environment (time of day, device type, locale) reveal the immediate intent and context that drive better action selection for the next decision. This matters because user intent shifts within a session: a user browsing comedy at 8pm on mobile has different preferences than the same user researching documentaries at noon on desktop.
Typical session features include: the last 3 to 10 interactions (item IDs, categories, dwell seconds), time since session start, referrer (search, email, direct), device (mobile, tablet, desktop), screen size, network quality, locale, and a time-of-day bucket. Long-term profile features (favorite genres, lifetime watch hours, subscriber tenure) are often included but downweighted or used as priors rather than as primary signals. Keep the feature dimension d between 20 and 500 to maintain single-digit-millisecond inference latency; precompute action features (item metadata, embeddings) and cache them.
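To make the shape of such a context vector concrete, here is a minimal Python/NumPy sketch that assembles session features from the signals listed above. The vocabularies, embedding dimension, pooling choice, and scaling are illustrative assumptions, not a prescribed schema.

```python
import numpy as np

# Illustrative vocabularies; production systems use learned embeddings and richer encodings.
DEVICES = ["mobile", "tablet", "desktop"]
REFERRERS = ["search", "email", "direct"]

def one_hot(value, vocab):
    vec = np.zeros(len(vocab))
    if value in vocab:
        vec[vocab.index(value)] = 1.0
    return vec

def session_features(recent, device, referrer, hour_of_day, seconds_in_session, emb_dim=32):
    """Assemble a compact session context vector (total dimension stays well under ~500)."""
    if recent:  # last 3 to 10 interactions, each carrying a precomputed item embedding
        pooled = np.mean([r["embedding"] for r in recent], axis=0)
        mean_dwell = float(np.mean([r["dwell_seconds"] for r in recent]))
    else:       # empty session: zero embedding, zero dwell
        pooled, mean_dwell = np.zeros(emb_dim), 0.0
    time_bucket = one_hot(str(hour_of_day // 6), ["0", "1", "2", "3"])  # four 6-hour buckets
    return np.concatenate([
        pooled,                             # 32 dims: pooled recent-item embedding (cached)
        [mean_dwell / 60.0,                 # mean dwell time, in minutes
         np.log1p(seconds_in_session)],     # time since session start (log-scaled)
        one_hot(device, DEVICES),           # 3 dims
        one_hot(referrer, REFERRERS),       # 3 dims
        time_bucket,                        # 4 dims
    ])                                      # total: 44 dims, comfortably inside the 20-500 range
```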
Feature engineering trade-offs: real-time features (current session state) improve relevance but require streaming infrastructure and increase compute cost by 2 to 3x versus batch features. Netflix and Spotify invest heavily in real-time feature stores to capture session dynamics. However, for latency-critical surfaces with sub-20ms budgets, teams often use a hybrid: real-time session features (cheap to compute, high signal) plus precomputed user and action embeddings (expensive to train, cheap to serve). Cold-start users with no history fall back to content-based features (item metadata, popularity) or population-level priors.
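A sketch of that hybrid assembly, assuming a precomputed embedding cache keyed by user ID (the dict below stands in for a real feature store) and a population-level prior for cold-start users; the downweighting factor is an illustrative knob, not a recommended value.

```python
import numpy as np

# Stand-ins for a real feature store: computed offline in batch, cheap to read at serving time.
USER_EMBEDDINGS = {}                 # user_id -> 64-d long-term user embedding
POPULATION_PRIOR = np.zeros(64)      # e.g. the mean user embedding, used for cold start

def hybrid_context(user_id, session_vec, long_term_weight=0.5):
    """Real-time session features concatenated with a downweighted long-term embedding.

    Cold-start users with no cached embedding fall back to the population prior, so the
    cheap, high-signal session features dominate until history accumulates.
    """
    long_term = USER_EMBEDDINGS.get(user_id, POPULATION_PRIOR)
    return np.concatenate([session_vec, long_term_weight * long_term])
```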
Position bias is a major challenge: items shown at the top of a list get more clicks independent of relevance, which biases the reward signal. During exploration, randomize position or use position-aware models that learn a position discount factor. For ranking problems, some teams treat the bandit as choosing which ranking policy or layout to apply rather than constructing the full slate directly, which simplifies credit assignment and keeps the action space manageable.
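One common pattern, sketched below, is to shuffle the slate on a small exploration slice and then correct logged clicks with inverse propensity weighting before updating the bandit; the propensity values and exploration rate are purely illustrative, and a learned position discount factor is an alternative to the fixed table shown here.

```python
import random

# Examination probability per slot, estimated from the randomized exploration slice.
# The numbers are illustrative only: top slots are examined far more often.
POSITION_PROPENSITY = [1.0, 0.62, 0.41, 0.30, 0.24]

def maybe_shuffle(slate, explore_prob=0.05):
    """On a small fraction of traffic, shuffle the slate so every item gets
    exposure at every position, giving unbiased propensity estimates."""
    if random.random() < explore_prob:
        slate = slate[:]
        random.shuffle(slate)
    return slate

def debiased_reward(click, position):
    """Inverse-propensity-weighted reward: a click at a rarely examined slot
    counts for more, so low positions are not unfairly penalized."""
    return click / POSITION_PROPENSITY[position]
```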
💡 Key Takeaways
• Session features: last 3 to 10 interactions, dwell times, referrer, device, locale, time of day. These capture the immediate intent that shifts within a session (comedy at 8pm on mobile versus documentaries at noon on desktop).
• Feature dimension d is typically 20 to 500 to meet single-digit-millisecond inference budgets. Precompute and cache action features (item embeddings, metadata) to avoid fanout at request time.
• Real-time features improve click-through rate (CTR) by 5 percent in production systems but require streaming infrastructure and 2 to 3x the compute cost of batch features. Use a hybrid approach for latency-critical surfaces.
• Position bias correction: items at top positions get higher reward independent of relevance. Randomize position during exploration or learn position discount factors in the model to avoid biased learning.
• Cold-start handling: new users have no history. Fall back to content-based features (item metadata, category popularity) or population priors. Use optimistic priors to encourage exploration of new user preferences.
• Action space management: for slate or sequence choices, treat the bandit as choosing which ranking policy to apply rather than constructing the full slate. This keeps the action set at 5 to 20 policies instead of a combinatorial explosion (see the sketch after this list).
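As a concrete illustration of the last point, here is a minimal LinUCB-style sketch in which each arm is an entire ranking policy rather than an item; the policy names, context dimension, and alpha are illustrative assumptions.

```python
import numpy as np

class PolicyBandit:
    """Minimal LinUCB sketch where each arm is a whole ranking policy
    (e.g. "trending", "because_you_watched", "editorial"), keeping the
    action set at a handful of arms instead of a combinatorial slate."""

    def __init__(self, policies, d, alpha=1.0):
        self.alpha = alpha                           # optimism bonus: higher = more exploration
        self.A = {p: np.eye(d) for p in policies}    # per-arm design matrix
        self.b = {p: np.zeros(d) for p in policies}  # per-arm reward-weighted context sum

    def choose(self, x):
        """Pick the policy with the highest upper confidence bound for context x."""
        scores = {}
        for p, A in self.A.items():
            A_inv = np.linalg.inv(A)
            theta = A_inv @ self.b[p]                # ridge-regression estimate for this arm
            scores[p] = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return max(scores, key=scores.get)

    def update(self, policy, x, reward):
        """Standard LinUCB update with the observed (possibly debiased) reward."""
        self.A[policy] += np.outer(x, x)
        self.b[policy] += reward * x

# Usage: bandit = PolicyBandit(["trending", "recent", "editorial"], d=44)
#        chosen = bandit.choose(context); ...serve...; bandit.update(chosen, context, reward)
```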
📌 Examples
Netflix session features: last 5 interactions (title ID, genre, dwell seconds), time in session, device type, time-of-day bucket, cached user embedding (128d). Total feature vector: 180 dimensions. Policy inference: 8ms p95.
Spotify home page: session features include the last 3 tracks (artist, genre, skip/complete), referrer source, device, locale. Precomputed shelf embeddings (64d) for 50 candidate shelves. The bandit chooses the top 6 shelves per request in 15ms p95.
Meta notification bandit: session state (last notification time, opens in the past hour), user timezone, device, platform version. Real-time feature computation adds 3ms to a 12ms total latency budget. Lifts open rate by 8 percent versus no session context.