
Contextual Bandits: LinUCB and Neural Linear Methods

FROM NON-CONTEXTUAL TO CONTEXTUAL

Non-contextual bandits treat all users the same: arm 3 is either globally good or not. Contextual bandits condition on user and item features. Different users may prefer different arms. This enables generalization: when a new user arrives, use their features to predict which arm they will prefer, even with zero observations for that specific user.

LINUCB: LINEAR CONTEXTUAL BANDITS

LinUCB maintains a linear model per arm. For arm a, the expected reward is θ_a · x where x is the context vector (user features, time of day, device type). It also maintains uncertainty over θ_a via an inverse covariance matrix. The UCB bonus comes from the uncertainty in the prediction for this specific context.

Update: after observing reward r for context x on arm a, apply the closed-form updates A_a ← A_a + x x^T and b_a ← b_a + r x, then recompute θ_a = A_a^{-1} b_a.
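The per-arm model, UCB scoring, and closed-form update above can be sketched in a few lines of NumPy. This is a minimal disjoint-LinUCB implementation; the class name, the α default, and the use of a plain matrix inverse (rather than an incremental Sherman–Morrison update) are illustrative choices, not part of the original text:

```python
import numpy as np

class LinUCB:
    """Minimal disjoint LinUCB: one ridge-regression model per arm (sketch)."""

    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        # A_a starts as the identity (ridge regularization), b_a as zeros.
        self.A = [np.eye(d) for _ in range(n_arms)]
        self.b = [np.zeros(d) for _ in range(n_arms)]

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta = A_inv @ b_a                           # point estimate theta_a
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)   # uncertainty for THIS context
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, r):
        """Closed-form update after observing reward r for context x on `arm`."""
        self.A[arm] += np.outer(x, x)   # A_a <- A_a + x x^T
        self.b[arm] += r * x            # b_a <- b_a + r x
```

Note the bonus term depends on x, so the same arm can look certain for one user and uncertain for another.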

NEURAL LINEAR: DEEP FEATURES WITH LINEAR BANDIT

Training a full neural network online is unstable and expensive. Neural Linear freezes a pretrained deep feature extractor and runs a linear bandit on the embeddings. For example, use a pretrained neural network to encode user profiles and items into 128-dimensional vectors, then run Thompson Sampling or LinUCB on these embeddings. This combines deep representation power with stable online updates.

💡 Key Insight: Contextual bandits solve cold start: new users immediately get personalized recommendations based on their features, without requiring historical observations for that specific user.

COMPUTATIONAL COST

With 50-100 features and 10 arms, precompute inverse covariance matrices offline. Scoring is then one dot product per arm: 50 operations × 10 arms = 500 operations, submillisecond. Feature extraction (the neural network forward pass) can take 2-5ms, but that cost is often shared with other systems.
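Once the per-arm weight vectors θ_a are precomputed, the online scoring step collapses to a single matrix-vector product. A toy sketch, with illustrative dimensions in the 50-100-feature range from the text (the weights here are random placeholders for precomputed models):

```python
import numpy as np

d, n_arms = 64, 10  # feature dimension and arm count (illustrative)

# Placeholder for per-arm weights theta_a computed offline as A_a^{-1} b_a.
thetas = np.random.default_rng(1).normal(size=(n_arms, d))
x = np.random.default_rng(2).normal(size=d)   # context vector for this request

scores = thetas @ x          # one dot product per arm: ~d * n_arms multiply-adds
best_arm = int(np.argmax(scores))
```

At this scale the matrix-vector product is a few hundred floating-point operations, so the bandit scoring itself is negligible next to the 2-5ms feature extraction.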

💡 Key Takeaways
Contextual bandits condition on user/item features, enabling personalization and solving user cold start
LinUCB: maintains linear model θ_a per arm with inverse covariance for uncertainty; update is closed-form
Neural Linear: freeze pretrained feature extractor, run linear bandit on embeddings for stable online updates
Scoring with 50-100 features and 10 arms is submillisecond; feature extraction (2-5ms) is the bottleneck
📌 Interview Tips
1. When explaining contextual bandits, contrast with non-contextual: different users may prefer different arms based on features
2. Describe the Neural Linear pattern: deep features frozen, online linear layer, best of both worlds
3. Mention the cold start advantage: new users get personalized recommendations immediately via feature-based generalization