What Are Two Tower Models and Why Use Them?
The Core Problem
Ranking millions of items per request is too slow. A neural network that scores one user-item pair in 0.1ms would take 100 seconds to score 1 million items. Users expect results in under 100ms. You cannot run a complex model on every item for every request.
The naive solution is to pre-compute scores for all user-item pairs. But with 100 million users and 10 million items, that is 10^15 pairs. Storing them requires petabytes. Updating them when user behavior changes is impossible. You need a smarter architecture.
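The arithmetic behind that claim, spelled out as a quick back-of-envelope check (the 4-bytes-per-score figure is an illustrative assumption, not from the text):

```python
# Back-of-envelope cost of precomputing every user-item score.
users = 100_000_000   # 10^8 users
items = 10_000_000    # 10^7 items
pairs = users * items # 10^15 user-item pairs

# Assuming 4 bytes per float32 score (illustrative):
petabytes = pairs * 4 / 1e15
print(f"{pairs:.0e} pairs, ~{petabytes:.0f} PB of scores")
```

And that is just storage; every score would also need recomputing whenever a user's behavior shifts.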
The Two-Tower Insight
Instead of learning a score directly, learn to place users and items in the same vector space. Users who like similar things cluster together. Items that appeal to similar users cluster together. A user vector close to an item vector means high affinity.
The key insight: user vectors and item vectors are computed independently. The user tower only sees user features. The item tower only sees item features. They never see each other during the forward pass. This independence is what makes the architecture fast.
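The independence property can be sketched with toy linear towers. Everything here is illustrative: the weights are made up, and a real system would learn both towers jointly from interaction data, typically with deeper networks.

```python
# Toy two-tower model: each tower is a linear map from its own features
# into a shared 2-dimensional embedding space.

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# User tower: sees only user features (weights are illustrative).
USER_W = [[0.9, 0.1],
          [0.2, 0.8]]
def user_tower(user_features):
    return matvec(USER_W, user_features)

# Item tower: sees only item features (weights are illustrative).
ITEM_W = [[0.7, 0.3],
          [0.1, 0.9]]
def item_tower(item_features):
    return matvec(ITEM_W, item_features)

# The towers never see each other's inputs; affinity is just a
# dot product of two independently computed vectors.
u = user_tower([1.0, 0.5])
v = item_tower([0.8, 0.2])
score = dot(u, v)
```

Because the score is a plain dot product, either vector can be computed, cached, or indexed without the other one existing yet.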
Why This Makes Retrieval Fast
Item vectors depend only on item features like title, category, and price. These change rarely. Compute all item vectors once, store them in an index. When a new item arrives, compute its vector and add it to the index. This is a batch job that runs hourly or daily.
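The offline batch step might look like the sketch below. The catalog, feature encoding, and weights are all hypothetical, and a plain dict stands in for what would be an ANN index in production.

```python
# Offline batch job sketch: embed every item once, store the vectors.

ITEM_W = [[0.7, 0.3], [0.1, 0.9]]  # illustrative tower weights

def item_tower(features):
    return [sum(w * x for w, x in zip(row, features)) for row in ITEM_W]

# Hypothetical catalog: item_id -> (category_score, price_tier).
catalog = {
    "item_a": (0.8, 0.2),
    "item_b": (0.1, 0.9),
    "item_c": (0.5, 0.5),
}

# The "index": a dict here; in production this would be an ANN
# structure rebuilt or appended to on the hourly/daily schedule.
item_index = {item_id: item_tower(f) for item_id, f in catalog.items()}

def on_new_item(item_id, features):
    # New items just get embedded and appended; no retraining needed.
    item_index[item_id] = item_tower(features)

on_new_item("item_d", (0.9, 0.1))
```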
User vectors depend on user features and recent behavior, so compute them at request time. Computing one user vector takes 1-5ms. Then use approximate nearest neighbor (ANN) search to find the closest item vectors. ANN algorithms like HNSW find the top 1000 items from 10 million in 5-10ms. Total retrieval time: under 20ms.
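The request-time path, sketched end to end. An exact brute-force search (`heapq.nlargest`) stands in for the ANN index here, since the toy catalog is tiny; the weights and item vectors are illustrative.

```python
import heapq

USER_W = [[0.9, 0.1], [0.2, 0.8]]  # illustrative tower weights

def user_tower(features):
    return [sum(w * x for w, x in zip(row, features)) for row in USER_W]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Precomputed item vectors: the output of the offline batch job.
item_index = {
    "item_a": [0.62, 0.26],
    "item_b": [0.34, 0.82],
    "item_c": [0.50, 0.50],
}

def retrieve(user_features, k=2):
    u = user_tower(user_features)  # the 1-5ms request-time step
    # An ANN query (e.g. HNSW) would replace this exact top-k scan:
    return heapq.nlargest(k, item_index, key=lambda i: dot(u, item_index[i]))

top = retrieve([1.0, 0.5], k=2)
```

The structure is the point: the only per-request work is one tower forward pass and one nearest-neighbor query, which is why the whole path fits under 20ms.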