Hot Index Plus Main Index Architecture
THE TWO-INDEX ARCHITECTURE
A common solution for real-time updates is to maintain two indexes: a small, frequently updated "hot" index for recent items, and a large, optimized "main" index for historical data. Every query searches both and merges the results.
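Hot index: Small (10K-1M vectors), updated in real time or near real time. Built for fast inserts, not peak search performance. Uses simple structures: a flat (brute-force) index, which is exact but scans every vector, or a small, lightly tuned HNSW graph, which may give up a little recall. Either way, some search speed or quality is traded for insert speed.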
Main index: Large (10M-1B vectors), rebuilt periodically (daily or weekly). Optimized for search quality and throughput. Uses full IVF-PQ or HNSW with careful parameter tuning.
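To make the hot-index side concrete, here is a minimal sketch of a flat (brute-force) index in plain Python. The class name FlatHotIndex and its methods are hypothetical, chosen for illustration; a production system would more likely use a vector library. It shows why a flat structure suits the hot role: inserts are a constant-time append with no training or graph maintenance, and search cost grows linearly with size, which is acceptable only while the index stays small.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Hit = Tuple[float, str]  # (distance, item_id); smaller distance is better


class FlatHotIndex:
    """Hypothetical brute-force hot index: O(1) insert, O(n) exact search."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        self._vectors: List[Sequence[float]] = []

    def insert(self, item_id: str, vector: Sequence[float]) -> None:
        # No training step and no graph to update: just append.
        self._ids.append(item_id)
        self._vectors.append(vector)

    def search(self, query: Sequence[float], k: int) -> List[Hit]:
        # Scan every stored vector and keep the k closest (Euclidean distance).
        scored = (
            (math.dist(query, vector), item_id)
            for item_id, vector in zip(self._ids, self._vectors)
        )
        return heapq.nsmallest(k, scored)

    def __len__(self) -> int:
        return len(self._ids)
```

Usage: after inserting "a" at [0, 0], "b" at [3, 4], and "c" at [1, 0], calling search([0, 0], 2) returns "a" and "c" in that order, with distances 0.0 and 1.0.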
QUERY FLOW
When a query arrives, search both indexes in parallel. The hot index returns top-K recent items. The main index returns top-K historical items. An aggregator merges both result sets by distance score and returns the final top-K.
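The aggregator step described above can be sketched as follows. The function name merge_top_k and the (distance, item_id) tuple layout are assumptions made for illustration. The sketch also assumes that both indexes report distances under the same metric, and that when an ID appears in both result sets the hot copy should win because it is the more recent version; the source does not specify either point.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[float, str]  # (distance, item_id); smaller distance is better


def merge_top_k(hot_hits: Iterable[Hit], main_hits: Iterable[Hit], k: int) -> List[Hit]:
    """Merge two per-index top-K lists into one global top-K."""
    best: Dict[str, float] = {item_id: dist for dist, item_id in main_hits}
    # Assumption: on duplicate IDs the hot index entry overrides the main one.
    best.update({item_id: dist for dist, item_id in hot_hits})
    return heapq.nsmallest(k, ((dist, item_id) for item_id, dist in best.items()))
```

One caveat on comparability: if the main index uses product quantization, its distances are approximations, so a re-ranking pass against the original vectors may be needed before the two lists can be compared fairly.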
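Latency impact: because the two searches run in parallel, query latency is set by the slower one plus a small merge cost. If the hot index search takes 5ms and the main index search takes 20ms, total latency is ~20ms, not 25ms. The small hot index adds minimal overhead.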
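A small sketch of the parallel fan-out, using a thread pool from the Python standard library. The helper make_stub_search is a hypothetical stand-in for real index calls: it simply sleeps for a fixed delay and returns canned hits, so the timing behaviour is visible without a real index.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[float, str]  # (distance, item_id)
SearchFn = Callable[[Sequence[float], int], List[Hit]]


def make_stub_search(delay_s: float, hits: List[Hit]) -> SearchFn:
    """Hypothetical stand-in for an index: sleeps, then returns fixed hits."""
    def search(query: Sequence[float], k: int) -> List[Hit]:
        time.sleep(delay_s)
        return hits[:k]
    return search


def search_both(
    hot_search: SearchFn, main_search: SearchFn, query: Sequence[float], k: int
) -> Tuple[List[Hit], List[Hit]]:
    """Run both searches concurrently; total wait is roughly the slower one."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        hot_future = pool.submit(hot_search, query, k)
        main_future = pool.submit(main_search, query, k)
        return hot_future.result(), main_future.result()
```

With stubs that sleep 0.05s and 0.20s, search_both returns after roughly 0.20s rather than 0.25s. A real service would likely reuse one long-lived pool instead of creating one per query.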
MERGE CYCLES
Periodically (every few hours to daily), merge the hot index contents into the main index. This involves four steps: extracting the vectors from the hot index, adding them to the main index's training data, rebuilding the main index, and clearing the hot index.
During merge, traffic continues to the old main index while the new one builds. Once ready, atomically swap traffic to the new main index. This ensures zero-downtime updates.
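The merge cycle and swap can be sketched as below. All names (MainIndexHandle, run_merge_cycle, the rebuild callback) are hypothetical, and plain dictionaries stand in for real indexes. One deliberate deviation, labeled here as an assumption: instead of clearing the whole hot index, the sketch removes only the entries that were in the snapshot, so that inserts arriving during the rebuild are not lost.

```python
import threading
from typing import Callable, Dict, List

Vector = List[float]
Index = Dict[str, Vector]  # stand-in for a real ANN index


class MainIndexHandle:
    """Holds the main index that queries should use right now."""

    def __init__(self, index: Index) -> None:
        self._lock = threading.Lock()
        self._index = index

    def current(self) -> Index:
        with self._lock:
            return self._index

    def swap(self, new_index: Index) -> Index:
        # Readers see either the old or the new index, never a partial one.
        with self._lock:
            old, self._index = self._index, new_index
        return old


def run_merge_cycle(
    handle: MainIndexHandle,
    hot: Index,
    hot_lock: threading.Lock,
    rebuild: Callable[[Index, Index], Index],
) -> None:
    with hot_lock:
        snapshot = dict(hot)  # extract vectors from the hot index
    # Slow step: the old main index keeps serving queries while this runs.
    new_main = rebuild(handle.current(), snapshot)
    handle.swap(new_main)  # atomic switch to the new main index
    with hot_lock:
        for item_id, vector in snapshot.items():
            # Assumption: drop only what was merged and not updated since.
            if hot.get(item_id) is vector:
                del hot[item_id]
```

Query code would call handle.current() once per request and search the returned index, so a swap in the middle of a request does not affect that request.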
Merge frequency trade-off: more frequent merges keep main index fresher but increase compute cost. Daily merges work for most cases. High-velocity systems (1M+ new vectors/day) may need 4-6 hour cycles.
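The trade-off can be put into rough numbers. The sketch below assumes a uniform ingest rate, which real traffic rarely has, and the function name merge_tradeoff is hypothetical. It estimates how large the hot index grows just before a merge and how many rebuilds run per day.

```python
from typing import Tuple


def merge_tradeoff(new_vectors_per_day: float, merge_interval_hours: float) -> Tuple[float, float]:
    """Return (peak hot index size, main index rebuilds per day)."""
    # Assumption: vectors arrive at a constant rate throughout the day.
    peak_hot_size = new_vectors_per_day * merge_interval_hours / 24
    rebuilds_per_day = 24 / merge_interval_hours
    return peak_hot_size, rebuilds_per_day
```

At 1M new vectors per day, a daily merge lets the hot index reach about 1M vectors, the top of its intended size range, with one rebuild per day. A 6-hour cycle caps it near 250K vectors at the cost of four rebuilds per day.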