
Hot Index Plus Main Index Architecture

THE TWO-INDEX ARCHITECTURE

The most common solution for real-time updates is maintaining two indexes: a small, frequently-updated "hot" index for recent items, and a large, optimized "main" index for historical data. Queries search both and merge results.

Hot index: Small (10K-1M vectors), updated in real-time or near-real-time. Built for fast inserts, not maximum recall. Uses simpler structures (flat index or small HNSW). Accepts slightly lower search quality for insert speed.

Main index: Large (10M-1B vectors), rebuilt periodically (daily or weekly). Optimized for search quality and throughput. Uses full IVF-PQ or HNSW with careful parameter tuning.

QUERY FLOW

When a query arrives, search both indexes in parallel. The hot index returns top-K recent items. The main index returns top-K historical items. An aggregator merges both result sets by distance score and returns the final top-K.

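Latency impact: because the two searches run in parallel, query latency tracks the slower one rather than the sum. If the hot index search takes 5ms and the main index takes 20ms, total query latency is ~20ms plus a small merge cost, not 25ms. The small hot index adds minimal overhead.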
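The query flow above can be sketched in a few lines. This is a minimal illustration, not a production design: `FlatIndex` and `search_both` are hypothetical names, the index is a brute-force list scan using squared L2 distance, and the rule for an id that appears in both indexes (keep the smaller distance) is an assumption added for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


class FlatIndex:
    """Hypothetical brute-force index: stores (id, vector) pairs in a list."""

    def __init__(self):
        self.items = []

    def add(self, item_id, vector):
        self.items.append((item_id, vector))

    def search(self, query, k):
        # Squared L2 distance to every stored vector; smaller is closer.
        scored = [
            (sum((q - v) ** 2 for q, v in zip(query, vec)), item_id)
            for item_id, vec in self.items
        ]
        return heapq.nsmallest(k, scored)


def search_both(hot, main, query, k):
    # Submit both searches at once so latency tracks the slower index.
    with ThreadPoolExecutor(max_workers=2) as pool:
        hot_future = pool.submit(hot.search, query, k)
        main_future = pool.submit(main.search, query, k)
        candidates = hot_future.result() + main_future.result()
    # Assumption: if an id is in both indexes, keep its smaller distance.
    best = {}
    for dist, item_id in candidates:
        if item_id not in best or dist < best[item_id]:
            best[item_id] = dist
    # Global top-K across both result sets, ordered by distance.
    return heapq.nsmallest(k, [(d, i) for i, d in best.items()])


main = FlatIndex()
main.add("old-1", [0.0, 0.0])
main.add("old-2", [5.0, 5.0])
hot = FlatIndex()
hot.add("new-1", [1.0, 0.0])

top = search_both(hot, main, [0.9, 0.0], k=2)
print([item_id for _, item_id in top])  # ['new-1', 'old-1']
```

Merging by raw distance only works when both indexes report distances on the same scale; if the main index returns compressed (approximate) distances, the aggregator may need to re-score candidates against the original vectors.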

MERGE CYCLES

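Periodically (every few hours to daily), merge the hot index contents into the main index. This involves: extracting vectors from the hot index, adding them to the main index's training data, rebuilding the main index, and clearing the merged vectors from the hot index.

During a merge, traffic continues to the old main index while the new one builds. Once it is ready, atomically swap traffic to the new main index. This ensures zero-downtime updates. Clear the hot index only after the swap, and remove only the vectors that were merged, so that items inserted during the rebuild stay searchable.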
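A merge cycle with an atomic swap might look like the sketch below. Everything here is illustrative: `IndexRouter`, `merge_cycle`, and `build_index` are hypothetical names, and each index is modeled as a plain dict of id to vector so the example stays self-contained. It also assumes a single merge worker; concurrent writers to the hot index would need their own synchronization.

```python
import threading


class IndexRouter:
    """Holds the live main index and lets a rebuilt one replace it in one step."""

    def __init__(self, main_index):
        self._main = main_index
        self._lock = threading.Lock()

    def current(self):
        # Queries grab a reference; one already holding the old index finishes on it.
        with self._lock:
            return self._main

    def swap(self, new_main):
        # Replace the reference under the lock and hand back the old index.
        with self._lock:
            old, self._main = self._main, new_main
        return old


def merge_cycle(router, hot_items, build_index):
    # 1. Snapshot the hot items that this cycle will merge.
    snapshot = dict(hot_items)
    # 2. Build the new main index off to the side; the old one keeps serving.
    old_main = router.current()
    new_main = build_index({**old_main, **snapshot})
    # 3. Switch traffic to the new index in a single step.
    router.swap(new_main)
    # 4. Remove only what was merged; items added or updated since stay hot.
    for item_id, vector in snapshot.items():
        if hot_items.get(item_id) is vector:
            del hot_items[item_id]
    return new_main


router = IndexRouter({"a": [0.0, 0.0]})
hot_items = {"b": [1.0, 0.0]}
merge_cycle(router, hot_items, build_index=dict)
print(sorted(router.current()))  # ['a', 'b']
print(hot_items)                 # {}
```

Clearing the hot index after the swap, rather than before, means every vector is always searchable in at least one of the two indexes.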

Merge frequency trade-off: more frequent merges keep main index fresher but increase compute cost. Daily merges work for most cases. High-velocity systems (1M+ new vectors/day) may need 4-6 hour cycles.

⚠️ Key Trade-off: Hot index size matters. Too small: frequent merges needed. Too large: hot index search degrades. Sweet spot is usually 1-5% of main index size.
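To see how hot index size and merge frequency interact, here is a rough back-of-the-envelope helper. The function name is hypothetical, and it assumes a constant ingest rate and that a merge is triggered when the hot index reaches a chosen fraction of the main index size; the example inputs are illustrative values picked from the ranges above.

```python
def hours_until_hot_full(main_size, new_vectors_per_day, hot_fraction):
    """Hours of ingest before the hot index reaches hot_fraction of main_size."""
    hot_capacity = main_size * hot_fraction
    vectors_per_hour = new_vectors_per_day / 24
    return hot_capacity / vectors_per_hour


# 10M-vector main index, 1M new vectors/day, hot index capped at 2% of main:
print(round(hours_until_hot_full(10_000_000, 1_000_000, 0.02), 1))   # 4.8

# 100M-vector main index, same ingest rate, hot index capped at 1% of main:
print(round(hours_until_hot_full(100_000_000, 1_000_000, 0.01), 1))  # 24.0
```

Under these assumptions the first case lands in the 4-6 hour range and the second in a daily cycle.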
💡 Key Takeaways
Hot index (10K-1M vectors): fast inserts, updated in real time; main index (10M-1B vectors): optimized for search, rebuilt periodically
Query flow: search both indexes in parallel, merge results by distance score, return global top-K
Merge frequency trade-off: more frequent = fresher main index but higher compute cost
📌 Interview Tips
1. Interview Tip: Draw the two-index architecture with query flow, showing parallel search and result merging.
2. Interview Tip: Explain the merge process, including atomic swaps for zero-downtime updates.