What is Real-Time Incremental Indexing?
WHY BATCH REBUILDS FAIL
A batch rebuild regenerates the complete index whenever content changes. For a 100M-vector index, this takes 2-8 hours. If new products are added hourly, every rebuild is already several batches behind by the time it finishes, so the index never catches up. A user searching for a product uploaded 5 minutes ago finds nothing.
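To put rough numbers on that lag, here is a back-of-the-envelope sketch. It assumes rebuilds run back to back and that each rebuild only includes content that existed when it started. Both are simplifying assumptions for illustration, and the function name `staleness_hours` is hypothetical.

```python
def staleness_hours(rebuild_hours):
    """Return (average, worst-case) hours before a new item becomes searchable.

    Assumption: rebuilds run back to back, and each one snapshots the data
    at its start. An item arriving just after a snapshot misses that rebuild
    and must wait for the next one to finish.
    """
    worst = 2 * rebuild_hours      # arrives right after a snapshot is taken
    average = 1.5 * rebuild_hours  # arrival time uniform within a rebuild window
    return average, worst


# The 2-8 hour rebuild range quoted above for a 100M-vector index.
for rebuild_hours in (2, 8):
    average, worst = staleness_hours(rebuild_hours)
    print(f"{rebuild_hours}h rebuild: average lag {average}h, worst case {worst}h")
```

Under these assumptions, even the fastest 2-hour rebuild leaves new items invisible for up to 4 hours, far from a "within minutes" target.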
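The fundamental tension: indexes built for fast search (like IVF or HNSW) are optimized for static data. Their internal structures (cluster centroids in IVF, graph edges in HNSW) assume the data distribution is fixed. Adding vectors incrementally can degrade these structures: centroids trained on old data may partition new data poorly, and graph edges chosen earlier may no longer link the best neighbors.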
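The effect on cluster-based indexes can be illustrated with a toy one-dimensional sketch. This is not a real IVF implementation: the centroids are hard-coded rather than trained, and the data values and the helper name `list_sizes` are made up for the example. It only shows what happens when centroids stay frozen while the data distribution shifts.

```python
def list_sizes(points, centroids):
    """Assign each point to its nearest centroid and return the size of each list."""
    sizes = [0] * len(centroids)
    for x in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        sizes[nearest] += 1
    return sizes


# Centroids "trained" on the initial data and then frozen.
centroids = [0.0, 10.0, 20.0]
initial = [-1, 0, 1, 9, 10, 11, 19, 20, 21]

# New vectors arrive from a region the centroids never saw.
new = [40, 41, 42, 43, 44, 45]

print("before:", list_sizes(initial, centroids))
print("after: ", list_sizes(initial + new, centroids))
```

Before the new data arrives, the three lists are balanced. Afterwards, every new point lands in the last list, so a query routed to that list has to scan three times as many candidates as a query routed to either of the others.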
FRESHNESS REQUIREMENTS BY USE CASE
E-commerce listings: New products must be searchable within minutes. A seller uploading inventory expects immediate visibility, and any delay means lost sales.
News/content: Breaking news needs to appear in search within seconds to minutes. A 6-hour rebuild cycle makes the system useless for current events.
Social media: Posts should be searchable nearly instantly. Users expect to find content they just saw in their feed.
Catalog updates: Weekly refresh is acceptable for stable catalogs like movie libraries. Here, batch rebuilds work fine.
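One way to make these requirements actionable is to write them down as explicit freshness targets and check an indexing strategy against them. The sketch below is illustrative only: the numeric targets are assumed values chosen to match the qualitative descriptions above ("within minutes", "nearly instantly", "weekly"), and the names `FreshnessTarget` and `batch_rebuild_can_meet` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreshnessTarget:
    use_case: str
    max_staleness_seconds: float  # assumed value, not a measured requirement


TARGETS = [
    FreshnessTarget("e-commerce listings", 5 * 60),        # "within minutes"
    FreshnessTarget("news/content", 60),                   # "seconds to minutes"
    FreshnessTarget("social media", 5),                    # "nearly instantly"
    FreshnessTarget("catalog updates", 7 * 24 * 60 * 60),  # "weekly refresh"
]


def batch_rebuild_can_meet(target, rebuild_seconds):
    """Necessary condition: a rebuild cannot deliver freshness shorter than its own duration."""
    return rebuild_seconds <= target.max_staleness_seconds


rebuild_seconds = 2 * 60 * 60  # the optimistic 2-hour rebuild
for target in TARGETS:
    verdict = "batch is fine" if batch_rebuild_can_meet(target, rebuild_seconds) else "needs incremental indexing"
    print(f"{target.use_case}: {verdict}")
```

With these assumed targets, only the weekly catalog refresh tolerates a multi-hour rebuild; the other three use cases need some form of incremental indexing.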
THE INCREMENTAL CHALLENGE
Most vector indexes are not designed for frequent modification. HNSW builds a graph in which each node is connected to its nearest neighbors. Adding a new node requires finding those neighbors, which means running a search against the index for every insert. At thousands of inserts per second, these insert-time searches compete with user queries for compute, and search performance degrades.
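The coupling between inserting and searching can be seen in a deliberately simplified sketch. This is not HNSW: it uses a single flat graph, finds neighbors by brute force instead of greedy layered traversal, and never prunes edges. The class `ToyGraphIndex` and its counter are hypothetical, meant only to show that each insert begins with a search and therefore draws on the same compute as queries.

```python
import math


class ToyGraphIndex:
    """Single-layer neighbor graph; a toy stand-in for a graph index such as HNSW."""

    def __init__(self, m=2):
        self.m = m                # number of neighbors linked per new node
        self.vectors = []         # node id -> vector
        self.edges = []           # node id -> set of neighbor ids
        self.distance_calls = 0   # rough proxy for compute spent

    def _dist(self, a, b):
        self.distance_calls += 1
        return math.dist(a, b)

    def search(self, query, k):
        # Brute force for simplicity; a real graph index walks edges instead.
        ranked = sorted(range(len(self.vectors)),
                        key=lambda i: self._dist(query, self.vectors[i]))
        return ranked[:k]

    def insert(self, vector):
        # Every insert starts with a search for the new node's neighbors.
        neighbors = self.search(vector, self.m)
        node = len(self.vectors)
        self.vectors.append(vector)
        self.edges.append(set(neighbors))
        for n in neighbors:
            self.edges[n].add(node)  # link back so the graph stays navigable
        return node


index = ToyGraphIndex(m=2)
for v in [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (0.9, 0.0)]:
    index.insert(v)

print("neighbors of last node:", sorted(index.edges[3]))
print("distance computations spent on inserts:", index.distance_calls)
```

All of the distance computations counted here were spent on inserts alone, before a single user query ran. In a real system the per-insert cost is lower than brute force, but it still scales with insert rate and shares resources with query traffic.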