What is Real-Time Incremental Indexing?

Real-time incremental indexing means updating only the changed data in your search or retrieval system and making those updates queryable within seconds rather than hours. Instead of rebuilding the entire index from scratch, you apply targeted inserts, updates, and deletes as they happen. This applies to both traditional inverted indexes (for text search and filtering) and vector indexes (for semantic similarity search with embeddings). Production systems typically target freshness under 2 to 5 seconds with query p95 latencies below 100 milliseconds.

The core architecture treats the index as a materialized view driven by an append-only changelog. Change Data Capture (CDC) from databases or domain events from services carry mutations with monotonically increasing sequence numbers. The indexer consumes these events and performs idempotent upserts and tombstone deletes. Because reapplying an event leaves the index unchanged, each change takes effect exactly once even when retries or crashes cause events to be redelivered.

A critical separation of concerns keeps this performant. The serving index optimizes for fast reads with in-memory segments or graph structures. Background processes handle compaction, graph maintenance, and segment merges to prevent unbounded write amplification. Google's Caffeine system for web search brought freshness from hours to seconds using this approach, while Meta's social graph search reflects new edges and privacy changes within seconds at query latencies under 100 milliseconds.

Production systems typically handle 10,000 to 100,000 index upserts per second per cluster. Per-shard ingestion often runs at 1,000 to 5,000 writes per second with p95 write latency under 50 milliseconds.

The key trade-off is complexity versus freshness. Batch rebuilding is simpler but leaves data stale for minutes to hours, while incremental indexing requires sophisticated streaming pipelines and compaction logic.
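The changelog-driven write path described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular system's implementation: `ChangeEvent`, `Entry`, and `IncrementalIndex` are hypothetical names, a plain dictionary stands in for the real inverted or vector index, and sequence numbers are assumed to be comparable across all events for a given document.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One mutation from the changelog (hypothetical shape)."""
    doc_id: str
    seq: int               # monotonically increasing sequence number
    body: Optional[str]    # None marks a delete


@dataclass
class Entry:
    seq: int
    body: Optional[str]    # None means the document is tombstoned


class IncrementalIndex:
    """Toy index: a dict stands in for the real index structure."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def apply(self, event: ChangeEvent) -> bool:
        """Apply one event; return False if it is a duplicate or stale."""
        current = self._entries.get(event.doc_id)
        if current is not None and event.seq <= current.seq:
            return False  # already applied, or superseded by a newer change
        # Deletes are stored as tombstones so that a late, older upsert
        # cannot resurrect the document.
        self._entries[event.doc_id] = Entry(event.seq, event.body)
        return True

    def get(self, doc_id: str) -> Optional[str]:
        entry = self._entries.get(doc_id)
        return None if entry is None else entry.body


index = IncrementalIndex()
events = [
    ChangeEvent("doc-1", 1, "hello"),
    ChangeEvent("doc-1", 2, "hello world"),
    ChangeEvent("doc-1", 2, "hello world"),  # redelivered after a retry
    ChangeEvent("doc-1", 1, "hello"),        # stale, arrives out of order
    ChangeEvent("doc-2", 3, "other"),
    ChangeEvent("doc-2", 4, None),           # delete
]
results = [index.apply(e) for e in events]
```

Comparing each event's sequence number with the stored one is what makes redelivery harmless here: the duplicate and the out-of-order event are both rejected, so replaying the log after a crash leaves the index in the same state.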
💡 Key Takeaways
Most production systems target freshness under 2 to 5 seconds, compared to minutes or hours with batch rebuilds
Uses Change Data Capture or event streams with sequence numbers for ordering and idempotent upserts to handle retries
Serving index optimized for reads while background processes handle compaction to prevent write amplification
Throughput typically 10,000 to 100,000 upserts per second per cluster with per shard rates of 1,000 to 5,000 per second
The trade-off is operational complexity versus freshness: batch systems are simpler but stay stale for longer periods
Google's Caffeine brought web search freshness from hours to seconds, and Meta's social graph search reflects changes in under 2 seconds
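The split between a read-optimized serving path and background compaction can be illustrated with a small sketch. It assumes a simplified segment model that is not taken from any specific engine: each segment is a dictionary from document ID to body, `None` is a tombstone, and segments are ordered oldest to newest. The names `Segment`, `lookup`, and `compact` are hypothetical.

```python
from typing import Dict, List, Optional

# A segment maps doc_id -> body; None is a tombstone.
# Segments in a list are ordered oldest to newest.
Segment = Dict[str, Optional[str]]


def lookup(segments: List[Segment], doc_id: str) -> Optional[str]:
    """Read path: check the newest segment first; the first hit wins."""
    for segment in reversed(segments):
        if doc_id in segment:
            return segment[doc_id]  # may be None if the doc was deleted
    return None


def compact(segments: List[Segment]) -> Segment:
    """Background path: merge every segment into one and drop tombstones.

    Dropping tombstones is only safe here because all segments are merged
    at once, so no older copy of a deleted document survives the merge.
    """
    merged: Segment = {}
    for segment in segments:  # oldest to newest, so newer values overwrite
        merged.update(segment)
    return {doc_id: body for doc_id, body in merged.items() if body is not None}


segments = [
    {"a": "v1", "b": "v1"},   # oldest
    {"a": "v2", "c": "v1"},
    {"b": None},              # newest: "b" was deleted
]
compacted = compact(segments)
```

Writes only ever append a new small segment, reads scan segments newest first, and the merge runs off the query path. After compaction a lookup touches one segment instead of three and returns the same answers.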
📌 Examples
Google's Caffeine system updates the web search index with document-level changes, achieving second-level freshness instead of hourly batch updates
Meta's social graph search processes new friendship edges and privacy updates within 2 seconds while maintaining sub-100 ms query latency
Pinterest's streaming feature pipeline incorporates engagement signals within seconds, using CDC logs that feed materialized views