Index Drift and Consistency Guarantees

Index drift occurs when the index diverges from the source of truth because of out-of-order event processing, dropped deletes, or dual-write inconsistencies. If a delete is applied and an earlier update for the same item arrives afterwards, the item reappears. If two updates arrive out of order, the index ends up holding the stale version. In distributed index tiers with eventual consistency, different replicas can temporarily return different results. Production systems must prevent these anomalies while avoiding expensive synchronous coordination.

The solution is per-entity versioning with last-write-wins semantics. Every change event carries a monotonically increasing version number. The indexer compares it with the version currently stored for that entity and rejects the write if the incoming version is older. Store a checkpoint per shard that records the last applied log offset, and use idempotent upserts so that replaying events produces the same state. Google's Caffeine and Meta's social graph search use these techniques to provide freshness under 2 seconds with strong eventual consistency.

Strict read-after-write consistency is costly in distributed indexes. It requires synchronous acknowledgment from all replicas before returning success, adding 10 to 50 milliseconds per write. Most systems accept eventual consistency with bounded staleness, typically p95 under 2 to 5 seconds. Clients can request a minimum version if needed, forcing the index to wait until that version is visible, but this increases tail latency. For critical writes like content moderation or privacy changes, some systems use a primary region with synchronous replication and pin queries to that region.

Write storms amplify consistency issues. If a trending entity receives bursts of updates, the indexer can fall behind and apply events with significant lag. Use per-key rate limiting and back-pressure signals to slow upstream producers. Admission control for background compaction prevents maintenance tasks from starving write ingestion.
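The versioning and checkpoint scheme described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production indexer: the names `ShardIndex`, `Entry`, and `apply` are hypothetical, the log is assumed to be consumed in offset order within a shard, and a real system would persist the entry and the checkpoint atomically.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Entry:
    version: int
    doc: Optional[Any]  # None marks a tombstone (a deleted entity)


@dataclass
class ShardIndex:
    entries: Dict[str, Entry] = field(default_factory=dict)
    checkpoint: int = -1  # last applied log offset for this shard

    def apply(self, offset: int, key: str, version: int, doc: Optional[Any]) -> bool:
        """Apply one change event; doc=None means delete. Returns True if state changed."""
        # Replay protection: offsets at or below the checkpoint were already applied.
        if offset <= self.checkpoint:
            return False
        # Assumption: in a real system this advance is atomic with the write below.
        self.checkpoint = offset
        current = self.entries.get(key)
        # Last write wins on the per-entity version: stale or duplicate versions are rejected.
        if current is not None and version <= current.version:
            return False
        # Deletes are kept as versioned tombstones so a late update cannot resurrect the item.
        self.entries[key] = Entry(version, doc)
        return True

    def get(self, key: str) -> Optional[Any]:
        entry = self.entries.get(key)
        return None if entry is None else entry.doc
```

Because a delete is stored with its version instead of removing the key, an update that shows up late with a lower version is rejected the same way a stale update would be. Tombstones still need to be garbage collected eventually, which this sketch leaves out.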
Monitor indexing lag in seconds, queue depth, and per entity version gaps to detect drift early.
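As a rough illustration of these three signals, the sketch below derives them from values an indexer would typically have on hand. The function name `drift_signals` and all of its parameters are hypothetical; in particular, it assumes lag is measured from the timestamp of the oldest event not yet applied, and that version gaps are computed over a sampled set of keys.

```python
from typing import Dict, Optional


def drift_signals(
    now: float,
    oldest_pending_event_ts: Optional[float],
    log_end_offset: int,
    checkpoint_offset: int,
    source_versions: Dict[str, int],
    index_versions: Dict[str, int],
) -> dict:
    # Indexing lag: age of the oldest event the indexer has not applied yet.
    if oldest_pending_event_ts is None:
        lag_seconds = 0.0
    else:
        lag_seconds = max(0.0, now - oldest_pending_event_ts)
    # Queue depth: events written to the log but not yet applied by this shard.
    queue_depth = log_end_offset - checkpoint_offset
    # Version gap: how far the index trails the source of truth for each sampled key.
    version_gaps = {
        key: src - index_versions.get(key, 0)
        for key, src in source_versions.items()
        if src > index_versions.get(key, 0)
    }
    return {
        "lag_seconds": lag_seconds,
        "queue_depth": queue_depth,
        "version_gaps": version_gaps,
    }
```

A gap that persists after the queue has drained is the more worrying signal, since it suggests a dropped event and not just a slow indexer.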
💡 Key Takeaways
Per-entity versioning with monotonically increasing version numbers prevents out-of-order updates: the indexer applies last write wins and rejects any write whose version is stale.
A per-shard checkpoint records the last applied log offset; combined with idempotent upserts, replays after crashes or rebalances yield effectively exactly-once results.
Strict read-after-write consistency with synchronous replication adds 10 to 50 milliseconds per write, so most systems use eventual consistency with p95 staleness under 2 to 5 seconds.
Write storms on trending entities cause lag and version gaps. Use per-key rate limiting to contain them, and admission control for compaction so that maintenance does not starve write ingestion.
Tombstone resurrection happens when delete events are dropped or arrive very late. Always embed a version in deletes and enforce last write wins at apply time.
Multi-region deployments can have 10 to 60 seconds of replication lag, so critical workflows like moderation pin to a primary region with synchronous writes.
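For callers that cannot tolerate bounded staleness, the minimum-version read can be sketched as a gate that blocks until the requested version is visible or a timeout expires. This is an illustrative single-process sketch using Python's `threading.Condition`; the class name `VersionGate` and its methods are hypothetical, and a distributed index would track visibility per replica.

```python
import threading
from typing import Dict


class VersionGate:
    """Tracks the newest visible version per entity and lets readers wait for one."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._visible: Dict[str, int] = {}

    def publish(self, key: str, version: int) -> None:
        # Called by the indexer once a version becomes searchable.
        with self._cond:
            if version > self._visible.get(key, -1):
                self._visible[key] = version
                self._cond.notify_all()

    def wait_for(self, key: str, min_version: int, timeout: float) -> bool:
        # Blocks the reader until min_version is visible; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._visible.get(key, -1) >= min_version,
                timeout=timeout,
            )
```

The timeout is what keeps the tail latency cost bounded: a reader that gives up can fall back to a possibly stale result or route the query to the primary region.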
📌 Examples
Index drift scenario: update version 5 arrives, then delete version 4. The delete is rejected as stale, and the item correctly remains at version 5.
Write storm: a viral post receives 10,000 updates per second. The rate limiter caps the entity at 500 updates per second; queue depth spikes, but lag stays under 3 seconds.
Privacy change at Meta: a user blocks another user, and the delete propagates to all index replicas within 2 seconds using prioritized events and per-entity versions.
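The per-entity cap in the write storm example could be enforced with a token bucket per key. The sketch below is one possible approach, not a description of any particular system: `PerKeyRateLimiter` and its parameters are hypothetical names, and the clock is injectable so the behavior can be checked deterministically.

```python
import time
from typing import Callable, Dict, Tuple


class PerKeyRateLimiter:
    """Token bucket per key: `rate` tokens per second, at most `burst` stored."""

    def __init__(
        self,
        rate: float,
        burst: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        # key -> (tokens remaining, time of last refill)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.burst, now))
        # Refill in proportion to elapsed time, never above the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[key] = (tokens - 1.0, now)
            return True
        self._buckets[key] = (tokens, now)
        return False


# Usage: a burst of 10,000 updates to one entity within the same instant.
fake_now = [0.0]
limiter = PerKeyRateLimiter(rate=500, burst=500, clock=lambda: fake_now[0])
admitted = sum(limiter.allow("post:viral") for _ in range(10_000))
```

Rejected updates do not have to be lost. Since only the highest version per entity matters under last write wins, an upstream producer can coalesce them and retry with the latest version once tokens are available.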