Embeddings & Similarity Search • Real-time Updates (Incremental Indexing)
Operational Metrics and Failure Detection
Monitoring real-time incremental indexing requires tracking both write-path health and read-path quality. Indexing lag measures the time from event emission to queryable state, typically targeting a p95 under 2 to 5 seconds. Queue depth shows backlog size; spikes indicate write storms or slow indexers. Per-shard write throughput should stay within capacity, often 1,000 to 5,000 upserts per second. Refresh latency tracks how long it takes for new segments or graph updates to become visible to queries, targeting 1 to 2 seconds for hot indexes.
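As a rough illustration, the sketch below shows how indexing lag and write throughput might be sampled in process before being exported to whatever monitoring system you use. The class and method names are hypothetical and not tied to any particular metrics library.

```python
import time
from dataclasses import dataclass, field

@dataclass
class WritePathMetrics:
    """Hypothetical in-process collector for write-path health signals."""
    lag_samples_s: list = field(default_factory=list)   # event emission -> queryable, in seconds
    upserts_in_window: int = 0
    window_start: float = field(default_factory=time.time)

    def record_indexed(self, event_emit_ts: float) -> None:
        # Indexing lag: time from event emission to the moment it became queryable.
        self.lag_samples_s.append(time.time() - event_emit_ts)
        self.upserts_in_window += 1

    def p95_lag_s(self) -> float:
        # Nearest-rank p95 over the current window; alert if this exceeds 2-5 s.
        if not self.lag_samples_s:
            return 0.0
        ordered = sorted(self.lag_samples_s)
        return ordered[int(0.95 * (len(ordered) - 1))]

    def upserts_per_second(self) -> float:
        # Per-shard write throughput; compare against the 1,000-5,000/s capacity band.
        elapsed = max(time.time() - self.window_start, 1e-9)
        return self.upserts_in_window / elapsed
```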
For vector indexes, track memory per million vectors as a leading indicator of cost and capacity. Typical values range from 1 to 4 GB per million vectors at 256 to 768 dimensions, depending on graph degree and quantization. Monitor the tombstone count and ratio; if the ratio exceeds 10 to 20 percent, schedule compaction to prevent recall degradation. Graph rewiring rate during insertions affects write throughput; sudden spikes indicate hotspots or algorithmic issues.
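A minimal sketch of the tombstone check, assuming you can read live and deleted counts from your index; the 0.15 default simply sits inside the 10 to 20 percent band above and should be tuned per workload.

```python
def needs_compaction(live_vectors: int, tombstones: int,
                     tombstone_ratio_threshold: float = 0.15) -> bool:
    """Return True when deleted (tombstoned) entries make up enough of the
    index that recall and memory footprint are likely degrading."""
    total = live_vectors + tombstones
    if total == 0:
        return False
    return tombstones / total >= tombstone_ratio_threshold

# Example: 850k live vectors, 150k tombstones -> 15% ratio, schedule compaction.
assert needs_compaction(850_000, 150_000)
```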
Query-side metrics include p95 and p99 latency, broken down by hot index and main index. Correlate latency spikes with compaction or maintenance tasks to identify interference. Track recall or precision at k if you have ground truth or can sample it. For production systems, also monitor business metrics such as click-through rate, null rate (the fraction of queries returning no results), and user engagement. A sudden drop in click-through rate of 5 percent or more often signals index corruption, stale embeddings, or training-serving skew.
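The read-path quality check might look like the following sketch; the function name, thresholds, and metric sources are illustrative assumptions rather than a standard API.

```python
def detect_quality_regression(ctr_baseline: float, ctr_current: float,
                              null_rate_current: float,
                              ctr_drop_threshold: float = 0.05,
                              null_rate_threshold: float = 0.02) -> list:
    """Flag relative click-through-rate drops of 5%+ and elevated null rates."""
    alerts = []
    if ctr_baseline > 0 and (ctr_baseline - ctr_current) / ctr_baseline >= ctr_drop_threshold:
        alerts.append("click-through rate dropped >= 5% vs baseline")
    if null_rate_current >= null_rate_threshold:
        alerts.append("null rate above threshold")
    return alerts

# Example: CTR fell from 0.12 to 0.11 (~8% relative drop) -> investigate the index.
print(detect_quality_regression(ctr_baseline=0.12, ctr_current=0.11, null_rate_current=0.01))
```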
Embedding service throughput is critical for write-path capacity. Small encoder models on a CPU produce 50 to 200 embeddings per second per core; batching on a GPU can reach 2,000 to 10,000 embeddings per second, depending on model size and batch size. If embedding generation becomes the bottleneck, queries will see stale results even if the index itself is fast. Use priority lanes so that query-triggered embeddings are never starved by batch backfills or maintenance reprocessing.
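One simple way to implement priority lanes is a two-level priority queue in front of the embedding service, as in the sketch below (lane constants and function names are made up for illustration).

```python
import itertools
import queue

# Two-lane embedding request queue: query-triggered items always dequeue
# before batch-backfill items, so reads never starve behind reprocessing.
QUERY_LANE, BATCH_LANE = 0, 1
_seq = itertools.count()  # tiebreaker keeps FIFO order within a lane

embed_requests = queue.PriorityQueue()

def submit(text: str, lane: int) -> None:
    embed_requests.put((lane, next(_seq), text))

def next_request() -> str:
    lane, _, text = embed_requests.get()
    return text

# A backfill enqueued first still yields to a later query-triggered request.
submit("old document re-embed", BATCH_LANE)
submit("user query text", QUERY_LANE)
assert next_request() == "user query text"
```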
💡 Key Takeaways
• Indexing lag: target p95 under 2 to 5 seconds; queue-depth spikes indicate write storms; per-shard throughput is often 1,000 to 5,000 upserts per second
• Memory per million vectors ranges from 1 to 4 GB at 256 to 768 dimensions; a tombstone ratio over 10 to 20 percent requires compaction to maintain recall
• Correlate query p95 and p99 latency with maintenance tasks; sudden spikes during compaction indicate resource contention or configuration issues
• Monitor business metrics like click-through rate and null rate; drops of 5 percent or more signal index corruption or training-serving skew
• Embedding throughput: 50 to 200 per second per CPU core, 2,000 to 10,000 per second per GPU with batching; a bottleneck here causes stale results
• Priority lanes ensure query-triggered embeddings are not starved by batch backfills, which is critical for keeping read-path latency low
📌 Examples
Alert setup: if indexing lag p95 exceeds 5 seconds for 3 minutes, page the on-call; if queue depth stays over 100,000 for 5 minutes, trigger auto-scaling (a sketch of this alert logic follows below)
Compaction correlation: p99 query latency spikes from 80 ms to 200 ms during nightly compaction; move compaction to off-peak hours or add CPU budget
Embedding bottleneck: the index can handle 5,000 upserts per second but the embedding service maxes out at 2,000 per second on CPU; deploy a GPU batch service to unblock the write path
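A minimal sketch of the "sustained breach" alert rule from the first example, written as plain Python rather than any specific alerting system's configuration; the class name and thresholds are assumptions for illustration.

```python
import time

class SustainedThresholdAlert:
    """Fires only when a metric stays above its threshold for a full window,
    mirroring rules like 'indexing lag p95 > 5 s for 3 minutes'."""

    def __init__(self, threshold: float, window_s: float):
        self.threshold = threshold
        self.window_s = window_s
        self.breach_start = None

    def observe(self, value: float, now: float = None) -> bool:
        now = time.time() if now is None else now
        if value <= self.threshold:
            self.breach_start = None          # reset on any healthy sample
            return False
        if self.breach_start is None:
            self.breach_start = now           # start of a sustained breach
        return now - self.breach_start >= self.window_s

# Indexing lag p95 > 5 s sustained for 3 minutes -> page on-call.
lag_alert = SustainedThresholdAlert(threshold=5.0, window_s=180)
# Queue depth > 100,000 sustained for 5 minutes -> trigger auto-scaling.
queue_alert = SustainedThresholdAlert(threshold=100_000, window_s=300)
```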