Operational Metrics and Failure Detection

FRESHNESS METRICS

Time to searchability: How long it takes from item creation until the item appears in search results. For a hot+main architecture, this is the hot index ingestion time (typically seconds to minutes). Track p50, p95, and p99. Alert if p99 exceeds the SLO (e.g., >5 minutes for e-commerce).
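As a minimal sketch of the tracking described above, the following computes nearest-rank percentiles over a batch of lag samples and flags a p99 breach. The function names, the `slo_seconds` default of 300 (taken from the 5-minute example), and the choice of the nearest-rank method are assumptions for illustration; a production system would more likely rely on its metrics backend's histogram support.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def freshness_report(lags_seconds, slo_seconds=300.0):
    """Summarize time to searchability (creation -> searchable) and flag a p99 SLO breach."""
    report = {f"p{p}": percentile(lags_seconds, p) for p in (50, 95, 99)}
    report["slo_breached"] = report["p99"] > slo_seconds
    return report


# Hypothetical samples: most items searchable in 10 s, two stragglers.
lags = [10.0] * 98 + [400.0, 500.0]
print(freshness_report(lags))
```

Alerting on p99 rather than the mean is what surfaces the two stragglers here; the median alone would look healthy.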

Index age: Age of the newest item in the main index, which reflects how stale the main index is. With a daily merge cycle, main index age oscillates between 0 and 24 hours. Track it to ensure merges complete on schedule.

Hot index size: Number of vectors in hot index. Should stay within designed capacity. If hot index grows beyond threshold, it signals merge failures or capacity issues.
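The index age and hot index size checks can be sketched as simple threshold rules. Everything below is hypothetical: the function name, the 80% capacity ratio, and the 1.25x slack on the merge interval are placeholder values chosen for illustration, not recommendations from the text.

```python
def check_freshness_capacity(hot_vector_count, hot_capacity, main_index_age_hours,
                             merge_interval_hours=24.0, size_alert_ratio=0.8,
                             age_slack=1.25):
    """Return alert names for hot index size and main index age.

    size_alert_ratio and age_slack are assumed thresholds, to be tuned per system.
    """
    alerts = []
    # The hot index should stay within its designed capacity.
    if hot_vector_count > size_alert_ratio * hot_capacity:
        alerts.append("hot_index_near_capacity")
    # With a daily merge, age should stay near 0-24 h; beyond that a merge was likely missed.
    if main_index_age_hours > age_slack * merge_interval_hours:
        alerts.append("main_index_merge_overdue")
    return alerts


print(check_freshness_capacity(900_000, 1_000_000, 40.0))
```

When both alerts fire together, a stalled merge is a more likely cause than a traffic spike, since a missed merge both ages the main index and lets the hot index accumulate.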

QUALITY METRICS

Recall@K: Sample queries weekly and compare the index's results against brute-force exact search. This establishes a baseline (e.g., 95% recall@100). Alert on a drop of 2% or more, which indicates drift or index corruption.
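A minimal sketch of the recall measurement, assuming vectors fit in memory as a dict of id to list of floats and that distance is squared Euclidean. The `ann_search` callable stands in for whatever approximate index is under test, and treating the 2% drop as an absolute difference in recall is also an assumption.

```python
def exact_top_k(query, vectors, k):
    """Brute-force top-k ids by squared Euclidean distance (ground truth)."""
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(query, v))
    return sorted(vectors, key=lambda vid: dist(vectors[vid]))[:k]


def recall_at_k(ann_ids, exact_ids):
    """Fraction of the exact top-k that the approximate index also returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(ann_ids)) / len(exact)


def mean_recall(queries, vectors, ann_search, k):
    """Average recall@k over sampled queries; ann_search(query, k) returns a list of ids."""
    scores = [recall_at_k(ann_search(q, k), exact_top_k(q, vectors, k)) for q in queries]
    return sum(scores) / len(scores)


def recall_dropped(current, baseline, max_drop=0.02):
    """True if recall fell by at least max_drop (absolute) from the baseline."""
    return baseline - current >= max_drop


# Hypothetical usage with a fake index that misses one true neighbor.
vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0], "d": [0.0, 2.0]}
fake_ann = lambda q, k: ["a", "c", "b"][:k]
print(mean_recall([[0.0, 0.0]], vectors, fake_ann, k=3))
```

Brute-force search is linear in corpus size per query, which is why the text samples queries weekly rather than checking every request.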

Latency distribution: p50, p95, p99 query latency. Sudden p99 spikes often indicate hot shard issues or resource contention. Gradual increases suggest index bloat or degraded structures.

FAILURE DETECTION

Merge failures: Monitor merge job completion. Failed merges leave the hot index growing without bound. Set alerts for merge duration exceeding 2x the normal time or for consecutive failures.
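The two alert conditions above can be sketched as a check over recent merge runs. The `MergeRun` record, the alert names, and the default of two consecutive failures are hypothetical; only the 2x duration factor comes from the text.

```python
from dataclasses import dataclass


@dataclass
class MergeRun:
    succeeded: bool
    duration_seconds: float


def merge_alerts(runs, normal_duration_seconds, max_consecutive_failures=2, slow_factor=2.0):
    """Evaluate merge health; runs are ordered oldest to newest."""
    alerts = []
    # Count failures at the tail of the history, stopping at the last success.
    consecutive = 0
    for run in reversed(runs):
        if run.succeeded:
            break
        consecutive += 1
    if consecutive >= max_consecutive_failures:
        alerts.append("consecutive_merge_failures")
    # Flag the most recent run if it took more than slow_factor x the normal duration.
    if runs and runs[-1].duration_seconds > slow_factor * normal_duration_seconds:
        alerts.append("merge_too_slow")
    return alerts


history = [MergeRun(True, 600.0), MergeRun(False, 1500.0), MergeRun(False, 700.0)]
print(merge_alerts(history, normal_duration_seconds=600.0))
```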

Ingestion backlog: Queue depth for vectors waiting to enter the hot index. A growing backlog indicates that ingestion cannot keep up with the arrival rate. Scale out ingestion workers or throttle the arrival rate.
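One simple way to tell a growing backlog from a momentary spike is to look at the net change in queue depth over a window. The sketch below assumes periodic `(timestamp_seconds, queue_depth)` samples; the function names and the `min_rate` parameter are illustrative, and a real deployment would probably smooth over more than two endpoints.

```python
def backlog_growth_rate(samples):
    """Net queue depth change per second across the window.

    samples: list of (timestamp_seconds, queue_depth), oldest first.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, d0), (t1, d1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    return (d1 - d0) / (t1 - t0)


def backlog_is_growing(samples, min_rate=0.0):
    """True if arrivals outpace ingestion by more than min_rate vectors per second."""
    return backlog_growth_rate(samples) > min_rate


window = [(0, 100), (60, 400), (120, 700)]
print(backlog_growth_rate(window))
```

A positive rate approximates how many vectors per second of extra ingestion throughput would be needed to hold the queue steady.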

Cross-index inconsistency: Periodically sample items and verify that each appears in exactly one index (hot or main, never both and never neither). Inconsistencies indicate merge bugs or delete propagation issues.
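The sampling check can be sketched as follows. Plain Python sets stand in for the membership lookups a real system would issue against each index, and the names `classify_item` and `sample_inconsistencies` are hypothetical.

```python
import random


def classify_item(item_id, hot_ids, main_ids):
    """Classify a live item by which indexes contain it."""
    in_hot, in_main = item_id in hot_ids, item_id in main_ids
    if in_hot and in_main:
        return "duplicate"  # in both indexes: merge did not clear the hot copy
    if not in_hot and not in_main:
        return "missing"    # in neither index: lost during ingestion or merge
    return "ok"


def sample_inconsistencies(live_ids, hot_ids, main_ids, sample_size, seed=None):
    """Sample live items and return {item_id: problem} for those not in exactly one index."""
    rng = random.Random(seed)
    pool = sorted(live_ids)
    sample = rng.sample(pool, min(sample_size, len(pool)))
    problems = {}
    for item_id in sample:
        status = classify_item(item_id, hot_ids, main_ids)
        if status != "ok":
            problems[item_id] = status
    return problems


print(sample_inconsistencies({1, 2, 3, 4}, hot_ids={1, 2}, main_ids={2, 3}, sample_size=10))
```

Sampling from the set of live items only checks one direction; detecting deleted items that still linger in an index would need a second sample drawn from the deleted set.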

✅ Best Practice: Build a dashboard showing time to searchability (freshness), recall@K trend (quality), hot index size (capacity), and merge job status. The on-call engineer should be able to diagnose issues within 5 minutes using this dashboard.
💡 Key Takeaways
Freshness metrics: time to searchability (p99 < SLO), index age, hot index size (capacity alert)
Quality metrics: recall@K sampled weekly (alert on 2%+ drop), latency distribution (p50/p95/p99)
Failure detection: merge job status, ingestion backlog, cross-index consistency sampling
📌 Interview Tips
1. Interview Tip: Describe a real-time index monitoring dashboard with freshness, quality, and capacity metrics.
2. Interview Tip: Explain how to detect merge failures: monitor job completion and hot index size growth.