
Trade-offs: Freshness, Recall, Latency, and Cost

THE FUNDAMENTAL TENSION

Index management forces you to choose between four competing goals that cannot all be maximized simultaneously. Improving one typically degrades another. Understanding these trade-offs lets you make intentional decisions rather than discovering painful surprises in production.

Freshness vs recall: Faster index updates (better freshness) leave smaller training batches for clustering. Smaller batches produce worse centroids, which reduces recall. With centroids trained on a batch of 100K vectors, searches miss 8-12% of relevant items; with centroids trained on 10M vectors, they miss only 2-4%.

Recall vs latency: Higher recall requires searching more partitions. Searching 5% of partitions gives ~92% recall at 15ms. Searching 20% gives ~98% recall at 60ms. For most recommendation systems, 92% recall is acceptable. For safety-critical search (medical, legal), you need 98%+.
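As a rough illustration of making this choice deliberately, the sketch below picks the cheapest probe setting that satisfies both a recall target and a latency budget. The two operating points are the figures quoted above; the names `ProbeSetting`, `MEASURED`, and `pick_probe_setting` are hypothetical, and a real system would substitute points measured on its own index.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProbeSetting:
    probe_fraction: float  # share of partitions searched per query
    recall: float          # recall observed at this setting
    latency_ms: float      # latency observed at this setting


# Operating points taken from the text above (assumed, not measured here).
MEASURED = [
    ProbeSetting(probe_fraction=0.05, recall=0.92, latency_ms=15.0),
    ProbeSetting(probe_fraction=0.20, recall=0.98, latency_ms=60.0),
]


def pick_probe_setting(
    settings: Sequence[ProbeSetting],
    min_recall: float,
    max_latency_ms: float,
) -> Optional[ProbeSetting]:
    """Return the lowest-latency setting meeting both targets, or None."""
    feasible = [
        s for s in settings
        if s.recall >= min_recall and s.latency_ms <= max_latency_ms
    ]
    return min(feasible, key=lambda s: s.latency_ms, default=None)


# A recommendation workload: 90% recall is enough, 50 ms budget.
recs = pick_probe_setting(MEASURED, min_recall=0.90, max_latency_ms=50.0)

# A safety-critical workload that also wants 50 ms: nothing qualifies.
strict = pick_probe_setting(MEASURED, min_recall=0.98, max_latency_ms=50.0)
```

A `None` result is the useful case: it says the targets conflict, so either the latency budget or the recall requirement has to give.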

COST TRADE-OFFS

Memory vs latency: Keeping all indexes in RAM gives sub-10ms latency. Spilling to SSD increases p99 to 50-100ms but cuts memory costs by 70%. Tiered storage (hot data in RAM, cold on SSD) balances this.
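To make the memory side concrete, here is a back-of-the-envelope sketch. It assumes uncompressed float32 vectors, ignores index overhead and the price of SSD, and uses an illustrative corpus of 100M vectors at 768 dimensions; the function name `index_ram_gb` and the 30% hot fraction are assumptions chosen to line up with the 70% figure above.

```python
def index_ram_gb(
    num_vectors: int,
    dim: int,
    bytes_per_value: int = 4,   # float32, assumed
    hot_fraction: float = 1.0,  # share of vectors kept in RAM
) -> float:
    """Approximate RAM (in GB) for the raw vectors kept in memory."""
    total_bytes = num_vectors * dim * bytes_per_value
    return total_bytes * hot_fraction / 1e9


# Everything in RAM versus a tiered layout keeping 30% hot.
all_in_ram = index_ram_gb(100_000_000, 768)
tiered = index_ram_gb(100_000_000, 768, hot_fraction=0.3)
ram_saving = 1 - tiered / all_in_ram

print(f"all in RAM: {all_in_ram:.1f} GB, tiered: {tiered:.1f} GB, "
      f"RAM saved: {ram_saving:.0%}")
```

Under these assumptions the RAM footprint falls from roughly 307 GB to roughly 92 GB. Whether that is acceptable depends on how many queries land on the cold tier, since those are the ones that pay the SSD latency.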

Replication vs cost: More replicas improve read throughput and fault tolerance. 3 replicas give good availability but triple storage costs. For non-critical workloads, 2 replicas may suffice. Critical systems need 3+ across availability zones.

Build frequency vs compute cost: Daily full rebuilds ensure optimal index quality but consume significant compute. Weekly rebuilds with daily incremental updates reduce cost 5-8x while maintaining 95%+ of optimal recall.

MONITORING ESSENTIALS

Track p50, p95, p99 latencies with clear SLOs (e.g., p99 < 50ms). Sample recall weekly by comparing index results to brute-force search on test queries. Monitor index freshness (time since newest item indexed). Alert on recall drops >2% or latency exceeding SLO for 5+ minutes.
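The weekly recall sample described above can be sketched in a few lines. This is a minimal pure-Python version that assumes inner-product similarity, a non-empty corpus, and a small test set; `index_search` is a hypothetical callable standing in for whatever approximate index is being monitored, and the other names are illustrative as well.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def brute_force_top_k(query: Vector, corpus: Sequence[Vector], k: int) -> List[int]:
    """Exact top-k corpus positions by inner product (the ground truth)."""
    ranked = sorted(
        range(len(corpus)),
        key=lambda i: dot(query, corpus[i]),
        reverse=True,
    )
    return ranked[:k]


def sampled_recall(
    test_queries: Sequence[Vector],
    corpus: Sequence[Vector],
    index_search: Callable[[Vector, int], List[int]],
    k: int,
) -> float:
    """Mean fraction of exact top-k results that the index also returned."""
    total = 0.0
    for query in test_queries:
        exact = set(brute_force_top_k(query, corpus, k))
        approx = set(index_search(query, k))
        total += len(exact & approx) / len(exact)
    return total / len(test_queries)


# Tiny demonstration with a stand-in "index" that always answers [2, 3].
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
queries = [[2.0, 1.0]]
recall = sampled_recall(queries, corpus, lambda q, k: [2, 3], k=2)
```

Brute-force search is slow by design, which is why it is run on a small query sample rather than on live traffic.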

⚠️ Key Trade-off: An index cannot be fresh, high-recall, low-latency, and cheap all at once. Pick three. Most production systems optimize for latency + recall + cost and accept freshness delays of 1-6 hours.

💡 Key Takeaways
Monitor: latency (p50/p95/p99), recall (stable at 95%+), freshness (<1 hour for real-time)
Alert thresholds: p99 latency exceeds SLO, recall drops 2%+, freshness exceeds target
Capacity planning: estimate 6 months ahead; scale before hitting 70% CPU, 80% memory
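
One way the alert thresholds above might be evaluated is sketched below. It assumes one p99 sample per minute, treats the 2% recall drop as absolute percentage points against a stored baseline, and uses a 60-minute freshness target; the names `Thresholds` and `evaluate_alerts` and all default values are illustrative rather than taken from any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Thresholds:
    p99_slo_ms: float = 50.0             # example SLO from the text
    sustained_minutes: int = 5           # latency must breach this long
    max_recall_drop: float = 0.02        # absolute drop vs baseline (assumed)
    max_staleness_minutes: float = 60.0  # freshness target (assumed)


def evaluate_alerts(
    p99_by_minute: Sequence[float],
    baseline_recall: float,
    current_recall: float,
    staleness_minutes: float,
    t: Thresholds = Thresholds(),
) -> List[str]:
    """Return the names of the alerts that should fire right now."""
    alerts = []

    # Latency: every one of the last N per-minute p99 samples breaches the SLO.
    recent = p99_by_minute[-t.sustained_minutes:]
    if len(recent) == t.sustained_minutes and all(v > t.p99_slo_ms for v in recent):
        alerts.append("latency")

    # Recall: sampled recall fell more than the allowed amount below baseline.
    if baseline_recall - current_recall > t.max_recall_drop:
        alerts.append("recall")

    # Freshness: the newest indexed item is older than the target.
    if staleness_minutes > t.max_staleness_minutes:
        alerts.append("freshness")

    return alerts


fired = evaluate_alerts([60, 62, 58, 71, 65], 0.95, 0.92, staleness_minutes=30)
```

Requiring the breach to persist for the full window keeps a single slow minute from paging anyone, at the price of reacting a few minutes later.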
📌 Interview Tips
1. Interview Tip: Describe the key metrics (latency, recall, freshness) and how each is measured.
2. Interview Tip: Explain scaling triggers: the CPU, memory, and latency thresholds that indicate a need for more capacity.