What is Proximity Search and Why Two Phase Architecture?

Definition
Proximity search finds items near a reference point in some space. In geospatial systems, it finds locations within a radius. In text systems, it finds documents where words appear close together. The core challenge is searching millions of items efficiently.
Why Naive Approach Fails
Computing distance to every item and filtering is O(n). For 100 million locations, even at 1 microsecond per distance calculation, that is 100 seconds per query. Unacceptable for real-time applications. You need to eliminate most candidates without computing distance.
Indexes solve this by organizing data spatially. Instead of checking every point, check only points in relevant regions. The index tells you which regions might contain nearby points. Check only those.
Two Phase Architecture
Phase 1 - Coarse filtering: Use spatial index to find candidate regions. Retrieve all items in those regions. This is fast because indexes are optimized for region lookup. But it over-fetches: regions are rectangles or cells, not circles.
Phase 2 - Fine filtering: For each candidate, compute exact distance. Filter items outside the actual radius. Sort by distance if returning nearest neighbors. Apply business filters (ratings, availability). This phase is O(k) where k is candidate count, not O(n).
The architecture trades accuracy for speed in phase 1, then recovers accuracy in phase 2. Phase 1 reduces millions to thousands. Phase 2 reduces thousands to tens. Together they achieve sub-100ms latency.
💡 Key Insight: Two-phase search is a general pattern: cheap coarse filter followed by expensive fine filter. The coarse filter must have no false negatives (never miss valid results) but can have false positives (include invalid candidates that phase 2 removes).

💡 Key Takeaways

✓Naive distance check is O(n); unacceptable for millions of items

✓Spatial indexes eliminate most candidates without computing distance

✓Phase 1: coarse filter via index, over-fetches but fast

✓Phase 2: exact distance calculation on candidates, filters to final results

✓Coarse filter must have zero false negatives; false positives are acceptable

📌 Interview Tips

1Calculate naive approach cost: 100M points at 1 microsecond each is 100 seconds per query

2Explain two-phase trade-off: phase 1 reduces millions to thousands, phase 2 reduces thousands to tens

3Emphasize the no-false-negative requirement: missing a valid result is worse than checking extra candidates

← Back to Proximity Search Overview