Geospatial & Location Services • Proximity SearchEasy⏱️ ~2 min
What is Proximity Search and Why Two Phase Architecture?
Proximity search finds items that are close according to a distance function and threshold. This splits into two major problem classes: textual proximity (finding documents where query terms appear within a small window of token positions) and geospatial proximity (finding entities like drivers, stores, or listings within a radius or returning k nearest neighbors).
The naive approach of computing distances against all points is prohibitively expensive at scale. When Uber has millions of active drivers or Airbnb has over 7 million listings globally, checking every single candidate would require hundreds of milliseconds to seconds per query. Instead, production systems use a two phase architecture.
Phase one performs coarse filtering using locality preserving structures. For text, this means positional inverted indexes where each posting stores term positions in a document. For geo, this means spatial grids like geohash or H3 hexagons, space filling curves like Z order or Hilbert curves, or tree structures like R trees. This coarse phase prunes 99% or more of candidates in milliseconds by only considering items in nearby grid cells or relevant document position windows.
Phase two applies precise verification and ranking. For text, this evaluates whether position windows satisfy the slop (maximum positional displacement) constraint. For geo, this computes exact great circle distances using Haversine or Vincenty formulas on the pruned candidate set. A Haversine distance calculation takes only 1 to 2 microseconds, so processing 5,000 candidates requires just 5 to 10 milliseconds of CPU time. This two phase approach keeps interactive queries under 100 milliseconds at the 95th percentile.
💡 Key Takeaways
•Textual proximity uses positional inverted indexes to find documents where terms occur within a token window, while geospatial proximity finds physical entities within a radius or returns k nearest neighbors
•Coarse filtering prunes 99%+ of candidates in milliseconds using locality preserving indexes like H3 grids for geo or positional postings for text
•Exact verification computes true distances only on the small candidate set; Haversine distance takes 1 to 2 microseconds so 5,000 candidates cost just 5 to 10 milliseconds CPU
•Production systems like Uber target sub 200 to 300 milliseconds end to end with proximity lookup completing in tens of milliseconds
•Airbnb filters 7 million global listings down to thousands using geohash cells at precision 5 to 8 (ranging from 4.9 km to 38 m coverage) before ranking by price and availability
📌 Examples
Uber dispatch searches a k ring of H3 cells around a rider, typically touching 7 to 37 cells depending on radius and resolution, to build driver candidates before computing exact ETAs
Amazon nearest fulfillment center lookup uses S2 or geohash cells to narrow thousands of global facilities to a few hundred candidates, then applies k nearest neighbor search in tens of milliseconds