Geospatial & Location Services • Quadtree & Spatial IndexingMedium⏱️ ~3 min
Production Query Pipeline: Radius Search with Hierarchical Covering
The production query pipeline for radius search follows a three stage pattern that balances index efficiency with precision. First, compute a minimal covering of the query disc using hierarchical cells. A region coverer algorithm selects larger cells away from boundaries and finer cells at edges to minimize total cell count. For a 2 km radius query (approximately 12.6 km² area) using Google S2 at level 12 (5.1 km² per cell), you typically cover with 3 to 10 mixed level cells rather than hundreds of fine grained cells.
Second, fetch candidates by scanning cell ID ranges from the spatial index. Since cell IDs are hierarchical and sortable, cells form contiguous ranges in the index, enabling efficient range scans. In urban cores, each cell might contain hundreds to a few thousand points. A well tuned covering with 5 cells at average urban density yields roughly 500 to 2,000 candidate points before refinement. This filter stage typically completes in single digit milliseconds for memory backed indices.
Third, refine candidates by computing precise geodesic distances and filtering to the exact radius. Haversine or Vincenty formulas calculate great circle distances on the sphere. This refinement eliminates false positives from cells that overlap the query circle but contain points outside it. At 2,000 candidates, distance calculations take 1 to 3 milliseconds on modern hardware. The entire pipeline from covering to refined results completes in under 10 milliseconds for typical queries before any downstream ranking or attribute joins.
For k nearest neighbor (kNN) queries, use progressive ring expansion. Start with cells covering a small initial radius, refine candidates, and if fewer than k results remain, expand to neighbor cells iteratively. Each ring adds a bounded number of new cells. At urban densities with k less than or equal to 50, expect 1 to 3 expansion rings with tens of total cells scanned, maintaining sub 10 millisecond latencies.
💡 Key Takeaways
•Region covering computes minimal cell set for query shape, preferring larger cells away from boundaries; 2 km radius typically covered by 3 to 10 mixed level cells in S2
•Cell ID range scans leverage sortable hierarchical keys for contiguous index access, completing in 3 to 5 milliseconds for memory backed indices at urban density
•Refinement with geodesic distance calculations (Haversine or Vincenty) eliminates false positives from overlapping cells, processing 2,000 candidates in 1 to 3 milliseconds
•kNN queries use progressive ring expansion: start small, refine candidates, expand to neighbor cells if needed; k of 50 at urban density requires 1 to 3 rings with tens of cells
•End to end pipeline from covering to refined results completes in under 10 milliseconds before downstream ranking, keeping 99th percentile latencies acceptable for user facing services
📌 Examples
Airbnb geosearch uses hierarchical grid indices with geohash prefixes. Length 6 geohash (approximately 1.2 km cells) for city level searches, length 7 (153 m) for tight radius queries, enabling coarse prefiltering before exact distance checks.
Uber dispatch matching requires sub 100 millisecond end to end latency with millions of moving points. Hierarchical cell prefiltering with H3 or quadtree like systems bounds per request work under skew by scanning only relevant cells.