Geospatial & Location Services • Proximity Search
Precision vs Candidate Set Size Tradeoff
Choosing the right cell precision is the most critical tuning decision in geospatial proximity systems. Cell precision directly controls the candidate set size, which determines both accuracy and computational cost. Coarser cells reduce index size and write cost but produce larger candidate sets with more false positives. Finer cells improve pruning precision but increase the number of cells you must query and the metadata overhead.
Consider searching a 500 meter radius in a dense city using geohash. At precision 6 (1.2 km cells), the 1 km wide circle overlaps perhaps 4 cells, retrieving 8,000 candidate points that must be distance checked, because each coarse cell drags in far more area than the circle itself. At precision 7 (150 m cells), covering the same circle takes roughly 50 cells, yet you retrieve only about 2,000 candidates because the finer grid hugs the circle and over-scans much less area. At precision 8 (38 m cells), coverage grows to several hundred cells but retrieves just 800 candidates. The sweet spot for Airbnb and similar services is typically precision 6 to 7 for city scale searches, targeting candidate sets in the low thousands.
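The geometry behind these numbers can be sketched with a rough estimator. The cell widths are approximate geohash values and the point density is an illustrative assumption for a dense city, not a measurement:

```python
import math

# Back-of-envelope estimator for how geohash precision trades covering-cell
# count against candidate volume for a circular search.
CELL_WIDTH_KM = {6: 1.2, 7: 0.15, 8: 0.038}  # approximate geohash cell widths

def estimate(radius_km, precision, density_per_km2):
    w = CELL_WIDTH_KM[precision]
    # Cells overlapping the circle: interior cells plus a boundary ring.
    cells = math.ceil(math.pi * radius_km ** 2 / w ** 2 + 2 * math.pi * radius_km / w)
    # Candidates scale with the total area actually scanned, not the circle.
    candidates = int(cells * w ** 2 * density_per_km2)
    return cells, candidates

for p in (6, 7, 8):
    cells, cand = estimate(0.5, p, density_per_km2=1400)
    print(f"precision {p}: ~{cells} cells, ~{cand} candidates")
```

The pattern to notice: cell count rises steeply with precision while candidate count falls, because a finer grid approximates the circle more tightly.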
Adaptive precision based on local density is crucial for consistent performance. Dense downtown Manhattan might have 500 listings per precision 7 cell, while suburban areas have 50. Using fixed precision globally means downtown queries fetch 10x more candidates and blow latency budgets. Production systems measure density per metro area and adjust cell precision accordingly, using finer cells (precision 8) in dense cores and coarser cells (precision 6) in suburbs. This keeps candidate sets within a predictable budget like 500 to 5,000 points regardless of geography.
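A minimal sketch of density-adaptive selection, preferring the coarsest precision (fewest cells to query) whose expected candidate count still fits the budget. The cell widths are approximate and the density inputs are assumptions standing in for the per-metro measurements described above:

```python
import math

# Pick the coarsest geohash precision whose over-scanned area keeps the
# expected candidate count within budget. Density would come from offline
# per-metro measurement; here it is simply a caller-supplied assumption.
CELL_WIDTH_KM = {6: 1.2, 7: 0.15, 8: 0.038}

def pick_precision(radius_km, density_per_km2, budget=5000):
    for p in sorted(CELL_WIDTH_KM):              # 6, 7, 8: coarse to fine
        w = CELL_WIDTH_KM[p]
        # Cells overlapping the circle: interior plus a boundary ring.
        cells = math.ceil(math.pi * radius_km ** 2 / w ** 2
                          + 2 * math.pi * radius_km / w)
        if cells * w * w * density_per_km2 <= budget:
            return p
    return max(CELL_WIDTH_KM)                    # densest cores: finest grid
```

For a 500 m search, a sparse suburb lands on precision 6, while Manhattan-level densities push the choice to precision 8, which matches the adaptive behavior described above.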
The other dimension is radius. Searching a 5 km radius at precision 7 means covering roughly 78 square kilometers with 150 m cells, which requires querying thousands of cells and produces massive candidate sets. Systems enforce maximum radius caps (often 10 to 20 km) and switch strategies for larger areas: instead of proximity search, they fall back to coarser filtering plus pagination, or they return a fixed k nearest neighbors with an expanding radius until k results are found.
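The expanding-radius fallback can be sketched in a few lines. Here `query_radius` is a hypothetical hook standing in for the cell-based lookup plus exact distance filter, returning (id, distance_km) pairs:

```python
# Expanding-radius k-nearest-neighbor fallback: grow the search radius
# geometrically until k results are found or the hard cap is reached.
# `query_radius(origin, radius_km)` is a hypothetical stand-in for the
# cell-based proximity lookup described above.

def knn_expanding(origin, k, query_radius, start_km=1.0, cap_km=20.0):
    radius = start_km
    while radius <= cap_km:
        hits = query_radius(origin, radius)      # cell lookup + exact filter
        if len(hits) >= k:
            # Sort by true distance and keep only the closest k.
            return sorted(hits, key=lambda h: h[1])[:k]
        radius *= 2                              # geometric expansion
    # Cap reached: return whatever exists inside the maximum radius.
    return sorted(query_radius(origin, cap_km), key=lambda h: h[1])[:k]
```

Geometric doubling keeps the number of index round trips logarithmic in the final radius, while the cap bounds worst-case fanout.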
💡 Key Takeaways
•Airbnb targets candidate sets in the low thousands by choosing geohash precision 6 to 7 for city scale searches; precision 6 cells are about 1.2 km wide and precision 7 cells about 150 m
•Adaptive precision based on density: use finer cells like precision 8 in dense downtown areas and coarser cells like precision 6 in suburbs to keep candidate counts predictable
•Candidate set sizing budget: aim for 500 to 5,000 candidates to keep exact distance filtering under 10 to 20 milliseconds at a typical 1 to 2 microseconds per Haversine calculation
•Enforce maximum radius caps of 10 to 20 km; beyond that, switch to k nearest neighbor with expanding radius or coarser grid plus pagination to avoid querying thousands of fine-grained cells
•Write cost increases with finer precision: moving entities crossing precision 8 cell boundaries trigger more index updates than precision 6, creating write amplification in high velocity workloads
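The filtering budget in the takeaways is easy to sanity-check with a standard Haversine implementation plus the bullet's arithmetic (the per-call microsecond figure is the source's estimate and varies by hardware):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance on a sphere of Earth's mean radius (6371 km).
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))

# Budget check from the takeaways: 5,000 candidates at ~2 µs per
# distance calculation is about 10 ms of exact filtering.
worst_case_ms = 5000 * 2e-6 * 1000
```

This is why the candidate budget, not the index lookup, usually dominates the latency envelope: the exact filter is linear in candidate count.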
📌 Examples
Uber tunes H3 resolution to city density: resolution 8 or 9 in Manhattan (hex edge ~460 m or ~170 m) but resolution 7 in suburbs (hex edge ~1.2 km) to maintain consistent candidate set sizes during dispatch
Amazon nearest locker search caps radius at 15 km; beyond that, the system falls back to zip code or city level coarse filtering to avoid excessive cell fanout and latency spikes