Geospatial & Location Services • GeohashingMedium⏱️ ~3 min
Proximity Query Pattern and Neighbor Expansion
A naive proximity query that scans only the center geohash cell suffers from boundary false negatives: points just across the cell edge are missed even if they are closer than points inside the cell. The standard solution is neighbor expansion: compute the geohash of the query center at the chosen precision, then generate the 8 surrounding neighbor cells and scan all 9 contiguous key ranges. For a 1 kilometer radius query using precision 7 cells (153 meters), this guarantees coverage because the center cell plus neighbors form a 3 by 3 grid spanning roughly 459 meters in each direction, fully containing the 1 kilometer circle.
The query process follows four steps. First, choose precision so that typical cell width or height is less than or equal to the query radius; a rule of thumb is cell diagonal less than or equal to radius. Second, compute the center geohash and its 8 neighbors, handling edge cases at the anti meridian (plus or minus 180 degrees longitude) and poles where neighbor logic wraps or becomes undefined. Third, execute 9 parallel range scans over the prefix ranges, merging results and deduplicating by object ID since objects near cell boundaries may appear in multiple scans. Fourth, post filter all candidates using exact haversine distance calculation to remove false positives outside the true radius.
Overfetch varies with precision and query radius. At good precision matching (cell size roughly equal to radius), expect 10 to 50 percent overfetch because the 9 cell square covers roughly 1.3 to 2 times the area of the inscribed circle. At coarse precision, overfetch explodes: a 1 kilometer radius with 5 character cells (4.9 kilometers) might scan 25 or more cells yielding 10 to 100 times more candidates than necessary. For larger radii like 5 to 10 kilometers, you must enumerate all geohash prefixes within the bounding box of the circle, which can expand to 12 to 32 or more ranges depending on latitude and precision.
💡 Key Takeaways
•Scanning only the center cell misses boundary points: always include 8 neighbors for small radius queries, expanding to all intersecting cells for larger radii to eliminate false negatives
•Precision matching rule: choose precision where cell diagonal is less than or equal to query radius. For 1 kilometer radius use precision 7 (153 meter cells giving 459 meter grid coverage), for 500 meters use precision 8
•Overfetch at good precision is 10 to 50 percent because 9 cell square covers roughly 1.3 to 2 times the circle area. At bad precision (cells much larger than radius), overfetch explodes to 10 to 100 times or more
•Query steps: compute center geohash and 8 neighbors, run 9 parallel range scans, merge and deduplicate by object ID, post filter with exact haversine distance to remove false positives outside radius
•Edge cases require special handling: anti meridian crossings split into multiple longitude ranges, pole proximity breaks neighbor calculations, requiring validation and split query logic
📌 Examples
High throughput pattern: Key value store with 200 million indexed locations. Query for users within 1 kilometer uses precision 7, scans 9 geohash ranges. Each range scan touches 2 to 5 contiguous storage pages (items clustered by geohash), fetching ~800 candidates total in 3 to 8 milliseconds. Post filter to ~400 actual results within radius. Achieves 40,000 queries per second per node with p99 latency under 15 milliseconds when hot set in memory.
Large radius example: 5 kilometer radius query at precision 6 (1.22 kilometer cells) requires scanning approximately 16 to 25 cells arranged in a 4 by 5 or 5 by 5 grid. With 80 items per cell, fetch ~1,280 to 2,000 candidates, post filter to ~600 results. Overfetch roughly 2 to 3 times, acceptable trade off for range scan simplicity.
Anti meridian case: Query center at longitude 179.8 degrees with 2 kilometer radius crosses the date line. Compute neighbors and split into two bounding boxes: one covering 179 to 180 degrees, another covering minus 180 to minus 178 degrees. Execute separate scans for each box, merge results. Without this split, wraparound logic fails and misses half the results.