Geospatial & Location Services • Proximity SearchMedium⏱️ ~2 min
Geospatial Indexing: Grids, Curves, and Trees
Geospatial proximity systems rely on indexing structures that map two dimensional latitude and longitude coordinates to keys that preserve locality. When nearby points map to nearby keys, a simple range scan can retrieve candidates without scanning the entire dataset. Three families dominate production systems.
Space filling curves like geohash, S2, and Z order map 2D coordinates to 1D integer or string keys by interleaving bits from latitude and longitude. A geohash is a base32 encoded string where each character adds precision: length 5 covers roughly 4.9 km squares, length 6 covers 1.2 km, length 7 covers 150 m, and length 8 covers 38 m. Prefixes share location, so points with common prefixes are geographically close. To search a circle of radius R, you compute the minimal set of cells that cover the circle (typically 1 to 9 adjacent cells), then fetch all points with those prefixes. This works beautifully with key value stores and LSM trees because range scans are native operations. The tradeoff is shape distortion: grid cells are squares or rectangles, not circles, so you must apply exact distance filtering afterward.
Hierarchical grids like H3 (hexagons) and quadtrees divide the world into nested cells with parent child relationships. H3 is particularly popular for ride hailing because hexagons tile more uniformly than squares, reducing edge distortion, and constant time operations for finding neighbors (k ring) bound query fanout. Uber uses H3 resolutions tuned to city density, expanding 1 to 2 rings around a rider's cell to capture nearby drivers. Updates are extremely cheap: when a driver moves, you recompute their H3 cell ID and update a single index entry. This makes hierarchical grids ideal for write heavy workloads with millions of position updates per second.
Tree structures like R trees and k d trees partition space with bounding boxes or hyperplanes and support efficient rectangle and circle queries. R trees shine for static or slowly changing point of interest catalogs where you can bulk load and rarely rebuild. However, frequent inserts and deletes cause tree rebalancing overhead, and hotspots (dense downtown areas) create deep subtrees that hurt query performance. Production systems favor grids and curves for dynamic entities and reserve trees for static datasets.
💡 Key Takeaways
•Geohash precision: length 5 covers 4.9 km, length 6 covers 1.2 km, length 7 covers 150 m, and length 8 covers 38 m; Airbnb uses these precisions to filter millions of listings down to thousands
•Space filling curves map 2D to 1D keys enabling range scans in key value stores and LSM trees; ideal for write heavy workloads and horizontal sharding
•H3 hierarchical hexagons provide O(1) neighbor lookups and uniform area coverage; Uber uses k ring expansion to bound query fanout even during peak QPS
•R trees work well for static point of interest datasets but suffer from rebalancing overhead and hotspot sensitivity with frequent inserts or deletes
•Covering a circle with grid cells includes corner regions outside the true circle, requiring exact Haversine distance filtering to eliminate false positives
📌 Examples
Uber H3 deployment: riders and drivers are binned into H3 cells at city density appropriate resolutions; dispatch expands 1 to 2 rings (7 to 37 cells) around the rider cell to fetch driver candidates
Amazon and Airbnb use geohash or S2 cell IDs as primary shard keys, enabling locality based routing where queries only hit shards overlapping the search area instead of broadcasting to all nodes