Trade Offs: Geohash vs H3 vs S2 in Production
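Geohashing offers simplicity and speed for point based proximity queries in ordered key value stores, but its rectangular, latitude skewed cells create trade offs that alternative systems address differently. The core benefit is prefix locality: nearby points usually share a key prefix, so a proximity query becomes an ordered range scan costing O(log N + K), where N is the number of indexed keys and K is the number of rows scanned. Keys are compact (a geohash fits in an 8 byte integer), and a node can routinely deliver tens of thousands of queries per second at single digit millisecond latencies when the working set fits in memory. The cost is distortion: cells have a fixed size in degrees, so their east to west width on the ground shrinks as latitude increases and they are not equal area. The Z order curve also has discontinuities at cell boundaries, where two adjacent points can have very different keys, so queries must expand to neighboring cells.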
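The prefix locality described above can be illustrated with a minimal Python sketch. It assumes the standard base32 geohash encoding and uses an in memory sorted list as a stand-in for an ordered key value store; `geohash_encode` and `prefix_scan` are illustrative names, not functions from any particular library.

```python
import bisect

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (omits a, i, l, o)


def geohash_encode(lat, lon, precision=7):
    """Encode a point by interleaving longitude and latitude bits, longitude first."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, nbits = 0, 0
    use_lon = True  # even bit positions refine longitude, odd ones latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with prefix via two binary searches on a sorted list."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    # '~' sorts after every character in the geohash alphabet
    hi = bisect.bisect_left(sorted_keys, prefix + "~")
    return sorted_keys[lo:hi]


keys = sorted(["ezs3z", "ezs42", "ezs48", "ezs50", "u4pru"])
print(geohash_encode(42.605, -5.603, 5))
print(prefix_scan(keys, "ezs4"))
```

The two binary searches are the "log N" part of the cost, and the slice between them is the "K" part. A real store would expose the same pattern as a key range query rather than a list slice.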
Uber switched to H3 hexagonal hierarchical indexing for dispatch, pricing, and heatmaps specifically to address geohash limitations. H3 provides more uniform adjacency: each hexagon has 6 neighbors that all share an edge with it, whereas a geohash rectangle has 8 neighbors, 4 of which touch only at a corner. (The exception is the 12 pentagons present at every H3 resolution, which have 5 neighbors each.) H3 also offers near equal area cells globally and often covers polygons with fewer cells. For ride supply and demand balancing, near equal area cells keep density metrics comparable regardless of latitude, which is critical for global pricing algorithms. The trade off is complexity: H3 requires a dedicated spatial library and slightly more expensive cell computations, though these are typically still sub microsecond per operation.
Google uses S2 spherical geometry with Hilbert curve mapping for Maps and location services. S2 projects the sphere onto the six faces of a cube and subdivides each face hierarchically, so cells at a given level are similar in size worldwide (areas vary by roughly a factor of two) but are not strictly equal area. It provides robust region coverings and handles poles and the anti meridian correctly by design. S2 excels at complex polygon intersection and hierarchical region queries, making it well suited to map rendering, geocoding, and route planning. The cost is higher implementation complexity and a larger library footprint; S2 cell IDs are 64 bit integers, the same size as a geohash stored as an integer, but cell operations are more compute intensive. For simple point proximity in a distributed key value store, geohash remains competitive on throughput and latency.
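To make the locality difference between Z order and Hilbert orderings concrete, the sketch below walks both curves over a small flat grid and measures the largest jump between consecutive cells. It is only an illustration on a square grid, not S2's actual implementation (S2 applies the Hilbert curve to the faces of a cube), and the function names are hypothetical.

```python
def z_order_d2xy(d, order):
    """Z order (Morton) index -> (x, y): even bits form x, odd bits form y."""
    x = y = 0
    for i in range(order):
        x |= ((d >> (2 * i)) & 1) << i
        y |= ((d >> (2 * i + 1)) & 1) << i
    return x, y


def hilbert_d2xy(d, order):
    """Hilbert index -> (x, y) on a 2^order by 2^order grid (classic iterative form)."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate or flip the quadrant so sub-curves join end to end
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def max_step(d2xy, order):
    """Largest Manhattan distance between cells that are consecutive on the curve."""
    cells = [d2xy(d, order) for d in range(4 ** order)]
    return max(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(cells, cells[1:]))


print("hilbert:", max_step(hilbert_d2xy, 4))
print("z order:", max_step(z_order_d2xy, 4))
```

Consecutive Hilbert indices always land in adjacent cells, while the Z order curve periodically jumps across the grid. Those jumps are the discontinuities that force geohash queries to scan neighboring cells as separate ranges.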
Choose geohash when you need robust, high throughput proximity filtering with simple range scans, especially for point data and bounding box queries where 10 to 50 percent overfetch is acceptable. Prefer H3 for equal area aggregations, uniform density metrics, or when hexagonal topology better matches your domain (e.g., coverage planning, demand heatmaps). Use S2 for exact geometric operations, complex polygon queries, or when global correctness at poles and date line is critical and you can absorb the library and compute cost.
💡 Key Takeaways
•Geohash strengths: simplicity, compact 8 byte storage, ordered range scans achieving 40,000 plus queries per second per node with single digit millisecond p50 latency, good enough proximity with 10 to 50 percent overfetch at proper precision
•Geohash weaknesses: rectangular cells narrow east to west at high latitudes (not equal area), Z order discontinuities require neighbor expansion, 10 to 100 percent overfetch for circular queries, fixed global precision steps
•H3 advantages over geohash: hexagonal cells with uniform 6 neighbor adjacency (apart from 12 pentagons per resolution), near equal area globally for fair density metrics, better polygon coverage. Uber uses H3 for dispatch and pricing so that supply and demand calculations stay comparable regardless of latitude
•S2 advantages: spherical cells of similar (though not strictly equal) area at each level, Hilbert curve for better locality than Z order, robust handling of poles and anti meridian, superior polygon intersection. Google uses S2 for Maps, geocoding, and routing where geometric correctness is critical
•Decision heuristic: use geohash for high throughput point proximity in key value stores when modest overfetch is acceptable. Use H3 for equal area aggregations and hexagonal topology needs. Use S2 for exact geometry, complex polygons, or global correctness requirements
📌 Examples
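Ride sharing density calculation: Geohash precision 7 cells are about 153 meters by 153 meters (roughly 0.023 square kilometers) near the equator, but only about 76 meters wide (roughly 0.012 square kilometers) at 60 degrees latitude, because east to west width scales with the cosine of latitude. Drivers per cell therefore understate density by about 2 times at that latitude relative to the equator, and the skew keeps growing toward the poles, which can cause incorrect surge pricing at high latitudes. H3 level 8 cells average about 0.74 square kilometers, with area variation that is bounded and does not depend on latitude, enabling fairer density comparisons worldwide.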
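The cell size arithmetic behind this example can be checked with a short calculation. This is a rough sketch assuming a spherical Earth and the standard geohash bit split (longitude receives the extra bit when the total is odd); `geohash_cell_km` is an illustrative helper, not a library function.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean radius; spherical approximation


def geohash_cell_km(precision, lat_deg):
    """Approximate (width_km, height_km, area_km2) of a geohash cell near lat_deg."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit
    lat_bits = total_bits // 2
    dlon = math.radians(360.0 / (1 << lon_bits))
    dlat = math.radians(180.0 / (1 << lat_bits))
    height = EARTH_RADIUS_KM * dlat
    # east to west width shrinks with the cosine of latitude
    width = EARTH_RADIUS_KM * dlon * math.cos(math.radians(lat_deg))
    return width, height, width * height


for lat in (0, 45, 60, 70):
    w, h, area = geohash_cell_km(7, lat)
    print(f"lat {lat:>2}: {w * 1000:6.1f} m x {h * 1000:6.1f} m = {area:.4f} km^2")
```

Since cell height is constant and width scales with the cosine of latitude, the area ratio between the equator and any latitude is simply 1 / cos(latitude). A density metric computed per geohash cell can be corrected by that factor when switching grids is not an option.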
Map tile rendering: Google S2 uses hierarchical cell coverings to render map regions at different zoom levels. A country boundary polygon at zoom 6 is covered by 200 to 500 S2 cells with minimal overfetch (under 5 percent). Equivalent geohash covering requires 2 to 3 times more cells due to rectangular mismatch with irregular borders.
High throughput point query: DynamoDB backed service indexes 500 million user locations by 64 bit geohash integer. Proximity query scans 9 geohash ranges (center plus 8 neighbors) via parallel partition queries, fetches 600 candidates in 4 to 10 milliseconds, filters to 300 results. Achieves 50,000 queries per second with p99 under 20 milliseconds. Simpler and faster than H3 or S2 for this pure point proximity workload.
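The nine range query pattern in this example can be sketched as follows. This is a minimal illustration under stated assumptions: a sorted in memory list stands in for the key value store, keys are 52 bit interleaved integers (26 bits per axis, which fit in a 64 bit integer), and all function names are hypothetical. It also assumes the chosen cell level is coarse enough that cells are at least as large as the search radius.

```python
import bisect
import math

MAX_BITS = 26  # bits per axis for stored keys (assumption for this sketch)


def interleave(ix, iy, bits):
    """Interleave cell coordinates into a Z order key, longitude bit first."""
    key = 0
    for i in range(bits - 1, -1, -1):
        key = (key << 2) | (((ix >> i) & 1) << 1) | ((iy >> i) & 1)
    return key


def cell_xy(lat, lon, bits):
    """Integer cell coordinates of a point on a 2^bits by 2^bits lon/lat grid."""
    n = 1 << bits
    ix = min(n - 1, int((lon + 180.0) / 360.0 * n))
    iy = min(n - 1, int((lat + 90.0) / 180.0 * n))
    return ix, iy


def point_key(lat, lon):
    return interleave(*cell_xy(lat, lon, MAX_BITS), MAX_BITS)


def nine_cell_ranges(lat, lon, level):
    """Key ranges [lo, hi) for the center cell and its neighbors at the given level."""
    n = 1 << level
    cx, cy = cell_xy(lat, lon, level)
    shift = 2 * (MAX_BITS - level)
    prefixes = set()
    for dy in (-1, 0, 1):
        y = cy + dy
        if not 0 <= y < n:  # no rows beyond the poles
            continue
        for dx in (-1, 0, 1):
            x = (cx + dx) % n  # wrap across the anti meridian
            prefixes.add(interleave(x, y, level))
    return sorted((p << shift, (p + 1) << shift) for p in prefixes)


def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0088 * math.asin(math.sqrt(a))


def nearby(index, lat, lon, radius_km, level):
    """index is a sorted list of (key, lat, lon, item_id) tuples."""
    hits = []
    for lo, hi in nine_cell_ranges(lat, lon, level):
        i = bisect.bisect_left(index, (lo,))
        j = bisect.bisect_left(index, (hi,))
        for _key, plat, plon, item in index[i:j]:  # candidates, including overfetch
            if haversine_km(lat, lon, plat, plon) <= radius_km:
                hits.append(item)
    return hits


points = [("a", 37.7749, -122.4194), ("b", 37.7760, -122.4180),
          ("c", 37.8044, -122.2712), ("d", 40.7128, -74.0060)]
index = sorted((point_key(la, lo), la, lo, name) for name, la, lo in points)
print(sorted(nearby(index, 37.7749, -122.4194, 1.0, level=13)))
```

Each range maps to one ordered scan, so a distributed store can issue the scans in parallel as the example describes. The final distance filter is where the overfetched candidates are discarded.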