Geospatial & Location ServicesReal-time Location TrackingHard⏱️ ~3 min

Geospatial Indexing: Geohash vs S2 vs QuadTree

Geospatial indexing solves the proximity search problem: given millions of entities with latitude/longitude coordinates, find all entities within a specific radius of a query point in under 50 milliseconds. Naive approaches using Haversine distance calculation on every entity require O(n) time, making them unusable at scale. Spatial indexes reduce this to O(log n) lookups by encoding coordinates into hierarchical structures. Geohash encodes lat/lon pairs into base32 strings where longer prefixes represent smaller areas. A geohash of length 6 ("9q8yy") represents roughly a 1.2 kilometer by 0.6 kilometer area. The elegant property: entities sharing longer prefixes are geographically closer, enabling prefix based searches in databases. However, Geohash suffers from edge discontinuities: two points on opposite sides of the Prime Meridian or Equator have completely different geohashes despite being close. Uber initially used Geohash but encountered issues matching riders to nearby drivers across cell boundaries, requiring searches of 8 adjacent cells adding 8x query overhead. Google's S2 geometry library maps the Earth onto a cube, then subdivides each face into a hierarchical grid of cells. An S2 cell at level 16 (30 bit encoding) covers approximately 1 square kilometer and cells maintain continuity across boundaries. Critically, S2 cells have a covering algorithm: you can find the minimum set of cells that cover any circle or polygon. To search within 2 kilometers, you generate cell coverings (typically 4 to 10 cells at mixed levels) and query only those cell IDs. This reduces the search space by 99.9% compared to scanning all entities. QuadTree recursively divides 2D space into four quadrants, building a tree where each node contains entities within its bounds. Querying involves traversing only branches that intersect your search radius. QuadTrees excel for in memory operations and dynamic datasets but struggle with persistence: rebalancing after insertions causes expensive tree restructuring. Google uses S2 for persistent indexes (billions of businesses in Maps) while applications like multiplayer games use QuadTrees for transient positions of players that only exist in memory.
💡 Key Takeaways
Geohash edge problem: Two points 100 meters apart across the Prime Meridian have completely different prefixes ("9" vs "d"), forcing systems to search 8 adjacent cells adding 8x query overhead at boundaries
S2 cell covering efficiency: A 2 kilometer radius search generates 4 to 10 cells at mixed levels that perfectly cover the circle, reducing search space from millions of entities to typically under 1000 candidates (99.9% reduction)
Performance comparison: Geohash prefix queries in PostgreSQL take 20 to 40 milliseconds for 1 million entities, S2 cell lookups with proper indexes take 5 to 15 milliseconds, QuadTree in memory traversal under 1 millisecond but requires full dataset in RAM
Storage cost tradeoff: S2 cell IDs are 64 bit integers (8 bytes) enabling compact indexes, Geohash strings are typically 8 to 12 bytes, QuadTrees require pointer overhead (24 to 32 bytes per node) making them 4x more memory intensive
Uber's migration: Switched from Geohash to S2 reducing average driver matching queries from 8 cells to 3 cells, cutting query time by 60% and database load by 50% during peak hours
📌 Examples
S2 cell covering code: coverer = S2RegionCoverer(); coverer.set_max_cells(10); covering = coverer.GetCovering(S2Cap.FromAxisAngle(center, 2km)); This generates minimum cell set covering 2km radius, typically 4 to 10 cells at levels 14 to 16
PostgreSQL with S2 index: CREATE INDEX idx_driver_location ON drivers (s2_cell_id); SELECT * FROM drivers WHERE s2_cell_id IN (cell_ids) AND ST_Distance(location, query_point) < 2000; First filters by cell (fast index scan) then precise distance check
Geohash boundary issue example: San Francisco query at lat 37.7749, lon -122.4194 (geohash "9q8yy") must also search adjacent cells "9q8yv", "9q8yw", "9q8yz" and 5 others to avoid missing drivers 50 meters away across cell boundary
← Back to Real-time Location Tracking Overview