Geospatial Indexing: Grids, Curves, and Trees
Grid-Based Indexing
Divide space into fixed cells. Geohash uses rectangular cells encoded as strings. H3 uses hexagonal cells. Each point gets assigned to a cell by its coordinates. Store cell ID with each point and index that column.
Query: compute which cells overlap the search radius. Fetch all points from those cells. Advantages: simple implementation, works with standard databases, easy to shard by cell ID. Disadvantages: fixed cell size may not match query radius, edge effects at cell boundaries.
Space-Filling Curves
Map 2D coordinates to 1D values while preserving some locality. Z-order (geohash basis) and Hilbert curves are common. Points nearby in 2D tend to have nearby 1D values. This enables range queries on a single indexed column.
Hilbert curves have better locality than Z-order: fewer discontinuities where nearby points have distant values. S2 uses Hilbert curves on cube faces. The 1D value becomes a sortable cell ID suitable for B-tree indexing.
Tree-Based Indexing
Quadtrees: Adaptive subdivision. Dense areas get fine cells; sparse areas stay coarse. Good for non-uniform data. Typically in-memory.
R-trees: Group points into bounding rectangles. Rectangles can overlap. Optimized for disk access with page-sized nodes. Used in spatial databases like PostGIS. Supports polygons and complex shapes, not just points.
KD-trees: Binary space partition alternating dimensions. Good for exact nearest neighbor in memory. Less suited for range queries and disk storage.