
Implementation Details: Building a Production Quadtree Index

Data modeling starts with choosing a sphere-aware hierarchical cell system such as S2 for global use. Each object stores its point geometry (latitude and longitude), a hierarchical cell ID at one or more levels, and optionally a parent prefix for coarser queries. Maintain a secondary index sorted by cell ID so that cell ranges are contiguous, enabling efficient range scans and deterministic shard routing. For 100 million points, a B-tree or LSM tree keyed by cell ID provides lookups in microseconds and range scans that process thousands of keys per millisecond (see the first sketch below).

Leaf capacity and depth tuning directly impact performance. Start with a capacity in the range of 64 to 512 points per leaf for point-only workloads. For 100 million points with capacity 128, expect a tree height of approximately 10, roughly 780,000 leaves, and about 260,000 internal nodes. These parameters keep pointer chasing low (10 hops worst case) and allow high cache residency (hundreds of MB). Dynamically adjust capacity based on observed density: double capacity in hotspot regions to reduce splits and depth. Enforce a maximum depth (such as 12 or 15) to prevent runaway splitting under extreme density.

Splitting strategy varies by workload. A PR quadtree splits at the region midpoint when a leaf exceeds capacity and its depth is still below the maximum; this is deterministic and fast, requiring only the region boundaries. A point quadtree splits at the median x or y coordinate alternately to balance subtrees under skewed data, which costs more CPU at split time (sorting to find the median) but yields better depth in hotspots. In both variants, avoid infinite splitting by enforcing a minimum cell size in meters or degrees. For moving-object workloads, a PR quadtree with bounded depth and larger leaf capacity reduces update churn compared to point-quadtree rebalancing (see the second sketch below).

Caching and precomputation accelerate common queries. Precompute covers for frequent shapes such as city polygons or geofences at multiple levels and store them. Cache neighbor tables per level (for each cell, store its 8 surrounding cells) to speed ring expansions in kNN queries. For map user interfaces, pre-aggregate counts per cell at display zoom levels (the classic tile pyramid) to avoid per-point scans; this is how Google Maps and Bing Maps render density heatmaps at interactive speeds without querying raw points (the third sketch below shows this aggregation).
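To make the "sorted by cell ID means contiguous range scans" point concrete, here is a minimal Python sketch. It uses a simple planar Z-order (Morton) encoding at a fixed level as a stand-in for S2 cell IDs (S2 has the same prefix property but is sphere-aware); the LEVEL constant, class name, and method names are illustrative assumptions, not any particular library's API.

```python
import bisect

LEVEL = 15  # index points at this fixed level (illustrative)

def cell_id(lat: float, lng: float, level: int = LEVEL) -> int:
    """Interleave bits of a discretized lat/lng into a Z-order (Morton) cell ID."""
    n = 1 << level
    x = min(int((lng + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    cid = 0
    for bit in range(level):
        cid |= ((x >> bit) & 1) << (2 * bit)
        cid |= ((y >> bit) & 1) << (2 * bit + 1)
    return cid

def descendant_range(parent_id: int, parent_level: int, level: int = LEVEL):
    """Half-open [lo, hi) range of level-`level` cell IDs under a coarser parent cell."""
    shift = 2 * (level - parent_level)
    lo = parent_id << shift
    return lo, lo + (1 << shift)

class CellIndex:
    """Secondary index sorted by cell ID; one parent cell maps to one range scan."""

    def __init__(self):
        self._keys = []   # sorted cell IDs
        self._vals = []   # payloads aligned with _keys

    def insert(self, lat, lng, payload):
        cid = cell_id(lat, lng)
        i = bisect.bisect_left(self._keys, cid)
        self._keys.insert(i, cid)
        self._vals.insert(i, payload)

    def scan_parent(self, lat, lng, parent_level):
        """All payloads whose cell falls under the parent cell containing (lat, lng)."""
        lo, hi = descendant_range(cell_id(lat, lng, parent_level), parent_level)
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return self._vals[i:j]
```

In production the sorted array would be a B-tree or LSM tree keyed by the same IDs, but the query shape is identical: each cover cell becomes one contiguous key range.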
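Next, a minimal PR-quadtree sketch showing the two safety valves discussed above: a leaf capacity and a maximum depth. It works on planar x/y coordinates for brevity, and CAPACITY and MAX_DEPTH are illustrative values taken from the 64-512 and 12-15 ranges given in the text.

```python
CAPACITY = 128   # points per leaf before splitting (illustrative)
MAX_DEPTH = 12   # hard cap to stop runaway splitting in extreme density

class PRQuadtree:
    def __init__(self, x0, y0, x1, y1, depth=0):
        self.bounds = (x0, y0, x1, y1)   # region covered by this node
        self.depth = depth
        self.points = []                 # used only while this node is a leaf
        self.children = None             # four children once split

    def insert(self, x, y):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False                 # point lies outside this node's region
        if self.children is None:
            self.points.append((x, y))
            # PR split rule: exceeded capacity AND still below the max depth
            if len(self.points) > CAPACITY and self.depth < MAX_DEPTH:
                self._split()
            return True
        for c in self.children:
            if c.insert(x, y):
                return True
        return False

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2.0, (y0 + y1) / 2.0   # deterministic midpoint split
        d = self.depth + 1
        self.children = [
            PRQuadtree(x0, y0, mx, my, d), PRQuadtree(mx, y0, x1, my, d),
            PRQuadtree(x0, my, mx, y1, d), PRQuadtree(mx, my, x1, y1, d),
        ]
        for px, py in self.points:       # redistribute buffered points
            for c in self.children:
                if c.insert(px, py):
                    break
        self.points = []
```

A point-quadtree variant would replace the midpoint in _split with the median x or y of the buffered points, trading split-time sorting for better depth under skew.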
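Finally, a sketch of the tile-pyramid pre-aggregation: one pass over the raw points produces a count per (zoom, x, y) tile using the standard slippy-map tile formula, so rendering a density heatmap never touches individual points. The zoom range and function names are illustrative.

```python
import math
from collections import Counter

def tile_of(lat: float, lng: float, zoom: int):
    """Standard slippy-map (Web Mercator) tile coordinates for a lat/lng."""
    n = 1 << zoom
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return zoom, x, y

def build_pyramid(points, zooms=range(0, 13)):
    """Counts per tile at every display zoom level; serve these, not raw points."""
    counts = Counter()
    for lat, lng in points:
        for z in zooms:
            counts[tile_of(lat, lng, z)] += 1
    return counts
```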
💡 Key Takeaways
Use a sphere-aware cell system such as S2; store point geometry, hierarchical cell ID, and an optional parent prefix; a secondary index sorted by cell ID enables contiguous range scans and shard routing
Start with leaf capacity 64 to 512 for point workloads; 100M points at capacity 128 yields height ~10, ~780k leaves, and ~260k internal nodes in hundreds of MB; dynamically adjust capacity in hotspots
PR quadtree splits deterministically at region midpoints for fast writes; point quadtree splits at the median for better balance under skew at higher CPU cost; enforce a max depth to prevent runaway splitting
Precompute covers for common shapes (city polygons, geofences) at multiple levels; cache neighbor tables per level for fast kNN ring expansions; pre-aggregate counts per cell for map UI heatmaps
kNN uses progressive ring expansion: cover an initial radius, refine, and expand to neighbor rings if needed; k of 50 at urban density requires 1 to 3 rings spanning tens of cells, completing in under 10 ms (see the sketch after this list)
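The ring expansion in the last takeaway can be sketched as follows, using a flat grid of cells as a stand-in for hierarchical cells and planar distances for brevity; CELL_DEG, the grid layout, and the safety cap are illustrative assumptions rather than a specific library's behavior. The loop stops once k candidates are held and the next ring cannot contain anything closer than the current k-th best.

```python
import heapq
import math
from collections import defaultdict

CELL_DEG = 0.01  # roughly 1 km cells at mid-latitudes (illustrative)

def cell_of(lat, lng):
    return (int(lat // CELL_DEG), int(lng // CELL_DEG))

def build_grid(points):
    grid = defaultdict(list)
    for lat, lng in points:
        grid[cell_of(lat, lng)].append((lat, lng))
    return grid

def ring_cells(center, r):
    """Ring 0 is the center cell; ring r >= 1 is the square perimeter r cells out."""
    ci, cj = center
    if r == 0:
        return [(ci, cj)]
    return [(i, j)
            for i in range(ci - r, ci + r + 1)
            for j in range(cj - r, cj + r + 1)
            if max(abs(i - ci), abs(j - cj)) == r]

def knn(grid, qlat, qlng, k, max_rings=1000):
    best = []  # max-heap of (-distance, point) holding the k nearest so far
    center = cell_of(qlat, qlng)
    for r in range(max_rings):
        for cell in ring_cells(center, r):
            for lat, lng in grid.get(cell, []):
                d = math.hypot(lat - qlat, lng - qlng)   # planar approximation
                heapq.heappush(best, (-d, (lat, lng)))
                if len(best) > k:
                    heapq.heappop(best)
        # anything in ring r+1 is at least r * CELL_DEG away from the query
        if len(best) == k and -best[0][0] <= r * CELL_DEG:
            break
    return [p for _, p in sorted(best, reverse=True)]   # nearest first
```

With urban densities and roughly 1 km cells, the stopping condition typically fires after one to three rings, matching the latency figure quoted above.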
📌 Examples
Google Maps' tile pyramid caches pre-aggregated point counts per tile at each zoom level. A zoom-10 tile covers roughly 600 km² at mid-latitudes and displays aggregated density without scanning individual points, keeping map render under 100 ms.
A delivery service indexes 10 million restaurant locations with S2 level-15 cells (about 79,000 m² each). A leaf capacity of 256 yields height 8 and roughly 39k leaves. An index in RocksDB with the working set held in memory provides range scans at 5,000 keys per ms, completing 2 km radius queries in 3 to 5 ms.