Distributed Sharding and Scaling Quadtrees to Billions of Points
Sharding by Region
Divide the world into coarse regions. Each region is a shard containing a quadtree for that area. A query for Manhattan goes to the NYC shard; a query for London goes to the London shard. Shards are independent, enabling parallel scaling.
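A minimal sketch of region-based routing, assuming a flat list of bounding boxes; the region bounds and shard names here are illustrative, not real coordinates of a production deployment.

```python
# Hypothetical region-based shard router: maps a lat/lon point to the
# shard owning the coarse region that contains it.
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    name: str          # shard identifier
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat < self.max_lat
                and self.min_lon <= lon < self.max_lon)

# Illustrative bounds only.
REGIONS = [
    Region("nyc",    40.4, -74.5, 41.0, -73.5),
    Region("london", 51.2,  -0.6, 51.8,   0.4),
]

def route(lat: float, lon: float) -> str:
    for region in REGIONS:
        if region.contains(lat, lon):
            return region.name
    raise KeyError("no shard owns this point")
```

Each shard then runs an ordinary single-node quadtree over its own region; the router only decides which one to ask.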
Region boundaries cause problems. A radius query near a boundary needs data from multiple shards. The coordinator must fan out to both shards, merge results, and deduplicate. Boundary queries are slower than interior queries.
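The fan-out-and-merge step can be sketched as follows. This assumes two hypothetical shards split at longitude 0 and results keyed by point id; the degrees-per-meter conversion is deliberately rough, since it only needs to be good enough for routing.

```python
# Coordinator logic for a radius query near a shard boundary: find every
# shard whose region box intersects the query's bounding box, fan out,
# then merge and deduplicate by point id.
import math

# (min_lat, min_lon, max_lat, max_lon) per shard; illustrative split at lon 0.
REGIONS = {
    "west": (-90.0, -180.0, 90.0, 0.0),
    "east": (-90.0,    0.0, 90.0, 180.0),
}

def shards_for_query(lat, lon, radius_m):
    # Rough meters-to-degrees conversion; fine for deciding fan-out.
    dlat = radius_m / 111_320
    dlon = radius_m / (111_320 * max(math.cos(math.radians(lat)), 1e-6))
    qmin_lat, qmax_lat = lat - dlat, lat + dlat
    qmin_lon, qmax_lon = lon - dlon, lon + dlon
    hits = []
    for name, (rmin_lat, rmin_lon, rmax_lat, rmax_lon) in REGIONS.items():
        if (qmin_lat < rmax_lat and qmax_lat > rmin_lat
                and qmin_lon < rmax_lon and qmax_lon > rmin_lon):
            hits.append(name)
    return hits

def merge(results_per_shard):
    # Deduplicate by point id: a point replicated near the boundary may
    # come back from both shards; keep the smaller reported distance.
    seen = {}
    for shard_results in results_per_shard:
        for point_id, dist in shard_results:
            if point_id not in seen or dist < seen[point_id]:
                seen[point_id] = dist
    return sorted(seen.items(), key=lambda kv: kv[1])
```

An interior query touches one shard; the same query sitting on the boundary touches both, which is why boundary queries pay extra latency.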
Cell-Based Sharding
Assign quadtree cells to shards using consistent hashing or range partitioning. Fine-grained cells distribute load more evenly than coarse regions. Hot cells can be split across multiple shards with replication.
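A minimal consistent-hash ring for cell-to-shard assignment might look like the sketch below. The cell ids, shard names, and virtual-node count are assumptions; virtual nodes are what smooth the distribution when a shard is added or removed.

```python
# Consistent-hash ring mapping quadtree cell ids to shards. Each shard
# owns many points on the ring (virtual nodes) so keys spread evenly and
# only a fraction of cells move when membership changes.
import bisect
import hashlib

class HashRing:
    def __init__(self, shards, vnodes=64):
        self.ring = []
        for shard in shards:
            for v in range(vnodes):
                self.ring.append((self._hash(f"{shard}#{v}"), shard))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        # Stable 64-bit hash; md5 here is for distribution, not security.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def shard_for(self, cell_id: str) -> str:
        # Walk clockwise to the first virtual node at or after the cell's hash.
        i = bisect.bisect(self.keys, self._hash(cell_id)) % len(self.ring)
        return self.ring[i][1]
```

The mapping is deterministic, so any coordinator can compute a cell's shard without a lookup table.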
Cell assignment should balance load. Naive assignment puts all downtown cells on one shard. Randomized or hash-based assignment spreads load but loses locality. Hybrid approaches: group nearby cells, then hash groups to shards.
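The hybrid idea reduces to a few lines: group cells by a coarse prefix of their quadtree path (siblings share a prefix, so nearby cells stay together), then hash the group to a shard. The prefix length and shard list below are assumptions for illustration.

```python
# Hybrid assignment sketch: locality within a group, hash-spread across
# groups. Cells whose quadtree paths share the first few quadrant digits
# land on the same shard.
import hashlib

SHARDS = ["s1", "s2", "s3", "s4"]
GROUP_PREFIX_LEN = 3   # cells sharing the first 3 quadrant digits stay together

def shard_for_cell(cell_path: str) -> str:
    group = cell_path[:GROUP_PREFIX_LEN]
    h = int(hashlib.sha1(group.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]
```

A longer prefix means smaller groups and better load spreading; a shorter prefix means more locality per shard. Tuning that trade-off is the point of the hybrid approach.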
Consistency Challenges
Distributed quadtrees face consistency issues. A point moves from cell A on shard 1 to cell B on shard 2. During the transition, the point might appear in both cells, in neither, or be stale in one. Strong consistency requires distributed transactions or coordination.
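One common way to avoid the "appears in neither" case is to order the writes: insert at the destination before deleting from the source, so a crash mid-move leaves a duplicate rather than a lost point. A sketch, with plain dicts standing in for shard state:

```python
# Cross-shard move that can briefly duplicate a point but never drops it.
shard1 = {"cell_A": {"point_42": (40.7, -74.0)}}
shard2 = {"cell_B": {}}

def move_point(point_id, src_shard, src_cell, dst_shard, dst_cell, pos):
    # 1. Insert at the destination first.
    dst_shard.setdefault(dst_cell, {})[point_id] = pos
    # A crash here leaves the point in both cells, not in neither; a
    # cleanup pass (or the point's next update) removes the stale copy.
    # 2. Delete from the source.
    del src_shard[src_cell][point_id]

move_point("point_42", shard1, "cell_A", shard2, "cell_B", (40.8, -73.9))
```

Queries that fan out across both shards then deduplicate by point id, which is the same merge logic boundary queries already need.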
Eventual consistency is often acceptable for location data. A rider's position being slightly stale rarely matters. Design for tolerance: queries accept that results may be seconds old. Critical operations (pickup confirmation) use a separate strongly consistent path.
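The split between the tolerant path and the strongly consistent path can be made explicit in the read API: each read carries a staleness bound, and reads that exceed it fall through to an authoritative lookup. All names below are hypothetical.

```python
# Staleness-bounded read: serve from a possibly-stale local cache when the
# entry is fresh enough, otherwise fall back to a strongly consistent read.
import time

def read_position(cache, point_id, max_staleness_s, authoritative_read):
    entry = cache.get(point_id)
    if entry and time.time() - entry["ts"] <= max_staleness_s:
        return entry["pos"]                  # fast, possibly stale path
    return authoritative_read(point_id)      # strongly consistent fallback

cache = {"rider_7": {"pos": (40.7, -74.0), "ts": time.time() - 2.0}}
fresh = lambda pid: (40.71, -74.01)          # stands in for a quorum read

# Map display tolerates seconds of staleness; pickup confirmation does not.
display_pos = read_position(cache, "rider_7", 10.0, fresh)
pickup_pos  = read_position(cache, "rider_7", 0.5, fresh)
```

The same data serves both callers; only the tolerance differs, which keeps the hot path cheap while giving critical operations the guarantee they need.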