Distribution, Sharding, and Caching Strategies
Sharding by Cell
Use cell ID as shard key. Points in the same cell land on the same shard. A query for a cell goes to one shard. This is ideal for single-cell queries. But queries spanning multiple cells may hit multiple shards.
Cell-based sharding preserves locality: nearby points are often on the same shard. This enables single-shard queries for most local searches. The downside is hot shards for popular cells (city centers, airports).
Load Balancing Challenges
Spatial data is not uniform. Manhattan has 1000x more restaurants per square km than rural Montana. Cell-based sharding puts Manhattan on few shards that get most traffic. Solutions:
Sub-cell sharding: Split hot cells into finer cells across multiple shards. More complex routing but better distribution.
Composite keys: Shard by cell ID plus random suffix. Spreads cell data across shards but loses locality. Queries must fan out to all shards holding the cell.
Dedicated hot-cell shards: Identify hot cells, replicate their data to dedicated high-capacity shards.
Caching Strategies
Cache at cell level: key is cell ID, value is list of items in that cell. Queries check cache first. Cache hit avoids database entirely. Invalidate on item updates within the cell.
For nearest neighbor, caching is trickier. Same cell contents yield different results depending on query origin. Options: cache cell contents and compute distance client-side, or cache results for popular query points with TTL.