Prefix Binning Pattern and Precision Selection
Prefix binning is the production pattern of choosing a geohash precision so that each cell (bin) contains a target number of items, typically 10 to 100 objects per cell. This balances query performance against overfetch: too few items per cell means scanning many ranges to gather enough results, while too many means expensive post filtering. For 100 million global points, a 5 character geohash yields up to 33.6 million possible cells (32 to the power of 5), for an average occupancy of around 3 points per cell. That average hides heavy skew: urban cells hold hundreds or thousands of items while rural cells remain empty.
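The practical formula for precision selection starts with item density. If a city core has 5,000 items per square kilometer and you target 50 items per cell, you need a cell area of around 0.01 square kilometers (50 divided by 5,000). Precision 6 cells are roughly 0.74 square kilometers near the equator (about 1.2 km by 0.6 km), far too large at ~3,700 items per cell. Precision 7 cells are about 0.023 square kilometers (153 m by 153 m) and hold ~117 items, slightly above target, while precision 8 cells at about 0.0007 square kilometers (38 m by 19 m) hold only ~4 items. Precision 7 is therefore the closest fit here. Cells also narrow east to west with the cosine of the latitude, so the same precision holds fewer items away from the equator. Many systems dynamically adjust precision at query time: use coarser cells for broad searches (saving scan cost) and finer cells for tight radius queries (reducing overfetch).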
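The density arithmetic above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the names `cell_area_km2` and `pick_precision` are hypothetical, 111.32 km per degree is an approximation, and picking the precision whose expected occupancy is closest to the target on a log scale is only one reasonable selection rule.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree at the equator


def cell_area_km2(precision, latitude_deg=0.0):
    """Approximate area of one geohash cell at the given latitude."""
    bits = 5 * precision            # each base32 character carries 5 bits
    lon_bits = (bits + 1) // 2      # bits alternate, starting with longitude
    lat_bits = bits // 2
    width_km = (360.0 / (1 << lon_bits)) * KM_PER_DEGREE * math.cos(math.radians(latitude_deg))
    height_km = (180.0 / (1 << lat_bits)) * KM_PER_DEGREE
    return width_km * height_km


def pick_precision(density_per_km2, target_per_cell, latitude_deg=0.0, max_precision=12):
    """Return the precision whose expected items per cell is closest to the target.

    Closeness is measured on a log scale so that 2x too many and 2x too few
    are treated as equally far off. Assumes density and target are positive.
    """
    def distance(precision):
        expected = density_per_km2 * cell_area_km2(precision, latitude_deg)
        return abs(math.log(expected / target_per_cell))

    return min(range(1, max_precision + 1), key=distance)


if __name__ == "__main__":
    for p in (6, 7, 8):
        print(p, round(cell_area_km2(p), 5), round(5000 * cell_area_km2(p), 1))
    print("chosen precision:", pick_precision(5000, 50))
```

For a density of 5,000 items per square kilometer and a target of 50 items per cell, this sketch computes roughly 0.023 square kilometers for a precision 7 cell near the equator and selects precision 7.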
Sharding by geohash prefix is another critical dimension. Use a short prefix like 3 to 5 characters as the partition key to distribute writes across nodes, then use the full geohash as the sort key within each partition to preserve scan locality. Monitor per partition item counts and split hot prefixes by increasing length when a partition exceeds capacity. This two level approach separates write distribution from query locality, enabling both horizontal scale and fast range scans.
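The two level key layout can be illustrated with a small in-memory stand-in for a partitioned store. This is a sketch under assumptions: `PrefixShardedIndex` and its methods are hypothetical names, a real system would use its database's partition key and sort key instead of Python lists, and the encoder is a plain implementation of the standard geohash bit interleaving.

```python
import bisect
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Standard geohash: alternate longitude and latitude bits, 5 bits per character."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, value, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


class PrefixShardedIndex:
    """Partition key = short geohash prefix; sort key = full geohash."""

    def __init__(self, partition_prefix_len=4, full_precision=9):
        self.partition_prefix_len = partition_prefix_len
        self.full_precision = full_precision
        self._partitions = defaultdict(list)  # prefix -> sorted [(geohash, item_id)]

    def put(self, item_id, lat, lon):
        gh = geohash_encode(lat, lon, self.full_precision)
        bisect.insort(self._partitions[gh[: self.partition_prefix_len]], (gh, item_id))

    def scan_cell(self, cell):
        """Range scan: all items whose geohash starts with `cell`."""
        if len(cell) < self.partition_prefix_len:
            raise ValueError("cell must be at least as long as the partition prefix")
        rows = self._partitions.get(cell[: self.partition_prefix_len], [])
        lo = bisect.bisect_left(rows, (cell,))
        hi = bisect.bisect_left(rows, (cell + "~",))  # '~' sorts after every base32 character
        return [item_id for _, item_id in rows[lo:hi]]
```

Because every item in a cell shares the cell's geohash as a prefix, one contiguous range of the sorted keys answers the scan, and the short partition prefix decides only which node holds that range.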
💡 Key Takeaways
•Target 10 to 100 items per cell in dense regions to keep range scans small: with 50 items per cell and a 9 cell scan (center plus 8 neighbors), you fetch roughly 450 candidates before distance filtering
•Precision selection formula: cell area should equal target items per cell divided by local item density. An urban density of 5,000 items per square kilometer with a target of 50 items needs 0.01 square kilometer cells; precision 7 (about 0.023 square kilometers near the equator) is the closest fit, since precision 8 shrinks to about 0.0007 square kilometers
•Dynamic precision at query time enables stable result counts: use coarser cells for 10 km radius searches (fewer ranges to scan) and finer cells for 500 meter searches (less overfetch to filter)
•Sharding pattern uses short prefix for partition key (3 to 5 characters for write distribution) and full geohash for sort key (preserving locality within partition). Monitor partition sizes and split hot prefixes when item counts exceed thresholds
•Non uniform density causes problems: a single global precision tuned for average density leads to overfull urban cells (hundreds of items, slow scans) and empty rural cells (wasted index space)
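Dynamic precision at query time can be sketched as a function of the search radius. The rule used here is an assumption for illustration, not a standard: pick the finest precision whose smaller cell side is still at least the radius, so that the center cell plus its 8 neighbors covers the whole search circle. The function names are hypothetical and cell sizes use an approximate 111.32 km per degree.

```python
import math

KM_PER_DEGREE = 111.32  # approximate; treats the Earth as a sphere


def cell_size_km(precision, latitude_deg=0.0):
    """Approximate (width, height) in km of a geohash cell at a latitude."""
    bits = 5 * precision
    lon_bits = (bits + 1) // 2  # bits alternate, starting with longitude
    lat_bits = bits // 2
    width = (360.0 / (1 << lon_bits)) * KM_PER_DEGREE * math.cos(math.radians(latitude_deg))
    height = (180.0 / (1 << lat_bits)) * KM_PER_DEGREE
    return width, height


def precision_for_radius(radius_km, latitude_deg=0.0, max_precision=12):
    """Finest precision whose smaller cell side is still >= the radius.

    Falls back to precision 1 when the radius exceeds even the coarsest cell.
    """
    chosen = 1
    for precision in range(1, max_precision + 1):
        if min(cell_size_km(precision, latitude_deg)) >= radius_km:
            chosen = precision
        else:
            break  # cells only get smaller from here on
    return chosen


def expected_candidates(density_per_km2, precision, latitude_deg=0.0, cells=9):
    """Estimated items fetched by a scan of `cells` cells before distance filtering."""
    width, height = cell_size_km(precision, latitude_deg)
    return density_per_km2 * width * height * cells


if __name__ == "__main__":
    for radius in (10.0, 0.5, 0.1):
        print(radius, "km ->", precision_for_radius(radius))
```

Under this rule a 10 km search near the equator maps to precision 4, a 500 meter search to precision 6, and a 100 meter search to precision 7. `expected_candidates` then shows the overfetch cost of each choice for a given density.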
📌 Examples
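Ride sharing example: Downtown at lunch has 2,000 available drivers per square kilometer. Precision 6 cells (about 1.2 km by 0.6 km, roughly 0.74 square kilometers near the equator) hold ~1,500 drivers each, too many for fast scans. Precision 7 cells (about 153 m by 153 m, roughly 0.023 square kilometers) hold ~47 drivers each. A proximity query that scans 9 cells fetches ~420 candidates before exact distance filtering, and the scan completes in 2 to 5 milliseconds. A 3 by 3 block of precision 7 cells spans only about 460 m per side, so it is guaranteed to cover a radius of about 150 m around the query point; a 500 meter radius needs a wider ring of precision 7 cells or a 3 by 3 block at precision 6.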
E commerce warehouse locations: 10,000 warehouses globally, very sparse. Precision 5 (4.9 km cells, 33.6 million cells worldwide) averages only about 0.0003 warehouses per cell, so almost every cell is empty and any meaningful query scans hundreds of cells. Use precision 4 (roughly 39 km by 20 km cells) for coarse distribution, and switch to precision 5 only for the final filter in dense logistics corridors.
Sharding real numbers: 500 million user locations, sharded by 4 character prefix. 32 to the power of 4 gives about 1.05 million possible prefixes; consistent hashing maps them onto 256 physical partitions, for an average of about 2 million items per partition. The full geohash sort key enables scanning ~100 to 1,000 items per query with locality.
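Mapping prefixes to physical partitions with consistent hashing might look like the sketch below. It assumes a simple hash ring with virtual nodes built on SHA-256; the class name `HashRing`, the choice of 64 virtual nodes, and the integer partition ids are illustrative assumptions rather than details from any particular system.

```python
import bisect
import hashlib


def _hash64(key):
    """Stable 64-bit hash of a string (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring mapping geohash prefixes to physical partitions."""

    def __init__(self, partitions, vnodes=64):
        # Each partition is placed on the ring at several pseudo-random points.
        self._ring = sorted(
            (_hash64(f"{partition}#{v}"), partition)
            for partition in partitions
            for v in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def partition_for(self, prefix):
        """Walk clockwise from the prefix's hash to the next ring point."""
        i = bisect.bisect_right(self._points, _hash64(prefix)) % len(self._ring)
        return self._ring[i][1]


if __name__ == "__main__":
    ring = HashRing(range(256))
    print("possible 4 character prefixes:", 32 ** 4)
    print("average items per partition:", 500_000_000 // 256)
    print("prefix u4pr lives on partition", ring.partition_for("u4pr"))
```

Hashing the prefix spreads dense and sparse regions across partitions, while adding or removing a partition moves only the prefixes adjacent to its ring points. Note that per-partition counts still depend on how items are distributed across prefixes, so the 2 million figure is an average, not a guarantee.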