Geospatial & Location ServicesProximity SearchMedium⏱️ ~2 min

Precision vs Candidate Set Size Tradeoff

The Core Trade-off

Coarse precision: large cells, few cells to query, many candidates per cell. You fetch more data than needed and filter heavily in phase 2. Network and memory costs are high but query count is low.

Fine precision: small cells, many cells to query, few candidates per cell. You fetch close to what you need but execute many queries. Query overhead is high but data transfer is efficient.

Finding the Sweet Spot

Optimal precision depends on query radius and data density. For 1 km radius queries in a dense city, 6-character geohash (1 km cells) means querying about 9 cells containing maybe 1000 candidates total. Acceptable.

Same 6-character geohash for 100 m radius queries fetches 9 cells when you only need a fraction of one. Over-fetch is 100x. Use 7-character (150 m cells) instead. Nine cells fetch maybe 100 candidates. Much better.

For 10 km radius queries, 6-character cells means hundreds of cells to query. Use 5-character (5 km cells) instead. Nine cells cover the area with reasonable candidate count.

Dynamic Precision

If query radius varies, store multiple precision levels per point. Index both geohash_5 and geohash_7 columns. Query planner selects appropriate index based on radius. Alternatively, compute cell covering at query time for the specific radius.

The storage cost is a few extra bytes per point. The benefit is optimal query performance across all radius sizes. For systems with fixed radius (e.g., always 5 km for ride matching), single precision suffices.

✅ Best Practice: Match cell size to typical query radius. Cell edge should be roughly equal to query radius. This results in 9 to 25 cells per query with reasonable candidate counts. Benchmark with real data to find optimal precision.
💡 Key Takeaways
Coarse precision: few queries, many candidates; fine precision: many queries, few candidates
Match cell size to query radius: cell edge roughly equals radius
9 to 25 cells per query is typical target for balanced performance
Variable radius queries benefit from multiple precision indexes
Benchmark with real data distribution; uniform test data misleads
📌 Interview Tips
1Calculate trade-off: 1 km radius with 1 km cells means 9 cells, maybe 1000 candidates. Same radius with 100 m cells means 100+ cells
2For variable radius, recommend dual indexing: coarse for wide queries, fine for narrow queries
3Explain why cell size should match radius: too small means too many queries, too large means too much filtering
← Back to Proximity Search Overview
Precision vs Candidate Set Size Tradeoff | Proximity Search - System Overflow