
System Architecture: Core Components of Location Tracking

A production location tracking system consists of five distinct layers, each solving specific challenges at scale.

The client layer runs on mobile devices with GPS sensors and manages battery life through intelligent batching: instead of sending every GPS reading immediately, devices batch 3 to 5 updates and send them together, reducing network calls by 70% while maintaining a perceived real-time experience.

The ingestion layer handles massive write throughput using persistent WebSocket connections or HTTP/2 streams. Uber maintains approximately 5 million concurrent WebSocket connections during peak hours, which requires load balancers that support connection pooling and sticky sessions. Behind the load balancer, Apache Kafka acts as a buffer that absorbs traffic spikes: during events like New Year's Eve, ingestion can spike to 5x to 10x normal rates. Kafka topics are partitioned by geographic region (city or metro area) to enable parallel processing and preserve ordering within each region.

The processing layer uses stream processors like Apache Flink to transform raw GPS coordinates into geospatial indexes. Every location update is converted to an S2 cell ID (a 64-bit encoding; level 16 cells are roughly 150 meters across), validated against impossible movements (a speed above 200 kilometers per hour indicates a GPS glitch), and enriched with metadata like road snapping for drivers. This happens in under 100 milliseconds per update.

The storage layer takes a tiered approach based on access patterns. Redis stores the last known location for each entity with a 5 minute Time To Live (TTL), serving 99% of queries from memory with sub-millisecond latency. TimescaleDB or InfluxDB stores historical trails for the past 7 days, supporting route replay features. S3 with Parquet format archives older data for analytics.

The query layer provides APIs for point lookups, proximity searches (find all drivers within 2 kilometers), and real-time subscriptions in which clients receive push notifications when tracked entities move.
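To make the processing-layer validation concrete, here is a minimal JavaScript sketch of the impossible-movement check. It assumes each update carries latitude, longitude, and an epoch-millisecond timestamp; the function names and data shapes are illustrative, not a specific vendor API.

```javascript
// Haversine distance between two coordinates, in kilometers.
function haversineKm(lat1, lon1, lat2, lon2) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const R = 6371; // mean Earth radius in km
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

const MAX_SPEED_KMH = 200; // above this, treat the fix as a GPS glitch

// Returns false if moving from `prev` to `next` would require
// an impossible speed, so the update should be dropped.
function isPlausibleMovement(prev, next) {
  const hours = (next.ts - prev.ts) / 3600000; // ms -> hours
  if (hours <= 0) return false; // out-of-order or duplicate timestamp
  const km = haversineKm(prev.lat, prev.lon, next.lat, next.lon);
  return km / hours <= MAX_SPEED_KMH;
}

// Example: a ~13 km jump across the Bay in one second is rejected.
const prev = { lat: 37.7749, lon: -122.4194, ts: 1640000000000 };
const next = { lat: 37.8044, lon: -122.2712, ts: 1640000001000 };
console.log(isPlausibleMovement(prev, next)); // false
```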
💡 Key Takeaways
Client batching optimization: Sending 3 to 5 GPS readings together instead of individually reduces network calls by 70% and extends battery life by 40%, critical for mobile devices tracking location for hours
Kafka as shock absorber: Kafka buffers traffic spikes during events (5x to 10x normal rates on New Year's Eve), while geographic topic partitioning enables parallel processing and prevents downstream system overload
Tiered storage strategy: Redis (last known location, 5 minute TTL) serves 99% of queries with sub-millisecond latency, TimescaleDB (7 days) for historical routes, S3 (long term) for analytics, reducing costs by 90% versus storing everything in memory
Connection management at scale: Uber maintains 5 million concurrent WebSocket connections, requiring load balancers with sticky sessions and connection pooling to avoid constant reconnection overhead (see the client-side sketch after this list)
Stream processing latency: Apache Flink converts raw GPS to S2 cell IDs, validates movements, and updates indexes within 100 milliseconds per update, using parallel processing across geographic partitions
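To illustrate the connection-management takeaway, below is a minimal sketch of a client that holds one persistent WebSocket open and reconnects with exponential backoff instead of hammering the load balancer. The endpoint URL and message shape are hypothetical; it uses the standard WebSocket API.

```javascript
// Hypothetical ingestion endpoint; real deployments sit behind a
// load balancer with sticky sessions so reconnects can be routed
// back to the same node.
const ENDPOINT = "wss://ingest.example.com/locations";

let socket;
let retryMs = 1000; // backoff starts at 1s, doubles up to 30s

function connect() {
  socket = new WebSocket(ENDPOINT);

  socket.onopen = () => {
    retryMs = 1000; // reset backoff after a successful connect
  };

  // Reconnect with exponential backoff rather than immediately,
  // so a restarting server is not hit by a reconnect storm.
  socket.onclose = () => {
    setTimeout(connect, retryMs);
    retryMs = Math.min(retryMs * 2, 30000);
  };
}

function sendLocationBatch(batch) {
  if (socket && socket.readyState === WebSocket.OPEN) {
    socket.send(JSON.stringify({ type: "location_batch", updates: batch }));
  }
  // If the socket is down, callers keep batching; updates are
  // flushed once the connection is re-established.
}

connect();
```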
📌 Examples
Mobile client code implementing intelligent batching: const batch = []; let lastSend = Date.now(); navigator.geolocation.watchPosition((pos) => batch.push(pos.coords)); setInterval(() => { if (batch.length >= 3 || Date.now() - lastSend > 10000) { sendLocationBatch(batch.splice(0)); lastSend = Date.now(); } }, 3000); This sends either when 3 locations have accumulated or when 10 seconds have elapsed since the last send
Redis key structure for last known locations: SET driver:12345:location "37.7749,-122.4194,1640000000" EX 300, where the value packs latitude, longitude, and a Unix timestamp, and the 5 minute expiration automatically cleans up stale data (a runnable sketch follows these examples)
Kafka topic partitioning strategy at Uber: locations-us-west-sf, locations-us-west-la, locations-us-east-nyc, where each city gets dedicated partitions, enabling parallel processing and geographic isolation of failures (a producer sketch follows these examples)
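A hedged sketch of the Redis write/read path using the node-redis client. The key layout mirrors the example above; the fallback comment marks where a history-store query would go in a real system.

```javascript
import { createClient } from "redis";

const redis = createClient({ url: "redis://localhost:6379" });
await redis.connect();

// Write: pack "lat,lon,timestamp" and let the 5-minute TTL
// garbage-collect entities that stop reporting.
async function writeLocation(driverId, lat, lon, ts) {
  await redis.set(`driver:${driverId}:location`, `${lat},${lon},${ts}`, {
    EX: 300,
  });
}

// Read: serve the hot path from memory; a miss means the TTL
// expired, and a real system would fall back to the history store.
async function readLocation(driverId) {
  const raw = await redis.get(`driver:${driverId}:location`);
  if (raw === null) return null; // stale or unknown -> query TimescaleDB here
  const [lat, lon, ts] = raw.split(",");
  return { lat: Number(lat), lon: Number(lon), ts: Number(ts) };
}

await writeLocation(12345, 37.7749, -122.4194, 1640000000);
console.log(await readLocation(12345)); // { lat: 37.7749, lon: -122.4194, ts: 1640000000 }
```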
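And a sketch of the geographic partitioning idea with the kafkajs client, using the city-scoped topic names from the example above and keying messages by driver ID so each driver's updates stay ordered within a partition. The region-resolution helper is an assumption, standing in for a real geocoding step.

```javascript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "location-ingest", brokers: ["localhost:9092"] });
const producer = kafka.producer();
await producer.connect();

// Hypothetical helper: map a coordinate to its metro-area topic,
// e.g. San Francisco updates land in locations-us-west-sf.
function topicForRegion(lat, lon) {
  return "locations-us-west-sf"; // a real system geocodes lat/lon to a region
}

async function publishUpdate(driverId, lat, lon, ts) {
  await producer.send({
    topic: topicForRegion(lat, lon),
    messages: [
      {
        // Keying by driver ID keeps one driver's updates in one
        // partition, preserving their order within the region.
        key: String(driverId),
        value: JSON.stringify({ driverId, lat, lon, ts }),
      },
    ],
  });
}

await publishUpdate(12345, 37.7749, -122.4194, 1640000000);
```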