Data Validation and Anomaly Detection
Common Data Quality Issues
GPS drift: Device reports a position as much as 100 meters from the actual location. Caused by poor signal, urban canyons, or indoor positioning. A parked driver appears to jump around.
Teleportation: Position jumps impossibly far between updates. Typical causes: the device switched from GPS to cell tower estimation, or a replay attack is sending old cached positions.
Impossible speed: Two updates 5 seconds apart show positions 10 km apart. That implies 2 km/s, or 7200 km/h. Either a GPS glitch or fraudulent data.
Validation Rules
Bounding box: Position must be within service area. Reject coordinates in the ocean or outside operating region.
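A minimal sketch of the bounding box rule, assuming a single rectangular service area that does not cross the antimeridian. The `BoundingBox` class and the `SERVICE_AREA` coordinates are hypothetical, chosen only for illustration; a real service area is often a polygon rather than a rectangle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned latitude/longitude rectangle (hypothetical helper)."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # Inclusive on all edges; assumes the box does not wrap past +/-180 longitude.
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Illustrative service area only; not taken from any real deployment.
SERVICE_AREA = BoundingBox(min_lat=37.60, min_lon=-122.55,
                           max_lat=37.85, max_lon=-122.30)


def in_service_area(lat: float, lon: float) -> bool:
    return SERVICE_AREA.contains(lat, lon)
```

A rectangle is cheap to test and works well as a first filter before any more expensive polygon check.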
Speed limit: Distance between consecutive positions divided by the time interval gives an implied speed, which must be plausible for the mode of transport. Maximum 200 km/h for cars, 50 km/h for bikes, 10 km/h for walking.
Accuracy threshold: Each GPS fix comes with an accuracy estimate (an error radius in meters). Reject positions with accuracy worse than 100 meters, or store them but flag as low confidence.
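The speed rule can be sketched as follows, using the haversine great-circle distance. The function names and the `(lat, lon, unix_seconds)` tuple layout are assumptions made for this example; the per-mode limits are the ones listed above. Treating a non-increasing timestamp as implausible is also an assumption, and a real pipeline might deduplicate or reorder such updates instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Limits from the rule above, in km/h.
MAX_SPEED_KMH = {"car": 200.0, "bike": 50.0, "walking": 10.0}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def implied_speed_kmh(prev, curr):
    """prev and curr are (lat, lon, unix_seconds) tuples."""
    dt_s = curr[2] - prev[2]
    if dt_s <= 0:
        # Assumption: out-of-order or duplicate timestamps count as implausible.
        return math.inf
    dist_km = haversine_km(prev[0], prev[1], curr[0], curr[1])
    return dist_km / (dt_s / 3600.0)


def is_plausible(prev, curr, mode):
    return implied_speed_kmh(prev, curr) <= MAX_SPEED_KMH[mode]
```

For example, two updates 5 seconds apart and roughly 10 km apart (about 0.09 degrees of latitude) give an implied speed of about 7200 km/h, so `is_plausible` returns `False` for every mode.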
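A small sketch of the accuracy rule, showing both options (reject, or store and flag). It assumes the device supplies accuracy as an error radius in meters, where a larger number means a worse fix; the `Confidence` enum and `classify_accuracy` function are hypothetical names.

```python
from enum import Enum


class Confidence(Enum):
    OK = "ok"
    LOW = "low_confidence"
    REJECTED = "rejected"


MAX_ACCURACY_M = 100.0  # threshold from the rule above


def classify_accuracy(accuracy_m, keep_low_confidence=True):
    """Classify a fix by its reported accuracy radius in meters."""
    if accuracy_m is None or accuracy_m < 0:
        # Assumption: a missing or negative accuracy value is unusable.
        return Confidence.REJECTED
    if accuracy_m <= MAX_ACCURACY_M:
        return Confidence.OK
    # Worse than the threshold: either store with a flag or drop entirely.
    return Confidence.LOW if keep_low_confidence else Confidence.REJECTED
```

Storing flagged positions keeps a record for later investigation, while downstream consumers can still ignore anything that is not `Confidence.OK`.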
Anomaly Detection
Kalman filtering: Smooth position estimates using a physics model. Predict the next position from velocity. Weight the actual measurement against the prediction according to the uncertainty of each. Reduces noise and catches implausible jumps.
Historical patterns: Learn typical behavior per entity. A driver who usually works downtown suddenly appearing at the airport might be legitimate or might be GPS spoofing. Flag for review if the pattern differs significantly.
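The predict-and-weight cycle can be sketched with a one-dimensional constant-velocity Kalman filter. This is an illustration under stated assumptions, not a production tracker: it filters a single axis in meters (one filter per axis after projecting latitude/longitude to local coordinates), the noise values are made-up defaults, and the `Kalman1D` class is hypothetical. A measurement whose normalized innovation exceeds `gate` (9.0, roughly three standard deviations) is treated as an implausible jump and skipped.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one axis, state = (position, velocity)."""

    def __init__(self, pos, pos_var=25.0, vel_var=100.0,
                 accel_var=1.0, meas_var=25.0, gate=9.0):
        self.p, self.v = pos, 0.0
        # Symmetric 2x2 covariance stored as three numbers.
        self.p00, self.p01, self.p11 = pos_var, 0.0, vel_var
        self.accel_var = accel_var  # process noise (assumed value)
        self.meas_var = meas_var    # measurement noise (assumed value)
        self.gate = gate

    def step(self, z, dt):
        """Predict dt seconds ahead, then fold in measurement z.

        Returns True if z was accepted, False if it was gated out.
        """
        # Predict: move position by velocity, grow the uncertainty.
        q = self.accel_var
        self.p += self.v * dt
        p00 = self.p00 + 2 * dt * self.p01 + dt * dt * self.p11 + q * dt ** 4 / 4
        p01 = self.p01 + dt * self.p11 + q * dt ** 3 / 2
        p11 = self.p11 + q * dt * dt
        self.p00, self.p01, self.p11 = p00, p01, p11

        # Innovation: how far the measurement is from the prediction.
        y = z - self.p
        s = p00 + self.meas_var
        if y * y / s > self.gate:
            return False  # implausible jump; keep the prediction only

        # Update: blend prediction and measurement by their uncertainties.
        k0, k1 = p00 / s, p01 / s
        self.p += k0 * y
        self.v += k1 * y
        self.p00 = (1 - k0) * p00
        self.p01 = (1 - k0) * p01
        self.p11 = p11 - k1 * p01
        return True
```

Feeding the filter positions 10, 20, 30, 40 at one-second intervals and then a reading of 5000 results in the first four being accepted and the last being rejected, with the estimate staying near the predicted position of about 50.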
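One simple way to approximate "typical behavior" is to count how often an entity has been seen in each coarse grid cell and flag positions in cells it has rarely or never visited. The sketch below assumes this approach; `EntityProfile`, the cell size (two decimal places, roughly 1 km of latitude), and both thresholds are hypothetical choices, and a real system would likely also consider time of day and decay old history.

```python
from collections import Counter


def cell_of(lat, lon, precision=2):
    """Coarse grid cell obtained by rounding coordinates."""
    return (round(lat, precision), round(lon, precision))


class EntityProfile:
    """Per-entity history of visited grid cells (hypothetical helper)."""

    def __init__(self, min_history=50, rare_share=0.01):
        self.visits = Counter()
        self.total = 0
        self.min_history = min_history  # below this, do not judge at all
        self.rare_share = rare_share    # cells under this share count as unusual

    def observe(self, lat, lon):
        self.visits[cell_of(lat, lon)] += 1
        self.total += 1

    def is_unusual(self, lat, lon):
        if self.total < self.min_history:
            return False  # not enough history to call anything unusual
        share = self.visits[cell_of(lat, lon)] / self.total
        return share < self.rare_share
```

An unusual result is only a reason to flag for review, since the trip may be legitimate.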
Clustering: If many devices in an area report similar anomalies simultaneously, the cause is likely a GPS disruption event, not individual fraud. Handle these differently from isolated anomalies.
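A rough sketch of separating area-wide disruptions from isolated anomalies, assuming anomaly events arrive as `(device_id, lat, lon, unix_seconds)` tuples. Events are bucketed by a fixed grid cell and time window, and a bucket with enough distinct devices is labeled as a disruption. The function name, cell size, window length, and device threshold are all hypothetical, and fixed buckets are a simplification: a real cluster that straddles a cell or window boundary can be split, which a density-based clustering method would handle better.

```python
import math
from collections import defaultdict


def classify_anomalies(events, cell_deg=0.01, window_s=300, min_devices=5):
    """Label each anomaly event as 'area_disruption' or 'isolated'.

    events: iterable of (device_id, lat, lon, unix_seconds).
    Returns a list of labels in the same order as the input.
    """
    events = list(events)

    def bucket(event):
        _, lat, lon, ts = event
        return (math.floor(lat / cell_deg),
                math.floor(lon / cell_deg),
                int(ts // window_s))

    # Collect the distinct devices seen in each (cell, time window) bucket.
    devices = defaultdict(set)
    for event in events:
        devices[bucket(event)].add(event[0])

    return ["area_disruption" if len(devices[bucket(e)]) >= min_devices
            else "isolated"
            for e in events]
```

Counting distinct devices rather than raw events keeps one noisy device from looking like a regional outage.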