City-Scale Video Analytics System Design
Scale Challenges
A city-scale deployment might have 10,000 or more cameras. At 30 FPS each, that is at least 300,000 frames per second requiring analysis. No single server can handle this load, so the system must distribute work across many machines while still meeting its latency guarantees.
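The arithmetic above can be turned into a rough capacity estimate. The sketch below is illustrative only: the per-server throughput of 500 frames per second is an assumed figure, not a measurement, and `servers_needed` is a hypothetical helper name.

```python
import math

def servers_needed(cameras: int, fps: int, frames_per_server_per_sec: int) -> int:
    """Minimum number of servers required to analyze every frame at full rate."""
    total_fps = cameras * fps
    return math.ceil(total_fps / frames_per_server_per_sec)

total = 10_000 * 30  # 300,000 frames per second, as in the text
# Assumption: each server sustains 500 frames/s (hypothetical figure).
print(total, servers_needed(10_000, 30, 500))  # prints: 300000 600
```

Even under this optimistic assumption, analyzing every frame centrally would take hundreds of machines, which is the motivation for filtering frames before they reach the servers.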
Hierarchical Processing
Tier 1 - Edge processing: Simple models run on devices near the cameras and perform motion detection, basic filtering, and frame selection. Sending only the interesting frames reduces traffic to central servers by 90% or more.
Tier 2 - Regional servers: Medium-complexity models process the selected frames from clusters of cameras, performing object detection and initial classification. Each server handles 100-1000 cameras.
Tier 3 - Central analysis: Complex models perform high-value analysis such as face recognition, behavior analysis, and cross-camera tracking. This tier receives only high-priority frames from the regional servers.
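A minimal sketch of how the tiers might hand frames upward, assuming frames are flattened grayscale pixel lists and that "interesting" means a mean pixel change above a threshold. The names `EdgeFilter` and `route`, the thresholds, and the use of a single priority score are all hypothetical choices made for illustration; a real edge device would run an actual motion-detection model.

```python
from typing import List, Optional

Frame = List[int]  # flattened grayscale pixels, 0-255 (simplified stand-in)

def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

class EdgeFilter:
    """Tier 1: forward a frame only if it differs enough from the last forwarded one."""

    def __init__(self, threshold: float = 10.0):
        self.threshold = threshold
        self.reference: Optional[Frame] = None

    def should_forward(self, frame: Frame) -> bool:
        if self.reference is None or mean_abs_diff(frame, self.reference) >= self.threshold:
            self.reference = frame
            return True
        return False

def route(priority: float, escalate_at: float = 0.8) -> str:
    """Tier 2: escalate only high-priority frames to central analysis (Tier 3)."""
    return "central" if priority >= escalate_at else "regional"

edge = EdgeFilter(threshold=10.0)
static = [100] * 16
moved = [100] * 8 + [200] * 8
decisions = [edge.should_forward(f) for f in (static, static, moved, moved)]
print(decisions)                # [True, False, True, False]
print(route(0.95), route(0.4))  # central regional
```

Comparing against the last forwarded frame, rather than the immediately preceding one, means slow gradual changes eventually trigger a transmission too.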
Data Flow Optimization
Metadata instead of video: Once objects are detected, send bounding boxes, class labels, and confidence scores rather than raw frames. This reduces bandwidth by 100-1000x compared to streaming full video.
Selective frame transmission: Only transmit frames containing events of interest. A parking lot camera might send 10 frames per hour instead of the 108,000 it captures at 30 FPS.
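To make the two savings concrete, the sketch below compares a JSON metadata message against the average encoded size of one video frame, and recomputes the frames-per-hour figure. The 4 Mbit/s stream bitrate, the message fields (`camera_id`, `ts`, `detections`), and the sample detections are assumptions for illustration; the actual ratio depends on the codec, the resolution, and how many objects appear per frame.

```python
import json

def video_bytes_per_frame(bitrate_bps: int, fps: int) -> int:
    """Average encoded size of one frame for a stream at the given bitrate."""
    return bitrate_bps // (8 * fps)

def frames_per_hour(fps: int) -> int:
    return fps * 3600

def metadata_message(camera_id: str, ts: float, detections: list) -> bytes:
    """Serialize detections instead of pixels (hypothetical message layout)."""
    payload = {"camera_id": camera_id, "ts": ts, "detections": detections}
    return json.dumps(payload).encode("utf-8")

detections = [
    {"bbox": [412, 220, 96, 180], "label": "person", "confidence": 0.91},
    {"bbox": [830, 400, 210, 140], "label": "car", "confidence": 0.87},
]
msg = metadata_message("cam-0042", 1700000000.033, detections)
per_frame = video_bytes_per_frame(4_000_000, 30)  # assumed 4 Mbit/s stream

print(len(msg), per_frame)        # metadata bytes vs. average video bytes per frame
print(frames_per_hour(30))        # 108000
print(frames_per_hour(30) // 10)  # 10800: reduction when only 10 frames/hour are sent
```

The two techniques compound: metadata shrinks each message, and selective transmission shrinks the number of messages.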
Coordination Challenges
Cross-camera tracking: Following an object across multiple cameras requires coordination. Regional servers maintain track databases. Hand-off protocols ensure continuity as objects move between camera coverage zones.
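One way a regional track database with hand-off might look is sketched below. This is not a description of any particular system: the `TrackDatabase` class, the appearance `feature` vectors, the camera adjacency map, and the time and distance thresholds are all hypothetical. An observation joins an existing track if it comes from the same or an adjacent camera, arrives within a short time window, and looks similar enough; otherwise a new track is started.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

@dataclass
class Track:
    track_id: int
    camera_id: str
    last_seen: float             # seconds on the shared time reference
    feature: Tuple[float, ...]   # appearance descriptor (hypothetical)
    history: List[str] = field(default_factory=list)

class TrackDatabase:
    def __init__(self, neighbors: Dict[str, List[str]],
                 max_gap_s: float = 5.0, max_dist: float = 0.5):
        self.neighbors = neighbors   # which camera zones border each other
        self.max_gap_s = max_gap_s   # longest allowed gap between sightings
        self.max_dist = max_dist     # largest allowed feature distance
        self.tracks: Dict[int, Track] = {}
        self._next_id = 1

    @staticmethod
    def _dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def observe(self, camera_id: str, ts: float, feature: Sequence[float]) -> int:
        """Assign an observation to an existing track or start a new one."""
        best: Optional[Track] = None
        best_d = self.max_dist
        for t in self.tracks.values():
            reachable = (t.camera_id == camera_id
                         or camera_id in self.neighbors.get(t.camera_id, []))
            if not reachable or not (0 <= ts - t.last_seen <= self.max_gap_s):
                continue
            d = self._dist(t.feature, feature)
            if d <= best_d:
                best, best_d = t, d
        if best is None:
            best = Track(self._next_id, camera_id, ts, tuple(feature), [camera_id])
            self.tracks[best.track_id] = best
            self._next_id += 1
        else:
            if best.camera_id != camera_id:
                best.history.append(camera_id)  # hand-off to the adjacent camera
            best.camera_id, best.last_seen, best.feature = camera_id, ts, tuple(feature)
        return best.track_id

db = TrackDatabase({"cam-A": ["cam-B"], "cam-B": ["cam-A"]})
a = db.observe("cam-A", 10.0, (0.1, 0.9))
b = db.observe("cam-B", 12.0, (0.15, 0.85))  # similar object, adjacent camera
c = db.observe("cam-B", 12.5, (0.9, 0.1))    # different-looking object
print(a, b, c, db.tracks[a].history)         # 1 1 2 ['cam-A', 'cam-B']
```

Note that the time-window check only works if both cameras stamp observations against the same clock.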
Time synchronization: All cameras must share a common time reference. Even 100 ms of clock drift between two cameras, about three frames at 30 FPS, makes cross-camera analysis unreliable.
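The effect of a clock offset can be estimated with simple arithmetic. The sketch below converts an offset into a frame misalignment and into the apparent position error for a moving object; the 15 m/s vehicle speed is an assumed example value, and both function names are hypothetical.

```python
def offset_in_frames(offset_ms: int, fps: int) -> float:
    """How many frames apart two 'simultaneous' timestamps really are."""
    return offset_ms * fps / 1000

def position_error_m(offset_ms: int, speed_mps: float) -> float:
    """Distance an object travels during the clock offset."""
    return speed_mps * offset_ms / 1000

print(offset_in_frames(100, 30))  # 3.0 frames at 30 FPS
print(position_error_m(100, 15))  # 1.5 m for a vehicle at 15 m/s (assumed speed)
```

An error of this size can be enough to associate a detection with the wrong object when two cameras observe the same busy area.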