
City Scale Video Analytics System Design

City scale video analytics must process thousands of camera streams under strict cost and latency constraints. Consider a system with 10,000 cameras streaming 1080p at 15 frames per second (FPS), H.264 encoded at 2 Mbps average, generating roughly 20 Gbps of total ingest bandwidth. Cameras connect to regional gateways over Real Time Streaming Protocol (RTSP) or WebRTC, and the gateways push to a managed ingest service like Amazon Kinesis Video Streams for secure retention and fan out to analytics consumers.

The economics force aggressive downsampling: processing every frame from every stream is prohibitively expensive. A common strategy is temporal downsampling to 5 FPS for detection, paired with lightweight motion gating to skip static scenes. A mid range data center GPU running a small object detector at 640p with INT8 quantization and hardware decode can process on the order of 100 to 300 FPS. At 5 FPS across 10,000 streams the raw detector workload is 50,000 frames per second; motion gating removes a large share of static frames, so roughly 100 to 200 GPUs cover the fleet, depending on model complexity and pre/post-processing overhead.

Buffering discipline is critical at this scale. Each pipeline stage maintains 2 to 4 frames per queue with a strict drop oldest policy when queues fill. End to end alert latency targets stay under 1 second, dominated by detection inference time and inter-stage queuing. Meta and Amazon deploy similar architectures for content moderation and highlights extraction, accepting a few seconds of delay in exchange for horizontal scalability.

Adaptive load shedding keeps the system stable under variable load. When queue depths exceed thresholds or p95 latency degrades, the system reduces work in priority order: lower resolution from 1080p to 720p, reduce FPS from 5 to 3, disable expensive post-processing like multi-object tracking, or drop to event only outputs. This graceful degradation prevents cascading failures where one overloaded stage collapses the entire pipeline.
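A quick back-of-the-envelope check of these numbers, as a minimal sketch: the 40% static-frame skip rate is taken from the traffic monitoring example below, and the per-GPU throughput range used here is an assumption within the 100 to 300 FPS figure above.

```python
# Back-of-the-envelope sizing for the 10,000-camera deployment described above.
# The motion-gate skip rate and per-GPU throughput are illustrative assumptions.

NUM_CAMERAS = 10_000
BITRATE_MBPS = 2                        # H.264 average per stream
DETECT_FPS = 5                          # after temporal downsampling from 15 FPS
MOTION_SKIP = 0.40                      # fraction of frames skipped as static (assumed)
GPU_FPS_LOW, GPU_FPS_HIGH = 150, 300    # 640p INT8 detector throughput per GPU (assumed)

ingest_gbps = NUM_CAMERAS * BITRATE_MBPS / 1000
raw_fps = NUM_CAMERAS * DETECT_FPS                  # 50,000 frames/s before gating
effective_fps = raw_fps * (1 - MOTION_SKIP)         # ~30,000 frames/s after gating

gpus_high = effective_fps / GPU_FPS_LOW             # pessimistic per-GPU throughput
gpus_low = effective_fps / GPU_FPS_HIGH             # optimistic per-GPU throughput

print(f"Ingest bandwidth: {ingest_gbps:.0f} Gbps")
print(f"Detector workload: {effective_fps:,.0f} FPS after motion gating")
print(f"GPU fleet: {gpus_low:.0f} to {gpus_high:.0f} GPUs")
```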
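The downsampling and motion gating step could look like the following sketch, assuming OpenCV-style BGR frames; the frame-difference threshold and the 1% changed-pixel gate are illustrative values, not tuned ones.

```python
import cv2

def gated_frames(capture: cv2.VideoCapture, source_fps: int = 15,
                 target_fps: int = 5, motion_threshold: float = 0.01):
    """Yield only frames worth sending to the detector.

    Temporal downsampling: keep every (source_fps // target_fps)-th frame.
    Motion gating: skip frames whose pixel change vs. the last kept frame
    falls below `motion_threshold` (fraction of changed pixels).
    """
    stride = max(source_fps // target_fps, 1)
    prev_gray = None
    frame_idx = 0

    while True:
        ok, frame = capture.read()
        if not ok:
            break
        frame_idx += 1
        if frame_idx % stride != 0:          # temporal downsampling to ~5 FPS
            continue

        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (5, 5), 0)

        if prev_gray is not None:
            diff = cv2.absdiff(gray, prev_gray)
            _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
            changed = cv2.countNonZero(mask) / mask.size
            if changed < motion_threshold:   # static scene: skip detection
                continue

        prev_gray = gray
        yield frame                          # this frame goes to the detector
```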
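A minimal sketch of the per-stage bounded queue with drop oldest semantics, here built on Python's collections.deque; the 2 to 4 frame capacity mirrors the budget above.

```python
import collections
import threading

class FrameQueue:
    """Bounded per-stage queue with a drop-oldest policy.

    A deque with maxlen discards the oldest entry automatically when a new
    frame arrives and the queue is full, so a slow downstream stage never
    accumulates unbounded backlog or stale frames.
    """
    def __init__(self, capacity: int = 4):
        self._frames = collections.deque(maxlen=capacity)
        self._lock = threading.Lock()
        self.dropped = 0

    def put(self, frame) -> None:
        with self._lock:
            if len(self._frames) == self._frames.maxlen:
                self.dropped += 1           # oldest frame is about to be evicted
            self._frames.append(frame)      # deque(maxlen=...) drops the oldest

    def get(self):
        with self._lock:
            return self._frames.popleft() if self._frames else None
```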
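The load shedding ladder can be expressed as a small controller that steps through the degradation order above; the class names, thresholds, and recovery logic here are illustrative assumptions, not a specific production design.

```python
from dataclasses import dataclass

@dataclass
class PipelineConfig:
    resolution: str = "1080p"
    fps: int = 5
    tracking: bool = True
    event_only: bool = False

# Degradation steps in priority order, mirroring the section above.
DEGRADATION_STEPS = [
    lambda c: setattr(c, "resolution", "720p"),
    lambda c: setattr(c, "fps", 3),
    lambda c: setattr(c, "tracking", False),
    lambda c: setattr(c, "event_only", True),
]

class LoadShedder:
    def __init__(self, p95_budget_ms: float = 1000.0, max_queue_depth: int = 4):
        self.p95_budget_ms = p95_budget_ms
        self.max_queue_depth = max_queue_depth
        self.level = 0                      # number of degradation steps applied
        self.config = PipelineConfig()

    def update(self, p95_latency_ms: float, queue_depth: int) -> PipelineConfig:
        overloaded = (p95_latency_ms > self.p95_budget_ms
                      or queue_depth > self.max_queue_depth)
        if overloaded and self.level < len(DEGRADATION_STEPS):
            DEGRADATION_STEPS[self.level](self.config)   # shed the next cheapest work
            self.level += 1
        elif not overloaded and self.level > 0:
            self.level -= 1                               # recover one step
            self.config = PipelineConfig()
            for step in DEGRADATION_STEPS[:self.level]:
                step(self.config)
        return self.config
```

Called once per monitoring interval with the current p95 latency and queue depth, the controller degrades one step at a time under load and recovers one step at a time when headroom returns, avoiding oscillation between extremes.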
💡 Key Takeaways
Ten thousand 1080p15 streams at 2 Mbps generate 20 Gbps of ingest, requiring a regional gateway architecture with managed services like Kinesis Video Streams
Temporal downsampling to 5 FPS with motion gating reduces processing cost by 66% while maintaining detection coverage for dynamic scenes
A mid range GPU runs a small 640p detector with INT8 quantization at 100 to 300 FPS; with motion gating, roughly 100 to 200 GPUs cover 10K streams sampled at 5 FPS
Bounded queues of 2 to 4 frames per stage with drop oldest policy keep end to end alert latency under 1 second
Adaptive load shedding degrades gracefully: reduce resolution to 720p, lower to 3 FPS, disable tracking, or switch to event only output when p95 latency exceeds thresholds
YouTube Live and Twitch use similar fan out architectures running moderation in sidecar pipelines, accepting 2 to 5 seconds glass to glass latency for horizontal scalability
📌 Examples
Amazon city scale traffic monitoring downsamples from 1080p15 to 640p5, applies motion detection to skip 40% of static frames, reducing compute cost by 70% while maintaining incident detection recall above 95%
Meta content moderation pipeline processes millions of live streams with 2 to 4 frame queues, automatically reducing resolution and FPS during peak load to maintain sub 2 second violation detection latency