Real Time Edge Pipeline: From Sensor to Action in 33ms
A production edge vision pipeline has five stages that must execute within strict timing budgets. First, sensor capture acquires the frame from the camera. Second, preprocessing resizes and normalizes the image, converting color spaces and applying any letterbox padding. Third, model inference runs the neural network forward pass. Fourth, postprocessing applies operations like non maximum suppression for detection or softmax for classification. Fifth, the result triggers an action or updates the UI. For real time operation at 30 frames per second, the total budget is 33ms per frame.
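As a rough sketch of how the stages and the budget fit together, the loop below times each stage and flags frames that blow the 33ms budget. The stage names and callables are placeholders for whatever camera API, preprocessing code, inference runtime, and UI layer the app actually uses, not a specific SDK.

```python
import time

FRAME_BUDGET_MS = 1000.0 / 30  # ~33.3ms per frame at 30 fps

def run_pipeline(stages, data=None):
    """Run the five stages in order and time each one.

    `stages` is a list of (name, callable) pairs, e.g. capture,
    preprocess, inference, postprocess, render; each callable takes the
    previous stage's output (the capture stage can ignore its input).
    These are placeholders, not a particular camera or inference API.
    """
    timings = {}
    for name, stage in stages:
        t0 = time.perf_counter()
        data = stage(data)
        timings[name] = (time.perf_counter() - t0) * 1000.0  # ms

    total_ms = sum(timings.values())
    if total_ms > FRAME_BUDGET_MS:
        # Over budget: drop the next frame, shrink the input, or skip rendering.
        print(f"frame took {total_ms:.1f}ms (budget {FRAME_BUDGET_MS:.1f}ms): {timings}")
    return data, timings
```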
Breaking down a typical mobile object detection app: preprocessing with vectorized image operations should complete in 1 to 3ms. Model inference on a device NPU for small backbones like SSD MobileNet or EfficientDet Lite0 takes 10 to 25ms. Postprocessing, primarily non maximum suppression for detection, should stay within 1 to 4ms. Any remaining time goes to telemetry, logging, and rendering. If any stage exceeds its budget, you either drop frames or the UI stutters.
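A minimal sketch of the preprocessing stage, assuming OpenCV and NumPy and a 300 by 300 input (the size SSD MobileNet style detectors typically expect; substitute your model's). Every step is a single vectorized call, which is what keeps this stage in the 1 to 3ms range rather than the hundreds of milliseconds a per pixel Python loop would take.

```python
import cv2
import numpy as np

def letterbox_preprocess(frame_bgr: np.ndarray, size: int = 300) -> np.ndarray:
    """Aspect-preserving resize with letterbox padding, one color
    conversion, and normalization to [0, 1] with a batch dimension."""
    h, w = frame_bgr.shape[:2]
    scale = size / max(h, w)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))

    resized = cv2.resize(frame_bgr, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)  # single conversion, no extras

    # Pad to a square canvas instead of distorting the aspect ratio.
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = rgb

    return canvas.astype(np.float32)[None] / 255.0  # shape (1, size, size, 3)
```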
Concrete device performance illustrates the hardware landscape. Raspberry Pi 4 with a Coral USB Tensor Processing Unit (TPU) runs SSD MobileNet V1 in 12ms at 0.10 milliwatt hours (mWh) per inference. Raspberry Pi 5 with TPU improves this to 10ms. Pure CPU execution on Pi 4 balloons to 209ms, making real time impossible. Jetson Orin Nano handles these light models in 20ms without extra accelerators. Modern flagship phones with NPUs achieve 5 to 15ms for classification at 224 by 224, but low end devices can be 8x slower, requiring adaptive compute strategies like dynamic resolution scaling or frame skipping.
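One way to handle that spread is to adapt the input size to the latency the device is actually delivering. The sketch below is illustrative only: the resolution ladder, averaging window, and thresholds are assumptions to be tuned per device, not values from any particular SDK.

```python
class AdaptiveResolution:
    """Step the model input size down when average latency exceeds the
    inference budget, and back up once the device has headroom again."""

    SIZES = [320, 256, 192]  # example resolution ladder (largest to smallest)

    def __init__(self, budget_ms: float = 25.0, window: int = 30):
        self.budget_ms = budget_ms
        self.window = window          # number of frames to average over
        self.samples: list[float] = []
        self.level = 0                # index into SIZES

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)
        if len(self.samples) < self.window:
            return
        avg = sum(self.samples) / len(self.samples)
        self.samples.clear()
        if avg > self.budget_ms and self.level < len(self.SIZES) - 1:
            self.level += 1   # too slow (weak NPU, thermal throttling): shrink input
        elif avg < 0.6 * self.budget_ms and self.level > 0:
            self.level -= 1   # comfortable headroom: restore resolution

    @property
    def input_size(self) -> int:
        return self.SIZES[self.level]
```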
💡 Key Takeaways
•Preprocessing must complete in 1 to 3ms using vectorized image operations; avoid per pixel loops and unnecessary color space conversions that add latency
•Model inference consumes the majority of the budget at 10 to 25ms on device accelerators; exceeding this causes frame drops and stuttering
•Non maximum suppression (NMS) can dominate postprocessing at 1 to 4ms for typical scenes, but scales quadratically with candidate boxes in crowded frames (see the sketch after this list)
•Device heterogeneity is extreme: flagship phones run 8x faster than low end devices, requiring adaptive compute like dynamic resolution or frame rate reduction
•Energy per inference ranges from 0.01 mWh on Jetson Orin Nano to 0.10 mWh on Raspberry Pi with TPU; continuous 30 fps operation draws 180 to 360 mW for inference alone
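A plain greedy NMS, sketched below with NumPy, makes the quadratic behavior concrete: each kept box is compared against every remaining candidate, so the worst case grows with the square of the candidate count, which is why detectors usually cap candidates by score before suppression.

```python
import numpy as np

def greedy_nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5):
    """Greedy non maximum suppression over [x1, y1, x2, y2] boxes.

    Worst case O(n^2) in the number of candidate boxes, so crowded frames
    can push postprocessing past its 1 to 4ms slice unless candidates are
    pre-filtered (e.g. keep only the top few hundred by score).
    """
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break

        # Vectorized IoU of the winning box against all remaining boxes.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)

        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)

        order = rest[iou <= iou_threshold]  # drop boxes that overlap the winner too much
    return keep
```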
📌 Examples
Mobile camera app running SSD MobileNet on phone NPU: 2ms preprocessing, 15ms inference, 3ms NMS, 13ms headroom for UI at 33ms total
Raspberry Pi 4 with Coral TPU running SSD MobileNet V1: 12ms inference at 0.10 mWh per frame enables battery powered real time detection
Low end Android device thermal throttling after 30 seconds: latency jumps from 20ms to 45ms, causing the app to adaptively drop from 30 fps to 15 fps, as sketched below
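A sketch of that adaptive behavior, with hypothetical get_frame and detect callables standing in for the camera and model: when a frame takes more than two frame periods, the loop halves the target rate rather than letting the UI stutter, and restores it once latency recovers.

```python
import time

def adaptive_frame_loop(get_frame, detect, base_fps: int = 30):
    """Halve the target frame rate when per-frame latency doubles (as under
    thermal throttling), and restore it once latency fits the budget again.
    get_frame and detect are hypothetical stand-ins for camera and model."""
    target_fps = base_fps
    while True:
        start = time.perf_counter()
        detect(get_frame())
        latency = time.perf_counter() - start

        if latency > 2.0 / base_fps:      # e.g. 20ms jumping to 45ms under throttling
            target_fps = base_fps // 2
        elif latency < 0.8 / base_fps:    # recovered: back to full rate
            target_fps = base_fps

        leftover = 1.0 / target_fps - latency
        if leftover > 0:
            time.sleep(leftover)
```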