
Real-Time Edge Pipeline: From Sensor to Action in 33ms

THE 33MS BUDGET

For 30 fps video processing, each frame must complete in 33ms. This includes: camera capture (2-5ms), preprocessing (2-3ms), model inference (15-25ms), postprocessing (2-5ms), and display/action (1-2ms). Any stage exceeding its budget causes frame drops, visible stuttering, or delayed responses.
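The budget arithmetic above can be sanity-checked with a few lines. This is a minimal sketch (stage timings are the mid-range figures from the text, not measurements):

```python
# Hypothetical per-frame latency budget check for a 30 fps pipeline.
# Stage timings (ms) are mid-range values from the ranges quoted above.
BUDGET_MS = 1000 / 30  # ~33.3 ms per frame at 30 fps

stage_ms = {
    "capture": 3.5,      # 2-5 ms
    "preprocess": 2.5,   # 2-3 ms
    "inference": 20.0,   # 15-25 ms
    "postprocess": 3.5,  # 2-5 ms
    "output": 1.5,       # 1-2 ms
}

total = sum(stage_ms.values())
headroom = BUDGET_MS - total
print(f"total={total:.1f} ms, headroom={headroom:.1f} ms")
# If total exceeds the budget, the pipeline drops frames.
assert total <= BUDGET_MS, "frame will be dropped"
```

Note the worst case (5 + 3 + 25 + 5 + 2 = 40 ms) already blows the budget, which is why the pipelining below matters.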

PIPELINE PARALLELIZATION

Sequential processing wastes time: while the model runs on frame N, the camera sits idle. Pipeline parallelism overlaps stages: capture frame N+1 while processing frame N. With 3-stage pipelining (capture, inference, output), throughput approaches the slowest stage rather than the sum. A 25ms model can process 40 fps with proper pipelining instead of 30 fps sequential.
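The 3-stage overlap can be sketched with threads and bounded queues. This is illustrative only: the stage bodies below are `time.sleep` stand-ins for the real camera API and inference runtime, and the queue depths are arbitrary.

```python
# Minimal 3-stage pipeline sketch (capture -> inference -> output).
# Each stage runs in its own thread; bounded queues connect them, so
# capture of frame N+1 overlaps inference of frame N.
import queue
import threading
import time

def run_stage(work_fn, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:           # poison pill: shut down, propagate downstream
            if outbox is not None:
                outbox.put(None)
            return
        result = work_fn(item)
        if outbox is not None:
            outbox.put(result)

def capture(frame_id):
    time.sleep(0.004)              # ~4 ms camera capture (stand-in)
    return frame_id

def infer(frame):
    time.sleep(0.025)              # ~25 ms inference: the slowest stage
    return frame

results = []
def output(frame):
    results.append(frame)          # single output thread, so order is preserved

q_cap, q_inf, q_out = (queue.Queue(maxsize=2) for _ in range(3))
threads = [
    threading.Thread(target=run_stage, args=(capture, q_cap, q_inf)),
    threading.Thread(target=run_stage, args=(infer, q_inf, q_out)),
    threading.Thread(target=run_stage, args=(output, q_out, None)),
]
for t in threads:
    t.start()

start = time.perf_counter()
for frame_id in range(20):
    q_cap.put(frame_id)
q_cap.put(None)
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"{len(results)} frames in {elapsed * 1000:.0f} ms; "
      f"throughput is bounded by the 25 ms stage, not the sum")
```

Once the pipeline fills, frames complete every ~25 ms (the slowest stage), rather than every ~33 ms (the sum of all stages).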

MEMORY MANAGEMENT

Mobile devices have limited memory bandwidth. Image preprocessing (resize, normalize) can bottleneck if done naively. Best practices: (1) Resize in hardware (GPU texture sampling) rather than CPU. (2) Keep buffers pinned to avoid allocation overhead. (3) Use zero-copy paths where camera output feeds directly to accelerator input.
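Practice (2) above, buffer reuse, can be sketched as follows. This is an illustrative CPU-side analogue: in a real runtime the buffer would be a pinned, DMA-able allocation registered with the accelerator, and the resize would happen in hardware per practice (1). Shapes and the subsampling "resize" are placeholders.

```python
# Sketch of buffer reuse for preprocessing: allocate once, write in place.
# Per-frame allocation churns the allocator and wastes memory bandwidth;
# a preallocated buffer (analogous to a pinned accelerator input) avoids it.
import numpy as np

H, W = 224, 224  # illustrative model input size

# Allocated once at startup and reused for every frame.
input_buffer = np.empty((1, H, W, 3), dtype=np.float32)

def preprocess_into(frame_u8, out):
    # Placeholder CPU "resize": strided subsampling to the target size.
    # On device, prefer hardware resize (GPU texture sampling) instead.
    ys = np.linspace(0, frame_u8.shape[0] - 1, H).astype(np.intp)
    xs = np.linspace(0, frame_u8.shape[1] - 1, W).astype(np.intp)
    resized = frame_u8[np.ix_(ys, xs)]
    # Normalize to [0, 1] directly into the reused buffer (no new allocation).
    np.multiply(resized, np.float32(1.0 / 255.0), out=out[0])
    return out

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
preprocess_into(frame, input_buffer)  # hot path: zero allocations
```

The design point is that the hot path performs no allocations; every per-frame `np.empty`/copy you remove is bandwidth handed back to inference.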

✅ Best Practice: Profile memory bandwidth, not just compute. A model that fits in cache runs 2-3x faster than one that spills to main memory.

ACCELERATOR SELECTION

Mobile GPU: best for floating point (FP16/FP32), 5-15 TOPS. NPU/DSP: best for quantized models, 2-10 TOPS but more power-efficient. Edge TPU: best for INT8, ~4 TOPS with excellent power efficiency. Match your model format (FP16, INT8) to the accelerator's strengths.
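The matching rule can be expressed as a simple preference table. This is a hypothetical picker, not a real device-enumeration API; device names and the preference order are illustrative of the rules of thumb above.

```python
# Hypothetical accelerator picker: match model precision to the hardware
# that handles it best. Names are illustrative, not a real runtime API.
def pick_accelerator(model_precision, available):
    preference = {
        "fp32": ["gpu"],                     # mobile GPU: best float throughput
        "fp16": ["gpu", "npu"],              # GPU first; NPU if it supports FP16
        "int8": ["edge_tpu", "npu", "gpu"],  # INT8: Edge TPU / NPU most efficient
    }
    for device in preference[model_precision]:
        if device in available:
            return device
    return "cpu"                             # fallback when nothing matches

print(pick_accelerator("int8", {"npu", "gpu"}))  # NPU beats GPU for INT8
```

In practice this decision is made once at model-load time, often via the runtime's delegate mechanism (e.g. a GPU or NNAPI delegate in TFLite).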

💡 Key Takeaways
- 33ms budget: capture (2-5ms) + preprocess (2-3ms) + inference (15-25ms) + postprocess (2-5ms) + output (1-2ms)
- Pipeline parallelism: overlap stages so throughput = slowest stage, not sum; a 25ms model achieves 40 fps pipelined
- Memory optimization: GPU resize, pinned buffers, zero-copy paths; cache-fitting models run 2-3x faster
- Accelerator matching: mobile GPU for FP16, NPU/DSP for quantized, Edge TPU for INT8
📌 Interview Tips
1. Break down the 33ms budget by stage: capture, preprocess, inference, postprocess, output
2. Explain pipeline parallelism: capturing frame N+1 while processing frame N achieves higher throughput
3. Mention the memory bottleneck: cache-fitting models are dramatically faster than memory-spilling ones