
Real-Time Edge Pipeline: From Sensor to Action in 33ms

THE 33MS BUDGET

For 30 fps video processing, each frame must complete in 33ms. This includes: camera capture (2-5ms), preprocessing (2-3ms), model inference (15-25ms), postprocessing (2-5ms), and display/action (1-2ms). Any stage exceeding its budget causes frame drops, visible stuttering, or delayed responses.
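The budget arithmetic above can be sanity-checked with a few lines. This is a minimal sketch (stage timings are the mid-range figures from the text, not measurements):

```python
# Hypothetical per-frame latency budget check for a 30 fps pipeline.
# Stage timings (ms) are mid-range values from the ranges quoted above.
BUDGET_MS = 1000 / 30  # ~33.3 ms per frame at 30 fps

stage_ms = {
    "capture": 3.5,      # 2-5 ms
    "preprocess": 2.5,   # 2-3 ms
    "inference": 20.0,   # 15-25 ms
    "postprocess": 3.5,  # 2-5 ms
    "output": 1.5,       # 1-2 ms
}

total = sum(stage_ms.values())
headroom = BUDGET_MS - total
print(f"total={total:.1f} ms, headroom={headroom:.1f} ms")
# If total exceeds the budget, the pipeline drops frames.
assert total <= BUDGET_MS, "frame will be dropped"
```

Note the worst case (5 + 3 + 25 + 5 + 2 = 40 ms) already blows the budget, which is why the pipelining below matters.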

PIPELINE PARALLELIZATION

Sequential processing wastes time: while the model runs on frame N, the camera sits idle. Pipeline parallelism overlaps stages: capture frame N+1 while processing frame N. With 3-stage pipelining (capture, inference, output), throughput approaches the slowest stage rather than the sum. A 25ms model can process 40 fps with proper pipelining instead of 30 fps sequential.
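The 3-stage overlap can be sketched with threads and bounded queues. This is illustrative only: the stage bodies below are `time.sleep` stand-ins for the real camera API and inference runtime, and the queue depths are arbitrary.

```python
# Minimal 3-stage pipeline sketch (capture -> inference -> output).
# Each stage runs in its own thread; bounded queues connect them, so
# capture of frame N+1 overlaps inference of frame N.
import queue
import threading
import time

def run_stage(work_fn, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:           # poison pill: shut down, propagate downstream
            if outbox is not None:
                outbox.put(None)
            return
        result = work_fn(item)
        if outbox is not None:
            outbox.put(result)

def capture(frame_id):
    time.sleep(0.004)              # ~4 ms camera capture (stand-in)
    return frame_id

def infer(frame):
    time.sleep(0.025)              # ~25 ms inference: the slowest stage
    return frame

results = []
def output(frame):
    results.append(frame)          # single output thread, so order is preserved

q_cap, q_inf, q_out = (queue.Queue(maxsize=2) for _ in range(3))
threads = [
    threading.Thread(target=run_stage, args=(capture, q_cap, q_inf)),
    threading.Thread(target=run_stage, args=(infer, q_inf, q_out)),
    threading.Thread(target=run_stage, args=(output, q_out, None)),
]
for t in threads:
    t.start()

start = time.perf_counter()
for frame_id in range(20):
    q_cap.put(frame_id)
q_cap.put(None)
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"{len(results)} frames in {elapsed * 1000:.0f} ms; "
      f"throughput is bounded by the 25 ms stage, not the sum")
```

Once the pipeline fills, frames complete every ~25 ms (the slowest stage), rather than every ~33 ms (the sum of all stages).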

MEMORY MANAGEMENT

Mobile devices have limited memory bandwidth. Image preprocessing (resize, normalize) can bottleneck if done naively. Best practices: (1) Resize in hardware (GPU texture sampling) rather than CPU. (2) Keep buffers pinned to avoid allocation overhead. (3) Use zero-copy paths where camera output feeds directly to accelerator input.
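Practice (2) above, buffer reuse, can be sketched as follows. This is an illustrative CPU-side analogue: in a real runtime the buffer would be a pinned, DMA-able allocation registered with the accelerator, and the resize would happen in hardware per practice (1). Shapes and the subsampling "resize" are placeholders.

```python
# Sketch of buffer reuse for preprocessing: allocate once, write in place.
# Per-frame allocation churns the allocator and wastes memory bandwidth;
# a preallocated buffer (analogous to a pinned accelerator input) avoids it.
import numpy as np

H, W = 224, 224  # illustrative model input size

# Allocated once at startup and reused for every frame.
input_buffer = np.empty((1, H, W, 3), dtype=np.float32)

def preprocess_into(frame_u8, out):
    # Placeholder CPU "resize": strided subsampling to the target size.
    # On device, prefer hardware resize (GPU texture sampling) instead.
    ys = np.linspace(0, frame_u8.shape[0] - 1, H).astype(np.intp)
    xs = np.linspace(0, frame_u8.shape[1] - 1, W).astype(np.intp)
    resized = frame_u8[np.ix_(ys, xs)]
    # Normalize to [0, 1] directly into the reused buffer (no new allocation).
    np.multiply(resized, np.float32(1.0 / 255.0), out=out[0])
    return out

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
preprocess_into(frame, input_buffer)  # hot path: zero allocations
```

The design point is that the hot path performs no allocations; every per-frame `np.empty`/copy you remove is bandwidth handed back to inference.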

✅ Best Practice: Profile memory bandwidth, not just compute. A model that fits in cache runs 2-3x faster than one that spills to main memory.

ACCELERATOR SELECTION

Mobile GPU: best for floating point (FP16/FP32), 5-15 TOPS. NPU/DSP: best for quantized models, 2-10 TOPS but more power-efficient. Edge TPU: best for INT8, ~4 TOPS with excellent power efficiency. Match your model format (FP16, INT8) to the accelerator's strengths.
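The matching rule can be expressed as a simple preference table. This is a hypothetical picker, not a real device-enumeration API; device names and the preference order are illustrative of the rules of thumb above.

```python
# Hypothetical accelerator picker: match model precision to the hardware
# that handles it best. Names are illustrative, not a real runtime API.
def pick_accelerator(model_precision, available):
    preference = {
        "fp32": ["gpu"],                     # mobile GPU: best float throughput
        "fp16": ["gpu", "npu"],              # GPU first; NPU if it supports FP16
        "int8": ["edge_tpu", "npu", "gpu"],  # INT8: Edge TPU / NPU most efficient
    }
    for device in preference[model_precision]:
        if device in available:
            return device
    return "cpu"                             # fallback when nothing matches

print(pick_accelerator("int8", {"npu", "gpu"}))  # NPU beats GPU for INT8
```

In practice this decision is made once at model-load time, often via the runtime's delegate mechanism (e.g. a GPU or NNAPI delegate in TFLite).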

💡 Key Takeaways
- 33ms budget: capture (2-5ms) + preprocess (2-3ms) + inference (15-25ms) + postprocess (2-5ms) + output (1-2ms)
- Pipeline parallelism: overlap stages so throughput = slowest stage, not sum; a 25ms model achieves 40 fps pipelined
- Memory optimization: GPU resize, pinned buffers, zero-copy paths; cache-fitting models run 2-3x faster
- Accelerator matching: mobile GPU for FP16, NPU/DSP for quantized, Edge TPU for INT8
📌 Interview Tips
1. Break down the 33ms budget by stage: capture, preprocess, inference, postprocess, output
2. Explain pipeline parallelism: capturing frame N+1 while processing frame N achieves higher throughput
3. Mention the memory bottleneck: cache-fitting models are dramatically faster than memory-spilling ones