EfficientNet-Lite: Compound Scaling for Hardware-Constrained Deployment
EfficientNet introduced compound scaling: instead of arbitrarily increasing depth, width, or resolution, it scales all three dimensions together using a fixed ratio derived from a grid search. The insight is that these dimensions are interdependent. Deeper networks need wider channels to avoid bottlenecks, and higher resolution inputs benefit from more layers to capture fine details. Compound scaling finds the optimal balance under a compute budget.
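The idea can be sketched in a few lines. This is a hypothetical helper, not the reference implementation; the coefficients α = 1.2 (depth), β = 1.1 (width), γ = 1.15 (resolution) are the ones reported for EfficientNet, chosen so that α·β²·γ² ≈ 2, meaning each +1 in the compound coefficient φ roughly doubles FLOPs:

```python
# Compound scaling sketch: one coefficient phi scales depth, width, and
# resolution together instead of tuning each dimension independently.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # from the EfficientNet grid search

def compound_scale(phi, base_depth=18, base_width=1.0, base_res=224):
    """Return (layers, width multiplier, input resolution) for coefficient phi."""
    depth = round(base_depth * ALPHA ** phi)            # more layers
    width = base_width * BETA ** phi                    # wider channels
    res = int(round(base_res * GAMMA ** phi / 8) * 8)   # resolution, multiple of 8
    return depth, width, res
```

With φ = 0 this returns the baseline configuration; φ = 1 yields roughly 22 layers, a 1.1× width multiplier, and 256×256 inputs, i.e. all three dimensions grow in lockstep.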
EfficientNet-Lite variants simplify the base architecture for mobile and edge deployment. They remove operations that don't quantize well, such as the swish activation, whose unbounded output is replaced with ReLU6, which clips at 6 to keep activations bounded for 8-bit quantization. Lite models also remove squeeze-and-excitation layers in some configurations, because the dynamic channel reweighting adds latency on certain mobile accelerators. The result is models that retain most of the accuracy gains from compound scaling while running efficiently on real hardware.
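The quantization argument is easy to see numerically. A minimal sketch (plain Python, not any framework's API): swish grows without bound, so an 8-bit quantizer must either clip it or waste range on outliers, while ReLU6's fixed [0, 6] range maps cleanly onto 256 levels with scale 6/255:

```python
import math

def swish(x):
    """Swish: x * sigmoid(x). Unbounded above, so its quantization range
    depends on the observed activation statistics."""
    return x / (1.0 + math.exp(-x))

def relu6(x):
    """ReLU6: output clipped to [0, 6], a fixed range for 8-bit quantization."""
    return min(max(x, 0.0), 6.0)
```

For a large pre-activation like 100, swish passes it through almost unchanged while ReLU6 saturates at 6, which is why the Lite variants accept a small accuracy cost in exchange for a stable quantization grid.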
For detection tasks, EfficientDet-Lite builds on these backbones and adds a weighted bidirectional feature pyramid network (BiFPN). Traditional feature pyramids merge features from different resolutions in a top-down or bottom-up flow. The bidirectional approach allows features to flow both ways, weighted by learned parameters that indicate the importance of each connection. This improves multi-scale fusion for objects of varying sizes with only modest compute overhead compared to single-scale detection heads.
💡 Key Takeaways
•Compound scaling co-tunes depth, width, and resolution using a fixed ratio to maximize accuracy under a compute budget, avoiding arbitrary scaling of single dimensions
•EfficientNet-Lite replaces swish with ReLU6 and removes problematic layers to ensure models quantize cleanly to 8-bit without accuracy collapse
•EfficientDet-Lite2 achieves 33 mAP (mean average precision) on edge benchmarks but requires 139 to 188 ms on Raspberry Pi with TPU, versus 12 ms for SSD MobileNet V1 at 19 mAP
•Weighted bidirectional feature pyramids in EfficientDet allow multi-scale features to flow in both directions with learned importance weights for better object detection
•For 30 fps real-time requirements, EfficientDet-Lite0 on a stronger accelerator, or lower-resolution inputs, is necessary to stay within latency budgets
📌 Examples
EfficientNet-Lite B0 runs classification in 15 to 20 ms on phone NPUs at 224×224 with 75% top-1 accuracy, suitable for interactive mobile apps
EfficientDet-Lite2 on Jetson Orin Nano executes detection in about 20 ms at 320×320 resolution, enabling real-time robotics applications
Google deploys EfficientNet-Lite variants in Android for on-device image understanding, achieving 10 to 15 ms inference while keeping power under 200 mW