EfficientNet-Lite: Compound Scaling for Hardware-Constrained Deployment
COMPOUND SCALING
Traditional scaling increases depth (more layers), width (more channels), or resolution independently. EfficientNet scales all three together with a compound coefficient φ: depth scales by 1.2^φ, width by 1.1^φ, and resolution by 1.15^φ. The base coefficients were chosen so that total FLOPs roughly double with each unit of φ, since FLOPs grow linearly with depth but quadratically with width and resolution. This balanced approach achieves better accuracy per FLOP than scaling any single dimension alone. EfficientNet-B0 through B7 correspond to increasing values of φ.
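The scaling rule above can be sketched in a few lines of Python. The per-φ multipliers (1.2 depth, 1.1 width, 1.15 resolution) come from the text; the B0-style base values (1.0, 1.0, 224) are illustrative assumptions:

```python
# Sketch of EfficientNet compound scaling. The base values default to
# an assumed B0-like baseline (unit multipliers, 224-pixel input).
def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    depth = base_depth * (1.2 ** phi)                          # more layers
    width = base_width * (1.1 ** phi)                          # more channels
    resolution = int(round(base_resolution * (1.15 ** phi)))   # larger input
    return depth, width, resolution

# phi = 0 leaves the baseline unchanged; each increment of phi
# scales all three dimensions together.
```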
EFFICIENTNET LITE MODIFICATIONS
Standard EfficientNet uses operations that are poorly supported on mobile hardware. EfficientNet-Lite makes three changes: (1) Replace the Swish activation with ReLU6, which is 2-3x faster on mobile accelerators. (2) Remove squeeze-and-excite blocks, whose global pooling is slow on edge TPUs. (3) Fix the network's stem and head while scaling, since these stages are disproportionately sensitive to changes in size and computation. Together these changes sacrifice roughly 1-2% accuracy for 40-60% faster inference on edge devices.
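The activation swap in change (1) is easy to see side by side. A minimal scalar sketch, assuming the standard definitions of both functions:

```python
import math

# ReLU6 is piecewise linear: just a clamp, with no transcendental ops,
# which is why it maps well to mobile accelerators and fixed-point math.
def relu6(x):
    return min(max(x, 0.0), 6.0)

# Swish (x * sigmoid(x)) needs an exponential per activation, which is
# slower on mobile hardware and has an unbounded output range.
def swish(x):
    return x / (1.0 + math.exp(-x))
```

Note the other difference that matters later: ReLU6's output is bounded to [0, 6], while Swish's is not.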
CHOOSING THE RIGHT VARIANT
EfficientNet-Lite0: 5ms on a mobile GPU, 75% ImageNet top-1. Good for classification under a tight latency budget.
EfficientNet-Lite2: 12ms, 77% top-1. A balanced choice.
EfficientNet-Lite4: 30ms, 80% top-1. Maximum accuracy when the latency budget allows.
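This selection rule can be expressed as a small lookup: pick the most accurate variant whose latency fits the budget. A hypothetical helper using the figures quoted above; the names and numbers are from this document, not from any library API:

```python
# (name, latency_ms on mobile GPU, ImageNet top-1) -- figures from the text.
VARIANTS = [
    ("EfficientNet-Lite0", 5, 0.75),
    ("EfficientNet-Lite2", 12, 0.77),
    ("EfficientNet-Lite4", 30, 0.80),
]

def pick_variant(budget_ms):
    # VARIANTS is sorted by accuracy ascending, so the last one that
    # fits the budget is the most accurate admissible choice.
    chosen = None
    for name, latency_ms, _top1 in VARIANTS:
        if latency_ms <= budget_ms:
            chosen = name
    return chosen  # None if even Lite0 exceeds the budget
```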
QUANTIZATION COMPATIBILITY
EfficientNet-Lite is designed for full INT8 quantization. The simplified architecture quantizes cleanly, typically with less than 1% accuracy loss: removing squeeze-and-excite eliminates hard-to-quantize sigmoid gating, and ReLU6 bounds activations to the fixed range [0, 6], so a single quantization scale covers them. The result is a 2-4x speedup on integer-only accelerators.
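A minimal sketch of why the bounded ReLU6 range helps: one fixed affine scale covers every activation, and the round-trip error is at most half a quantization step. The zero-point convention below (signed INT8, real 0.0 at code -128) is an illustrative assumption, not a production quantizer:

```python
SCALE = 6.0 / 255.0   # 255 INT8 steps span the whole ReLU6 range [0, 6]
ZERO_POINT = -128     # real value 0.0 maps to the lowest INT8 code

def quantize(x):
    q = round(x / SCALE) + ZERO_POINT
    return max(-128, min(127, q))   # clamp to signed 8-bit

def dequantize(q):
    return (q - ZERO_POINT) * SCALE

# Round-trip error for any x in [0, 6] is at most SCALE / 2 (~0.012).
```

With an unbounded activation like Swish, the scale would instead have to be calibrated from observed activation statistics, and outliers would stretch it, wasting precision on rarely used codes.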