Computer Vision Systems: Edge Deployment (MobileNet, EfficientNet-Lite)

What Makes Edge Deployment Different from Cloud Inference?

Definition
Edge deployment runs ML models directly on devices (phones, cameras, embedded systems) rather than sending data to cloud servers. This eliminates network latency, enables offline operation, and keeps sensitive data on-device.

WHY EDGE MATTERS

Cloud inference adds 50-200ms of network round-trip latency. For real-time applications (autonomous driving, AR filters, robotics), this is unacceptable: a self-driving car traveling at 60 mph covers about 4.4 feet during a 50ms network delay. Edge inference runs in 10-50ms with zero network dependency. Privacy is another driver: processing faces or voices locally means sensitive data never leaves the device.
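The car example is simple unit conversion: distance = speed × latency. A minimal sketch (the function name and 60 mph / 50 ms figures are just the example from the text):

```python
MPH_TO_FEET_PER_SEC = 5280 / 3600  # 1 mph = 1.4667 ft/s

def distance_during_latency(speed_mph: float, latency_ms: float) -> float:
    """Feet traveled while a network/inference delay elapses."""
    return speed_mph * MPH_TO_FEET_PER_SEC * (latency_ms / 1000)

# 60 mph = 88 ft/s, so a 50 ms delay covers 4.4 feet
print(distance_during_latency(60, 50))  # → 4.4
```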

EDGE CONSTRAINTS

Compute: Mobile processors deliver roughly 2-5 TOPS (tera operations per second) versus 100+ TOPS for server GPUs.
Memory: 2-8 GB RAM shared with the OS and other apps versus 32-80 GB on servers.
Power: 5-15W total device power versus 300W for a single server GPU.
Thermal: Sustained high compute triggers throttling after 30-60 seconds on phones.
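These constraints can be turned into a back-of-envelope feasibility check: estimated latency ≈ model ops / (peak TOPS × achieved utilization). This is a rough sketch, not a profiler; the 0.6 GFLOPs model size and 30% utilization figure are illustrative assumptions, not values from the text:

```python
def est_latency_ms(model_gflops: float, device_tops: float,
                   utilization: float = 0.3) -> float:
    """Rough latency estimate: operations divided by effective throughput.

    utilization is the assumed fraction of peak TOPS actually achieved;
    real devices rarely sustain peak due to memory bandwidth and thermals.
    """
    ops = model_gflops * 1e9
    effective_ops_per_sec = device_tops * 1e12 * utilization
    return ops / effective_ops_per_sec * 1000

# Hypothetical ~0.6 GFLOPs mobile model on a 2 TOPS device at 30% utilization
print(est_latency_ms(0.6, 2.0))  # ≈ 1 ms of pure compute, before overheads
```

Such estimates are optimistic lower bounds; real latency includes memory transfers, pre/post-processing, and thermal throttling.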

💡 Key Insight: Edge models must be 10-100x smaller than cloud models while maintaining acceptable accuracy. This drives the need for specialized architectures.

TYPICAL LATENCY BUDGETS

Real-time video (30 fps): 33ms per frame. AR/VR: 11ms (90 fps). Autonomous driving perception: 50-100ms end-to-end. Voice activation: 200-500ms acceptable. These tight budgets leave no room for network calls.
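The budgets above follow from the frame rate (1000 ms / fps), and comparing a budget against a cloud round trip shows why network calls don't fit. A minimal sketch (function names are illustrative):

```python
def frame_budget_ms(fps: float) -> float:
    """Per-frame time budget for a target frame rate."""
    return 1000.0 / fps

def remaining_after_network(fps: float, network_rtt_ms: float) -> float:
    """Time left for inference after a cloud round trip; negative means
    the frame rate is unreachable with that network latency."""
    return frame_budget_ms(fps) - network_rtt_ms

print(round(frame_budget_ms(30), 1))          # → 33.3 ms per frame
print(round(frame_budget_ms(90), 1))          # → 11.1 ms per frame
print(remaining_after_network(30, 50) < 0)    # → True: 50 ms RTT alone blows the budget
```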

💡 Key Takeaways
Edge eliminates 50-200ms network latency, enabling real-time applications like autonomous driving and AR
Constraints: 2-5 TOPS compute (vs 100+ server), 2-8 GB RAM, 5-15W power, thermal throttling after 30-60s
Edge models must be 10-100x smaller than cloud models while maintaining acceptable accuracy
Latency budgets: 33ms for 30fps video, 11ms for AR/VR, 50-100ms for autonomous driving
📌 Interview Tips
1. When explaining edge vs cloud, use the self-driving car example: a 50ms delay means about 4.4 feet of travel at 60 mph
2. Describe the four constraints: compute (TOPS), memory (GB), power (watts), thermal (throttling)
3. Mention privacy as a driver: sensitive data (faces, voices) never leaves the device