What is Intersection over Union (IoU) and Why Does It Matter?
Intersection over Union (IoU) quantifies how well a predicted bounding box overlaps with the ground truth. It's computed as the area of the intersection of the two boxes divided by the area of their union, i.e. the total area covered by either box. An IoU of 1.0 means perfect alignment, while 0.0 means no overlap at all.
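As a concrete reference, here is a minimal sketch of the computation, assuming axis-aligned boxes in (x1, y1, x2, y2) corner format; the function name and box layout are illustrative choices, not taken from any particular library:

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Corners of the overlap rectangle (empty if the boxes are disjoint).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0: perfect alignment
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # ~0.333: boxes overlapping by half
```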
In practice, object detection systems use IoU thresholds to decide whether a prediction counts as correct. A prediction must exceed the IoU threshold AND have the correct class label to be counted as a true positive. PASCAL VOC traditionally uses IoU 0.5 as the standard, meaning boxes need at least 50% overlap. COCO uses stricter thresholds, averaging results from IoU 0.50 to 0.95 in steps of 0.05.
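To make that decision rule concrete, here is a hedged sketch of scoring one prediction against one ground-truth box; it reuses the iou function from the sketch above, and the helper name and example boxes are illustrative assumptions, not the exact VOC or COCO implementation:

```python
# Assumes the iou() function from the previous sketch.
def is_true_positive(pred_box, pred_label, gt_box, gt_label, iou_threshold=0.5):
    """A prediction counts as a true positive only if it clears BOTH tests:
    sufficient overlap AND the correct class label."""
    return pred_label == gt_label and iou(pred_box, gt_box) >= iou_threshold

# The example boxes overlap with IoU ~0.68.
# PASCAL VOC style: a single threshold of 0.5.
voc_hit = is_true_positive((0, 0, 10, 10), "car", (1, 1, 11, 11), "car", 0.5)

# COCO style: evaluate at thresholds 0.50, 0.55, ..., 0.95 and average.
coco_thresholds = [0.50 + 0.05 * i for i in range(10)]
coco_hits = [is_true_positive((0, 0, 10, 10), "car", (1, 1, 11, 11), "car", t)
             for t in coco_thresholds]

print(voc_hit)                          # True: 0.68 clears the 0.5 bar
print(sum(coco_hits) / len(coco_hits))  # 0.4: only 4 of 10 stricter thresholds pass
```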
The threshold choice dramatically affects what counts as success. At IoU 0.5, a box covering roughly the right region passes; at IoU 0.75, the box must be tightly aligned. Tesla's autonomous driving systems likely require high IoU thresholds because precise localization feeds control decisions for steering and braking. In contrast, Google's image search might use IoU 0.5, since rough object presence matters more than pixel-perfect boundaries.
IoU is harsh on small objects. A 2-pixel shift on a 16×16-pixel object can drop IoU from 0.7 to 0.4, while the same shift on a 200×200 object barely changes IoU. This is why COCO reports separate metrics for small, medium, and large objects, exposing models that work well on large cars but fail on distant pedestrians.
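The size effect is easy to verify numerically. The sketch below (again reusing the iou helper from above; the box sizes and helper name are illustrative) shifts a perfectly aligned box by the same two pixels on each axis and compares the damage for a small versus a large object. The exact numbers depend on how much error the box already carries, but the asymmetry is stark:

```python
# Assumes the iou() function from the first sketch.
def shifted(size, dx, dy):
    """IoU of a size-by-size box at the origin vs. a copy shifted by (dx, dy)."""
    return iou((0, 0, size, size), (dx, dy, size + dx, size + dy))

print(shifted(16, 2, 2))   # ~0.62: a 2 px error erases a big chunk of a 16×16 box
print(shifted(200, 2, 2))  # ~0.96: the same 2 px error is negligible at 200×200
```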
💡 Key Takeaways
• IoU measures localization quality by computing overlap area divided by union area, with values from 0.0 (no overlap) to 1.0 (perfect match)
• A detection counts as correct only when IoU exceeds the threshold AND the class label matches, enforcing both localization and classification accuracy
• IoU 0.5 is standard for PASCAL VOC and suits tasks like image search where rough location suffices, while IoU 0.75 or higher is needed for robotics or autonomous driving
• Small objects suffer disproportionately from IoU sensitivity: a 2-pixel error on a 16×16 box can drop IoU from 0.7 to 0.4, explaining why COCO separates metrics by object size
• COCO averages AP across IoU 0.50 to 0.95 in 0.05 steps, heavily penalizing poor localization and producing AP scores 15 to 30 points lower than single-threshold metrics
📌 Examples
Tesla autonomous driving: Uses high IoU thresholds (likely 0.75+) for pedestrian and vehicle detection because precise bounding boxes feed trajectory planning and braking decisions
Google image search: Uses IoU 0.5 for object indexing because rough object presence matters more than pixel-perfect boundaries for retrieval relevance
Amazon warehouse robots: Require IoU 0.75+ for box and product detection to compute accurate pick points for robotic arms, where 5 cm errors cause grasp failures