Computer Vision Systems • Evaluation (mAP, IoU, Precision-Recall)
What is Average Precision (AP) and Mean Average Precision (mAP)?
Average Precision (AP) summarizes the entire precision-recall curve in a single number: the area under the curve. It measures how well the model ranks correct detections above incorrect ones while also localizing them properly. An AP of 0.80 means that, averaged across all recall levels, precision stays high, indicating strong ranking quality.
AP is computed by sorting all predictions by confidence score, then walking down the list to build a precision-recall curve. At each recall level, precision is interpolated to the maximum precision achieved at that recall or any higher recall. The area under this interpolated curve is the AP. PASCAL VOC 2007 used 11-point interpolation (sampling recall at 0.0, 0.1, 0.2, up to 1.0), while later VOC releases and modern implementations integrate the full interpolated curve (all-point interpolation).
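A minimal NumPy sketch of this computation (the function name and inputs are illustrative; it assumes detections have already been matched to ground truth at a fixed IoU threshold, so each one arrives flagged as a true or false positive):

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truths, eleven_point=False):
    """AP for one class from per-detection confidences and TP/FP flags."""
    order = np.argsort(-np.asarray(scores))              # sort by descending confidence
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp

    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(fp)
    recall = cum_tp / max(num_ground_truths, 1)
    precision = cum_tp / (cum_tp + cum_fp)

    if eleven_point:
        # PASCAL VOC 2007 style: sample interpolated precision at recall 0.0, 0.1, ..., 1.0
        ap = 0.0
        for r in np.linspace(0.0, 1.0, 11):
            mask = recall >= r
            ap += (precision[mask].max() if mask.any() else 0.0) / 11.0
        return ap

    # All-point interpolation: make precision monotonically non-increasing
    # from right to left, then integrate over recall.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```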
Mean Average Precision (mAP) extends AP by averaging across multiple dimensions. PASCAL VOC computed AP per class at IoU 0.5, then averaged across classes to get mAP. COCO computes AP at each IoU threshold from 0.50 to 0.95 in steps of 0.05, then averages these 10 values to get AP at [.5:.95], which is what people usually mean when they cite COCO mAP. State-of-the-art models such as Google's or Meta's two-stage detectors achieve AP at [.5:.95] above 0.55 on COCO, with AP at 0.5 often exceeding 0.80.
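A short sketch of the COCO-style averaging, using a hypothetical table of per-class AP values at each of the 10 IoU thresholds (the numbers are made up for illustration; in practice each entry comes from the AP routine above, with detections matched to ground truth at that threshold):

```python
import numpy as np

# Illustrative AP values per class, columns = IoU thresholds 0.50, 0.55, ..., 0.95.
ap_table = np.array([
    [0.82, 0.80, 0.77, 0.73, 0.68, 0.61, 0.52, 0.41, 0.27, 0.10],  # hypothetical class "car"
    [0.71, 0.69, 0.65, 0.60, 0.54, 0.46, 0.37, 0.26, 0.14, 0.04],  # hypothetical class "person"
])

ap_per_class = ap_table.mean(axis=1)   # AP@[.5:.95] for each class
coco_map     = ap_per_class.mean()     # macro average over classes
ap_50        = ap_table[:, 0].mean()   # AP at IoU 0.50 (PASCAL-style threshold)
ap_75        = ap_table[:, 5].mean()   # AP at IoU 0.75 (column index 5)

print(f"AP@[.5:.95]={coco_map:.3f}  AP@0.5={ap_50:.3f}  AP@0.75={ap_75:.3f}")
```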
The gap between AP at 0.5 and AP at [.5:.95] exposes localization quality: a model with AP at 0.5 of 0.75 but AP at [.5:.95] of only 0.45 finds objects but localizes them loosely. Production teams report both metrics plus per-class AP and per-size AP to understand failure modes. Tesla tracks AP separately for vulnerable road users under occlusion, and Amazon might weight mAP by product SKU frequency to align the metric with business impact rather than treating all classes equally.
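As a sketch of that last distinction, here is macro versus frequency-weighted mAP over hypothetical per-class AP values and class counts (both the numbers and the weighting scheme are illustrative, not a published recipe from any of these teams):

```python
import numpy as np

# Hypothetical per-class AP@[.5:.95] and how often each class (e.g. SKU) appears.
per_class_ap    = np.array([0.62, 0.48, 0.35, 0.55])
class_frequency = np.array([9000,  500,  120,   30])

macro_map = per_class_ap.mean()                       # equal weight per class
weights = class_frequency / class_frequency.sum()
weighted_map = float(per_class_ap @ weights)          # frequency-weighted mAP

print(f"macro mAP={macro_map:.3f}  weighted mAP={weighted_map:.3f}")
```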
💡 Key Takeaways
• AP is the area under the precision-recall curve, measuring ranking quality and localization together in a single number from 0.0 to 1.0
• PASCAL VOC reports AP at IoU 0.5 using 11-point or continuous interpolation, typically yielding AP values of 0.70 to 0.80 for strong models
• COCO reports AP averaged over IoU thresholds 0.50 to 0.95 in 0.05 steps, producing AP values 15 to 30 points lower (0.45 to 0.55 for state of the art)
• mAP can mean a macro average (equal weight per class) or an average weighted by class frequency, with weighted mAP better reflecting business impact for skewed distributions
• Production systems report AP at 0.5, AP at 0.75, and AP at [.5:.95], plus per-class and per-size AP, to diagnose localization versus classification failures and small-object weaknesses (see the sketch after this list)
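For COCO-format datasets, the standard pycocotools evaluator produces most of this breakdown out of the box; a minimal sketch, assuming placeholder annotation and detection file paths:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# File paths are placeholders; detections.json must be in COCO results format.
coco_gt = COCO("annotations/instances_val.json")
coco_dt = coco_gt.loadRes("detections.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()   # prints the standard 12-line COCO summary

stats = evaluator.stats
report = {
    "AP@[.5:.95]": stats[0],
    "AP@0.5":      stats[1],
    "AP@0.75":     stats[2],
    "AP_small":    stats[3],   # area < 32^2 px
    "AP_medium":   stats[4],   # 32^2 to 96^2 px
    "AP_large":    stats[5],   # area > 96^2 px
}
# Per-class AP can be pulled from evaluator.eval["precision"], which is
# indexed by [iou_threshold, recall, class, area_range, max_detections].
print(report)
```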
📌 Examples
Google two-stage detectors on COCO: AP@[.5:.95] of 0.55+, AP@0.5 of 0.80+, with 1000 images per second throughput on cloud TPUs for batch evaluation
Tesla object detection for autonomous driving: Tracks AP separately for vulnerable road users (pedestrians, cyclists) in occlusion scenarios, gating releases on 95%+ recall at IoU 0.75
Amazon retail vision system: Uses class-weighted mAP reflecting SKU popularity, where the top 100 products get 5x weight compared to long-tail items, aligning metrics with revenue impact