Computer Vision Systems: Evaluation (mAP, IoU, Precision-Recall)

What is Average Precision (AP) and Mean Average Precision (mAP)?

Average Precision and Mean Average Precision are the standard metrics for comparing detection models. Understanding how they work helps you interpret benchmark results and choose the right model for your application.

📊 AVERAGE PRECISION (AP): SINGLE-CLASS SUMMARY

Average Precision summarizes the entire PR curve into a single number. It measures the area under the Precision-Recall curve - the larger the area, the better the model performs across all operating points.

The calculation ranks all detections by confidence, then computes precision at each recall level where a new true positive appears. AP averages these precision values, weighting each by the increase in recall it represents.

Example: Imagine your model outputs 10 ranked detections for the "car" class. As you walk down this list, some are true positives (actual cars) and some are false positives. AP rewards a model that puts true positives at the top of this ranked list - high-confidence detections should be correct ones.
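The walk-down-the-ranked-list procedure above can be sketched in a few lines. This is a simplified, hypothetical example (the `flags` list and the count of five ground-truth cars are made up for illustration); it assumes detections are already sorted by confidence and already matched to ground truth as TP/FP:

```python
def average_precision(tp_flags, num_ground_truths):
    """Step-wise area under the PR curve: each new true positive
    advances recall by 1/num_ground_truths, and the precision at
    that rank is weighted by that recall step."""
    ap = 0.0
    true_positives = 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        if is_tp:
            true_positives += 1
            precision_here = true_positives / rank
            ap += precision_here / num_ground_truths
    return ap

# 10 ranked "car" detections (1 = true positive), 5 ground-truth cars
flags = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
print(round(average_precision(flags, 5), 3))  # → 0.764
```

Note how the score drops if you move the leading 1s toward the end of the list: the same set of correct detections scores lower when the false positives are ranked above them, which is exactly the ranking-quality reward described above.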

🎯 MEAN AVERAGE PRECISION (mAP): CROSS-CLASS SCORE

mAP extends AP across multiple object classes. Calculate AP separately for each class (cars, pedestrians, trucks, etc.), then average them together.

mAP = (AP_cars + AP_pedestrians + AP_trucks + ...) / number_of_classes

This averaging treats all classes equally, regardless of how many instances each class has. A model that excels at detecting common objects but fails on rare ones will show it in the per-class AP breakdown.
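The formula above is a plain unweighted mean, which is why a rare class drags the score down as much as a common one. A minimal sketch with hypothetical per-class AP values:

```python
# Hypothetical per-class AP scores (illustrative numbers only).
per_class_ap = {"car": 0.81, "pedestrian": 0.62, "truck": 0.55}

# mAP is the unweighted mean: "truck" counts the same as "car"
# even if cars outnumber trucks 100-to-1 in the dataset.
mAP = sum(per_class_ap.values()) / len(per_class_ap)
print(round(mAP, 3))  # → 0.66
```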

⚙️ IOU THRESHOLDS IN MAP

A detection counts as a true positive only if its IoU with the ground truth exceeds a threshold. Different thresholds test different precision levels:

  • [email protected]: Lenient threshold. Box needs 50% overlap to count as correct. Tests if the model roughly finds objects.
  • [email protected]: Strict threshold. Requires 75% overlap. Tests localization precision.
  • mAP@[0.5:0.95]: COCO standard. Averages mAP across thresholds from 0.5 to 0.95 in steps of 0.05. Comprehensive score that rewards both detection and precise localization.
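To make the thresholds concrete, here is a minimal IoU function for axis-aligned boxes in `(x1, y1, x2, y2)` format, plus the COCO-style sweep from 0.5 to 0.95 in steps of 0.05. The box coordinates are invented for illustration:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

pred, gt = (10, 10, 50, 50), (15, 15, 55, 55)   # hypothetical boxes
overlap = iou(pred, gt)                          # ≈ 0.62

# COCO-style thresholds: 0.50, 0.55, ..., 0.95
thresholds = [0.5 + 0.05 * i for i in range(10)]
hits = sum(overlap >= t for t in thresholds)
print(round(overlap, 3), hits)
```

Here the detection passes only at 0.50, 0.55, and 0.60: it counts as a true positive under [email protected] but as a false positive under [email protected], which is exactly why the two scores can diverge for the same model.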
💡 Key Insight: Two models with similar [email protected] can have very different [email protected] scores. The model with better localization maintains performance at stricter thresholds. Always check the full threshold range when comparing models.
💡 KEY TAKEAWAYS

  • AP = area under the PR curve; summarizes model performance across all confidence thresholds into one number.
  • mAP averages AP across all classes, treating each class equally regardless of instance count.
  • The IoU threshold determines what counts as correct: 0.5 is lenient, 0.75 is strict, 0.5:0.95 is comprehensive.
  • The per-class AP breakdown reveals class imbalance issues that the mAP average can hide.
📌 INTERVIEW TIPS

  1. When discussing benchmarks, always ask which IoU threshold was used: [email protected] and [email protected] can differ by 20+ points for the same model.
  2. Explain that mAP rewards ranking quality: a model that puts true positives at the top of its confidence-sorted list scores higher.