Understanding Precision, Recall, and the Precision-Recall Curve
Object detection uses precision and recall instead of accuracy because the number of true negatives is enormous and uninformative. Precision measures what fraction of your predictions are correct: TP / (TP + FP), where TP is true positives and FP is false positives. Recall measures what fraction of the actual objects you found: TP / (TP + FN), where FN is false negatives. If your model predicts 100 boxes and 80 are correct, precision is 0.80. If there were 90 actual objects, recall is 80 / 90, about 0.89.
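As a quick sanity check of that arithmetic, here is a minimal sketch; the function name is illustrative, not from any particular library:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 100 predicted boxes, 80 correct (TP), 20 wrong (FP);
# 90 ground-truth objects, so 10 were missed (FN).
p, r = precision_recall(tp=80, fp=20, fn=10)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.80, recall=0.89
```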
Every prediction carries a confidence score, and varying the score threshold creates different operating points. High thresholds give high precision but low recall because you keep only very confident predictions; low thresholds give high recall but low precision because you accept more mistakes. The precision-recall curve plots these trade-offs by ranking all predictions by score and computing precision and recall at each point as you walk down the list.
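A minimal sketch of that sweep, assuming each prediction has already been matched against ground truth (for example by an IoU test); the names `pr_curve`, `scores`, and `is_tp` are illustrative:

```python
import numpy as np

def pr_curve(scores: np.ndarray, is_tp: np.ndarray, num_gt: int):
    """Trace the precision-recall curve by sweeping the score threshold.

    scores: confidence of each prediction
    is_tp:  1 if the prediction matched a ground-truth box, else 0
    num_gt: total number of ground-truth objects
    """
    order = np.argsort(-scores)              # highest confidence first
    tp_cum = np.cumsum(is_tp[order])         # TPs accepted so far
    fp_cum = np.cumsum(1 - is_tp[order])     # FPs accepted so far
    precision = tp_cum / (tp_cum + fp_cum)   # correct among accepted
    recall = tp_cum / num_gt                 # found among all objects
    return precision, recall

# Toy example: 5 predictions, 4 ground-truth objects.
scores = np.array([0.95, 0.90, 0.70, 0.60, 0.40])
is_tp  = np.array([1,    1,    0,    1,    0])
p, r = pr_curve(scores, is_tp, num_gt=4)
print(np.round(p, 2))  # [1.   1.   0.67 0.75 0.6 ]
print(np.round(r, 2))  # [0.25 0.5  0.5  0.75 0.75]
```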
Production systems pick thresholds based on business needs. Meta's content moderation might target 98% precision to minimize false takedowns, accepting lower recall. Amazon's warehouse counting might target 95% recall to avoid missing products, then filter duplicates downstream. Google's ads safety likely optimizes for precision to reduce human review cost while maintaining acceptable coverage.
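One way such an operating point might be chosen, sketched below under the same matched-predictions assumption: scan the curve and keep the highest-recall threshold that still meets the precision target. The helper name and the toy numbers are hypothetical.

```python
import numpy as np

def pick_operating_point(scores, is_tp, num_gt, min_precision=0.98):
    """Among all score thresholds, return (threshold, precision, recall)
    for the highest-recall point meeting the precision target, else None."""
    order = np.argsort(-scores)
    tp_cum = np.cumsum(is_tp[order])
    fp_cum = np.cumsum(1 - is_tp[order])
    precision = tp_cum / (tp_cum + fp_cum)
    recall = tp_cum / num_gt
    thresholds = scores[order]
    candidates = [(t, p, r) for t, p, r in zip(thresholds, precision, recall)
                  if p >= min_precision]
    best = max(candidates, key=lambda c: c[2], default=None)
    return None if best is None else tuple(map(float, best))

# Hypothetical predictions: confidence scores and ground-truth matches.
scores = np.array([0.99, 0.97, 0.95, 0.80, 0.60])
is_tp  = np.array([1,    1,    1,    0,    1])
print(pick_operating_point(scores, is_tp, num_gt=5, min_precision=0.95))
# -> (0.95, 1.0, 0.6): keep predictions scoring >= 0.95
```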
💡 Key Takeaways
•Precision = TP / (TP + FP): the fraction of predictions that are correct, critical when false positives are expensive
•Recall = TP / (TP + FN): the fraction of actual objects that are found, critical when missing objects has high cost
•Accuracy is meaningless in object detection because true negatives (all positions that are not objects) dominate the calculation and provide no insight
•Precision-recall curves trace the trade-off by ranking predictions by confidence score and computing both metrics at each threshold as you accept more predictions
•Operating point selection depends on business impact: content moderation targets 98% precision at 60% recall, warehouse counting targets 95% recall at 70% precision
📌 Examples
Meta content moderation: Sets the threshold for 98% precision on harmful-content classes to minimize false takedowns, accepting the resulting 60 to 70% recall as adequate coverage
Amazon warehouse product counting: Targets 95% recall at 70% precision to avoid missing inventory, then uses tracking and spatial logic downstream to suppress duplicate counts
Tesla pedestrian detection: Optimizes for high recall (95%+) at acceptable precision (85%), because missing a pedestrian is far costlier than a false alarm, then uses temporal tracking over multiple frames to filter false positives