Analyze ranked detections with class-wise average precision. Compare thresholds, confidence ordering, and retrieval balance, and turn complex evaluation output into clear, actionable performance insights.
Enter one class count per line and one detection per line. Detections should already reflect your IoU matching decisions at the chosen threshold.
| Class | Ground-Truth Objects | Top Confidence Scores | TP Flags | Interpretation |
|---|---|---|---|---|
| cat | 5 | 0.99, 0.95, 0.91, 0.88 | 1, 1, 0, 1 | Strong ranking with one early false positive. |
| dog | 4 | 0.97, 0.92, 0.90, 0.83 | 1, 0, 1, 1 | Mixed ordering that lowers precision at early ranks. |
| person | 6 | 0.98, 0.95, 0.90, 0.86 | 1, 1, 0, 1 | High recall class with moderate rank noise. |
Paste the example values into the form, change thresholds, and compare how AP and mAP move under stricter filtering.
For each class, detections are sorted from highest to lowest confidence. At each rank, the calculator updates the cumulative true-positive and false-positive counts, then computes precision and recall.
Choose all-points interpolation for a smooth precision envelope, or 11-point interpolation for a classic benchmark-style estimate.
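The per-rank bookkeeping and both interpolation modes can be sketched in plain Python. This is an illustrative implementation (the function name `average_precision` is ours, not the calculator's code), applied here to the `cat` row of the example table:

```python
def average_precision(tp_flags, n_gt, mode="all"):
    """AP from a confidence-sorted list of TP/FP flags and a ground-truth count."""
    tp = fp = 0
    precisions, recalls = [], []
    for flag in tp_flags:  # flags are already sorted by descending confidence
        tp += flag
        fp += 1 - flag
        precisions.append(tp / (tp + fp))  # cumulative precision at this rank
        recalls.append(tp / n_gt)          # cumulative recall at this rank
    if mode == "all":
        # All-points interpolation: area under the precision envelope,
        # where each point uses the max precision at equal-or-higher recall.
        ap, prev_recall = 0.0, 0.0
        for i, r in enumerate(recalls):
            ap += (r - prev_recall) * max(precisions[i:])
            prev_recall = r
        return ap
    # Classic 11-point interpolation at recall levels 0.0, 0.1, ..., 1.0
    total = 0.0
    for level in (i / 10 for i in range(11)):
        candidates = [p for p, r in zip(precisions, recalls) if r >= level - 1e-9]
        total += max(candidates) if candidates else 0.0
    return total / 11

# "cat" row: scores 0.99, 0.95, 0.91, 0.88 with flags 1, 1, 0, 1 and 5 GT objects
ap_all = average_precision([1, 1, 0, 1], 5)            # ≈ 0.55
ap_11 = average_precision([1, 1, 0, 1], 5, mode="11")  # ≈ 0.591
```

Note that the early false positive at rank 3 pulls the precision envelope down only after recall 0.4, which is why the all-points AP stays fairly high.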
This tool helps you inspect whether model quality drops because of poor ranking, weak recall, class imbalance, or aggressive confidence filtering. Macro mAP highlights class fairness, while weighted mAP shows how performance looks when frequent classes carry more influence.
Because the form accepts ranked detections directly, it works for object detection, retrieval tasks, and other ranked relevance pipelines where average precision is the main evaluation target.
Mean average precision (mAP) measures how well a model ranks correct detections ahead of incorrect ones across classes. Higher mAP means better ranking quality and stronger class-level retrieval performance.
Precision shows how many reported detections are correct. Recall shows how many true targets were found. AP combines both by tracing performance across the ranked list.
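As a concrete illustration of those definitions, here is the rank-by-rank bookkeeping for the `dog` row of the example table (a minimal sketch, not the calculator's internals):

```python
flags, n_gt = [1, 0, 1, 1], 4  # "dog" row: TP flags in confidence order, 4 GT objects

tp = fp = 0
for rank, flag in enumerate(flags, start=1):
    tp += flag
    fp += 1 - flag
    precision = tp / (tp + fp)  # correct detections among those reported so far
    recall = tp / n_gt          # true targets found so far
    print(f"rank {rank}: precision={precision:.3f} recall={recall:.3f}")
```

The false positive at rank 2 halves precision while recall stays flat; the later true positives recover both, ending at precision 0.75 and recall 0.75.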
Use all-points interpolation when you want a fuller estimate of the precision-recall curve. It is common in modern evaluation pipelines and usually reflects ranking detail better.
Macro mAP treats every class equally. Weighted mAP gives larger classes more influence by multiplying each class AP by its ground truth count before averaging.
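The difference is easy to see in code. In this sketch, the helper name `mean_ap` is ours, and the AP values are hand-computed with all-points interpolation from the example table:

```python
def mean_ap(aps, gt_counts, weighted=False):
    """Macro mAP averages class APs directly; weighted mAP scales each AP by GT count."""
    if weighted:
        total = sum(gt_counts.values())
        return sum(ap * gt_counts[cls] for cls, ap in aps.items()) / total
    return sum(aps.values()) / len(aps)

# All-points APs for the example table's cat / dog / person rows
aps = {"cat": 0.55, "dog": 0.625, "person": 11 / 24}
gt_counts = {"cat": 5, "dog": 4, "person": 6}

macro = mean_ap(aps, gt_counts)                    # ≈ 0.544
weighted = mean_ap(aps, gt_counts, weighted=True)  # ≈ 0.533
```

Here `person` has the largest ground-truth count but the lowest AP, so weighted mAP dips below macro mAP, exactly the kind of imbalance the two averages are meant to expose.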
A class is skipped when its ground-truth count is zero and the zero-target option is off. This avoids distorting the mean with undefined recall values.
Yes. If you can mark each ranked item as relevant or not relevant, the same average precision logic works for retrieval, ranking, and recommendation evaluations.
A confidence threshold removes low-score detections before ranking. This can improve precision if weak predictions are noisy, but it may reduce recall if useful detections are filtered out.
Some benchmarks cap predictions per class or per image. The top K option lets you simulate those rules and compare how restricted ranking changes AP.
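Both rules can be simulated with a small pre-filter applied before ranking. This is a sketch: `min_score` and `top_k` mirror the form's options, but the function itself is illustrative:

```python
def filter_detections(scores, tp_flags, min_score=0.0, top_k=None):
    """Drop detections below the confidence threshold, sort by score, keep top K."""
    kept = [(s, f) for s, f in zip(scores, tp_flags) if s >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept if top_k is None else kept[:top_k]

# "dog" row: a 0.9 threshold drops the 0.83 detection; top_k=2 keeps only two ranks
scores, flags = [0.97, 0.92, 0.90, 0.83], [1, 0, 1, 1]
thresholded = filter_detections(scores, flags, min_score=0.9)  # 3 detections remain
capped = filter_detections(scores, flags, top_k=2)             # [(0.97, 1), (0.92, 0)]
```

In this example the thresholded-out 0.83 detection is a true positive, so the stricter filter lowers recall without helping precision, the trade-off described above.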
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of the results. Please consult other sources as well.