Mean Average Precision Calculator

Analyze ranked detections with class-wise average precision. Compare thresholds, confidence ordering, and retrieval balance. Turn complex evaluation outputs into clear, actionable performance insights.

Calculator

Enter class counts and ranked detections

Use one class count per line and one detection per line. Detections should already reflect your IoU matching decision for the chosen threshold.

Choose the integration style used for average precision.
Stored in the report so your evaluation context stays clear.
Detections below this score are removed before ranking.
Useful for capped evaluation settings such as top K predictions.
Controls number formatting in the result cards and exports.
Enable only if zero-target classes should still appear in the mean.
Accepted examples: cat=5, dog:4, or person,6.
Use class|score|tp. Example: cat|0.95|1. TP is 1 for a matched detection and 0 for a false positive.
Reset example data
Example data table

Sample ranked detections for three classes

Class  | Ground Truth Objects | Top Scores             | TP Flags   | Use Case
cat    | 5                    | 0.99, 0.95, 0.91, 0.88 | 1, 1, 0, 1 | Strong ranking with one early false positive.
dog    | 4                    | 0.97, 0.92, 0.90, 0.83 | 1, 0, 1, 1 | Mixed ordering that lowers precision at early ranks.
person | 6                    | 0.98, 0.95, 0.90, 0.86 | 1, 1, 0, 1 | High recall class with moderate rank noise.

Paste the example values into the form, change thresholds, and compare how AP and mAP move under stricter filtering.
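As a rough sketch of how these inputs could be parsed (the function names and Python implementation here are illustrative, not the calculator's actual code):

```python
import re

def parse_class_counts(text):
    """Parse lines like 'cat=5', 'dog:4', or 'person,6' into {class: count}."""
    counts = {}
    for line in text.strip().splitlines():
        name, count = re.split(r"[=:,]", line.strip(), maxsplit=1)
        counts[name.strip()] = int(count.strip())
    return counts

def parse_detections(text):
    """Parse 'class|score|tp' lines into (class, score, tp) tuples."""
    detections = []
    for line in text.strip().splitlines():
        name, score, tp = line.strip().split("|")
        detections.append((name.strip(), float(score), int(tp)))
    return detections

counts = parse_class_counts("cat=5\ndog:4\nperson,6")
detections = parse_detections("cat|0.95|1\ncat|0.91|0")
```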

Formula used

How average precision and mAP are calculated

For each class, detections are sorted from highest to lowest confidence. At each rank, the calculator updates the cumulative true positive and false positive counts, then computes precision and recall.

Choose all-points interpolation for a smooth precision envelope, or use 11-point interpolation for a classic benchmark-style estimate.
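The per-rank bookkeeping and both interpolation styles can be sketched in Python (a minimal illustration, not the calculator's internal implementation):

```python
def average_precision(tp_flags, num_gt, method="all_points"):
    """AP from ranked TP flags (1 = true positive, 0 = false positive)."""
    tp = 0
    precisions, recalls = [], []
    for rank, flag in enumerate(tp_flags, start=1):  # already confidence-sorted
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    if method == "all_points":
        # Area under the precision envelope: at each recall level, take the
        # maximum precision achieved at that recall or beyond.
        ap, prev_recall = 0.0, 0.0
        for i, r in enumerate(recalls):
            ap += (r - prev_recall) * max(precisions[i:])
            prev_recall = r
        return ap
    # 11-point interpolation: average the interpolated precision sampled
    # at recall levels 0.0, 0.1, ..., 1.0.
    total = 0.0
    for level in (i / 10 for i in range(11)):
        candidates = [p for p, r in zip(precisions, recalls) if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / 11
```

For the cat row in the example table (flags 1, 1, 0, 1 with 5 ground-truth objects), this sketch gives an all-points AP of 0.55 and an 11-point AP of about 0.591.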

How to use

Steps for evaluating your model output

  1. Count the ground truth objects for each class in your validation set.
  2. Match each predicted box or item to a target using your chosen IoU rule.
  3. Mark every ranked prediction as 1 for true positive or 0 for false positive.
  4. Paste class counts into the left box and ranked detections into the right box.
  5. Set the AP method, IoU threshold label, confidence filter, and optional top K cap.
  6. Submit the form and review macro mAP, weighted mAP, micro metrics, and class AP values.
  7. Use the CSV button for spreadsheets or the PDF button for a shareable report.
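The steps above, from confidence filtering and top-K capping through macro and weighted mAP, can be sketched as follows (illustrative Python assuming all-points AP; not the calculator's actual code):

```python
def evaluate(detections, gt_counts, conf_threshold=0.0, top_k=None):
    """Per-class AP plus macro and weighted mAP from (class, score, tp) rows."""
    def all_points_ap(flags, num_gt):
        tp, prec, rec = 0, [], []
        for rank, flag in enumerate(flags, start=1):
            tp += flag
            prec.append(tp / rank)
            rec.append(tp / num_gt)
        ap, prev = 0.0, 0.0
        for i, r in enumerate(rec):
            ap += (r - prev) * max(prec[i:])
            prev = r
        return ap

    aps = {}
    for cls, num_gt in gt_counts.items():
        if num_gt == 0:
            continue  # zero-target classes are skipped unless opted in
        ranked = sorted(
            (d for d in detections if d[0] == cls and d[1] >= conf_threshold),
            key=lambda d: d[1], reverse=True)
        if top_k is not None:
            ranked = ranked[:top_k]  # simulate a capped-evaluation benchmark
        aps[cls] = all_points_ap([d[2] for d in ranked], num_gt)

    macro = sum(aps.values()) / len(aps)            # every class counts equally
    total_gt = sum(gt_counts[c] for c in aps)
    weighted = sum(aps[c] * gt_counts[c] for c in aps) / total_gt
    return aps, macro, weighted

# Example data from the table above.
detections = [
    ("cat", 0.99, 1), ("cat", 0.95, 1), ("cat", 0.91, 0), ("cat", 0.88, 1),
    ("dog", 0.97, 1), ("dog", 0.92, 0), ("dog", 0.90, 1), ("dog", 0.83, 1),
    ("person", 0.98, 1), ("person", 0.95, 1),
    ("person", 0.90, 0), ("person", 0.86, 1),
]
aps, macro, weighted = evaluate(detections, {"cat": 5, "dog": 4, "person": 6})
```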
Why this helps

What this calculator is designed to reveal

This tool helps you inspect whether model quality drops because of poor ranking, weak recall, class imbalance, or aggressive confidence filtering. Macro mAP highlights class fairness, while weighted mAP shows how performance looks when frequent classes carry more influence.

Because the form accepts ranked detections directly, it works for object detection, retrieval tasks, and other ranked relevance pipelines where average precision is the main evaluation target.

FAQs

Common questions

1. What does mean average precision measure?

It measures how well a model ranks correct detections ahead of incorrect ones across classes. Higher mAP means better ranking quality and stronger class-level retrieval performance.

2. Why are precision and recall both needed?

Precision shows how many reported detections are correct. Recall shows how many true targets were found. AP combines both by tracing performance across the ranked list.
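A small worked example of that cumulative bookkeeping (illustrative Python; the flags and count match the cat row from the example table):

```python
flags = [1, 1, 0, 1]   # ranked predictions: 1 = true positive, 0 = false positive
num_gt = 5             # ground-truth objects for this class
tp = 0
precision, recall = [], []
for rank, flag in enumerate(flags, start=1):
    tp += flag
    precision.append(tp / rank)   # correctness of what was reported so far
    recall.append(tp / num_gt)    # share of true targets found so far
# precision: [1.0, 1.0, 2/3, 0.75]; recall: [0.2, 0.4, 0.4, 0.6]
```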

3. When should I use all-points interpolation?

Use all-points interpolation when you want a fuller estimate of the precision-recall curve. It is common in modern evaluation pipelines and usually reflects ranking detail better.

4. What is the difference between macro and weighted mAP?

Macro mAP treats every class equally. Weighted mAP gives larger classes more influence by multiplying each class AP by its ground truth count before averaging.
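A quick numeric illustration (the AP values below are made up for demonstration):

```python
ap = {"cat": 0.55, "dog": 0.625, "person": 0.458}   # hypothetical class APs
gt = {"cat": 5, "dog": 4, "person": 6}              # ground-truth counts
macro = sum(ap.values()) / len(ap)                  # each class counts equally
weighted = sum(ap[c] * gt[c] for c in ap) / sum(gt.values())
# person has the most ground-truth objects and the lowest AP, so it pulls
# weighted mAP below macro mAP
```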

5. Why might a class be skipped?

A class is skipped when its ground truth count is zero and the zero-target option is off. This avoids distorting the mean with undefined recall situations.

6. Can this tool evaluate retrieval style systems too?

Yes. If you can mark each ranked item as relevant or not relevant, the same average precision logic works for retrieval, ranking, and recommendation evaluations.

7. What does the confidence threshold change?

It removes low score detections before ranking. This can improve precision if weak predictions are noisy, but it may reduce recall if useful detections are filtered out.
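A small illustration of that recall cost (hypothetical scores and flags):

```python
scores = [0.99, 0.95, 0.91, 0.88]   # ranked confidences
flags  = [1, 1, 0, 1]               # 1 = true positive, 0 = false positive
num_gt = 5
kept = [f for s, f in zip(scores, flags) if s >= 0.90]  # threshold = 0.90
max_recall = sum(kept) / num_gt
# the true positive scored 0.88 is filtered out, so the highest reachable
# recall drops from 3/5 to 2/5
```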

8. Why include a maximum detections setting?

Some benchmarks cap predictions per class or per image. The top K option lets you simulate those rules and compare how restricted ranking changes AP.

Related Calculators

context recall · mean reciprocal rank · retrieval latency · retriever recall · zero results rate

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.