Mean Average Precision Calculator

Analyze ranked detections with class-wise average precision. Compare thresholds, confidence ordering, and retrieval balance. Turn complex evaluation outputs into clear, actionable performance insights.

Calculator

Enter class counts and ranked detections

Use one class count per line and one detection per line. Detections should already reflect your IoU matching decision for the chosen threshold.

Choose the integration style used for average precision.
Stored in the report so your evaluation context stays clear.
Detections below this score are removed before ranking.
Useful for capped evaluation settings such as top K predictions.
Controls number formatting in the result cards and exports.
Enable only if zero-target classes should still appear in the mean.
Accepted examples: cat=5, dog:4, or person,6.
Use class|score|tp. Example: cat|0.95|1. TP is 1 for a matched detection and 0 for a false positive.
Reset example data
Example data table

Sample ranked detections for three classes

Class  | Ground Truth Objects | Top Scores             | TP Flags   | Use Case
cat    | 5                    | 0.99, 0.95, 0.91, 0.88 | 1, 1, 0, 1 | Strong ranking with one early false positive.
dog    | 4                    | 0.97, 0.92, 0.90, 0.83 | 1, 0, 1, 1 | Mixed ordering that lowers precision at early ranks.
person | 6                    | 0.98, 0.95, 0.90, 0.86 | 1, 1, 0, 1 | High recall class with moderate rank noise.

Paste the example values into the form, change thresholds, and compare how AP and mAP move under stricter filtering.
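As a rough sketch of how these inputs could be parsed (the function names and Python implementation here are illustrative, not the calculator's actual code):

```python
import re

def parse_class_counts(text):
    """Parse lines like 'cat=5', 'dog:4', or 'person,6' into {class: count}."""
    counts = {}
    for line in text.strip().splitlines():
        name, count = re.split(r"[=:,]", line.strip(), maxsplit=1)
        counts[name.strip()] = int(count.strip())
    return counts

def parse_detections(text):
    """Parse 'class|score|tp' lines into (class, score, tp) tuples."""
    detections = []
    for line in text.strip().splitlines():
        name, score, tp = line.strip().split("|")
        detections.append((name.strip(), float(score), int(tp)))
    return detections

counts = parse_class_counts("cat=5\ndog:4\nperson,6")
detections = parse_detections("cat|0.95|1\ncat|0.91|0")
```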

Formula used

How average precision and mAP are calculated

For each class, detections are sorted from highest to lowest confidence. At each rank, the calculator updates the cumulative true positive and false positive counts, then computes precision and recall.

Choose all-points interpolation for a smooth precision envelope, or use 11-point interpolation for a classic benchmark-style estimate.
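The per-rank bookkeeping and both interpolation styles can be sketched in Python (a minimal illustration, not the calculator's internal implementation):

```python
def average_precision(tp_flags, num_gt, method="all_points"):
    """AP from ranked TP flags (1 = true positive, 0 = false positive)."""
    tp = 0
    precisions, recalls = [], []
    for rank, flag in enumerate(tp_flags, start=1):  # already confidence-sorted
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    if method == "all_points":
        # Area under the precision envelope: at each recall level, take the
        # maximum precision achieved at that recall or beyond.
        ap, prev_recall = 0.0, 0.0
        for i, r in enumerate(recalls):
            ap += (r - prev_recall) * max(precisions[i:])
            prev_recall = r
        return ap
    # 11-point interpolation: average the interpolated precision sampled
    # at recall levels 0.0, 0.1, ..., 1.0.
    total = 0.0
    for level in (i / 10 for i in range(11)):
        candidates = [p for p, r in zip(precisions, recalls) if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / 11
```

For the cat row in the example table (flags 1, 1, 0, 1 with 5 ground-truth objects), this sketch gives an all-points AP of 0.55 and an 11-point AP of about 0.591.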

How to use

Steps for evaluating your model output

  1. Count the ground truth objects for each class in your validation set.
  2. Match each predicted box or item to a target using your chosen IoU rule.
  3. Mark every ranked prediction as 1 for true positive or 0 for false positive.
  4. Paste class counts into the left box and ranked detections into the right box.
  5. Set the AP method, IoU threshold label, confidence filter, and optional top K cap.
  6. Submit the form and review macro mAP, weighted mAP, micro metrics, and class AP values.
  7. Use the CSV button for spreadsheets or the PDF button for a shareable report.
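The steps above, from confidence filtering and top-K capping through macro and weighted mAP, can be sketched as follows (illustrative Python assuming all-points AP; not the calculator's actual code):

```python
def evaluate(detections, gt_counts, conf_threshold=0.0, top_k=None):
    """Per-class AP plus macro and weighted mAP from (class, score, tp) rows."""
    def all_points_ap(flags, num_gt):
        tp, prec, rec = 0, [], []
        for rank, flag in enumerate(flags, start=1):
            tp += flag
            prec.append(tp / rank)
            rec.append(tp / num_gt)
        ap, prev = 0.0, 0.0
        for i, r in enumerate(rec):
            ap += (r - prev) * max(prec[i:])
            prev = r
        return ap

    aps = {}
    for cls, num_gt in gt_counts.items():
        if num_gt == 0:
            continue  # zero-target classes are skipped unless opted in
        ranked = sorted(
            (d for d in detections if d[0] == cls and d[1] >= conf_threshold),
            key=lambda d: d[1], reverse=True)
        if top_k is not None:
            ranked = ranked[:top_k]  # simulate a capped-evaluation benchmark
        aps[cls] = all_points_ap([d[2] for d in ranked], num_gt)

    macro = sum(aps.values()) / len(aps)            # every class counts equally
    total_gt = sum(gt_counts[c] for c in aps)
    weighted = sum(aps[c] * gt_counts[c] for c in aps) / total_gt
    return aps, macro, weighted

# Example data from the table above.
detections = [
    ("cat", 0.99, 1), ("cat", 0.95, 1), ("cat", 0.91, 0), ("cat", 0.88, 1),
    ("dog", 0.97, 1), ("dog", 0.92, 0), ("dog", 0.90, 1), ("dog", 0.83, 1),
    ("person", 0.98, 1), ("person", 0.95, 1),
    ("person", 0.90, 0), ("person", 0.86, 1),
]
aps, macro, weighted = evaluate(detections, {"cat": 5, "dog": 4, "person": 6})
```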
Why this helps

What this calculator is designed to reveal

This tool helps you inspect whether model quality drops because of poor ranking, weak recall, class imbalance, or aggressive confidence filtering. Macro mAP highlights class fairness, while weighted mAP shows how performance looks when frequent classes carry more influence.

Because the form accepts ranked detections directly, it works for object detection, retrieval tasks, and other ranked relevance pipelines where average precision is the main evaluation target.

FAQs

Common questions

1. What does mean average precision measure?

It measures how well a model ranks correct detections ahead of incorrect ones across classes. Higher mAP means better ranking quality and stronger class-level retrieval performance.

2. Why are precision and recall both needed?

Precision shows how many reported detections are correct. Recall shows how many true targets were found. AP combines both by tracing performance across the ranked list.
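A small worked example of that cumulative bookkeeping (illustrative Python; the flags and count match the cat row from the example table):

```python
flags = [1, 1, 0, 1]   # ranked predictions: 1 = true positive, 0 = false positive
num_gt = 5             # ground-truth objects for this class
tp = 0
precision, recall = [], []
for rank, flag in enumerate(flags, start=1):
    tp += flag
    precision.append(tp / rank)   # correctness of what was reported so far
    recall.append(tp / num_gt)    # share of true targets found so far
# precision: [1.0, 1.0, 2/3, 0.75]; recall: [0.2, 0.4, 0.4, 0.6]
```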

3. When should I use all-points interpolation?

Use all-points interpolation when you want a fuller estimate of the precision-recall curve. It is common in modern evaluation pipelines and usually reflects ranking detail better.

4. What is the difference between macro and weighted mAP?

Macro mAP treats every class equally. Weighted mAP gives larger classes more influence by multiplying each class AP by its ground truth count before averaging.
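A quick numeric illustration (the AP values below are made up for demonstration):

```python
ap = {"cat": 0.55, "dog": 0.625, "person": 0.458}   # hypothetical class APs
gt = {"cat": 5, "dog": 4, "person": 6}              # ground-truth counts
macro = sum(ap.values()) / len(ap)                  # each class counts equally
weighted = sum(ap[c] * gt[c] for c in ap) / sum(gt.values())
# person has the most ground-truth objects and the lowest AP, so it pulls
# weighted mAP below macro mAP
```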

5. Why might a class be skipped?

A class is skipped when its ground truth count is zero and the zero-target option is off. This avoids distorting the mean with undefined recall situations.

6. Can this tool evaluate retrieval style systems too?

Yes. If you can mark each ranked item as relevant or not relevant, the same average precision logic works for retrieval, ranking, and recommendation evaluations.

7. What does the confidence threshold change?

It removes low score detections before ranking. This can improve precision if weak predictions are noisy, but it may reduce recall if useful detections are filtered out.
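A small illustration of that recall cost (hypothetical scores and flags):

```python
scores = [0.99, 0.95, 0.91, 0.88]   # ranked confidences
flags  = [1, 1, 0, 1]               # 1 = true positive, 0 = false positive
num_gt = 5
kept = [f for s, f in zip(scores, flags) if s >= 0.90]  # threshold = 0.90
max_recall = sum(kept) / num_gt
# the true positive scored 0.88 is filtered out, so the highest reachable
# recall drops from 3/5 to 2/5
```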

8. Why include a maximum detections setting?

Some benchmarks cap predictions per class or per image. The top K option lets you simulate those rules and compare how restricted ranking changes AP.

Related Calculators

context recall · mean reciprocal rank · retrieval latency · retriever recall · zero results rate

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.