Example data table
| Recall | Precision | Comment |
|---|---|---|
| 0.00 | 1.00 | Start point (no predicted positives; precision set to 1 by convention). |
| 0.25 | 0.92 | High precision at low recall. |
| 0.50 | 0.80 | Balanced region. |
| 0.75 | 0.60 | Recall improves, precision drops. |
| 1.00 | 0.42 | End point (all positives retrieved). |
Formulas used
1) Trapezoidal AUC (linear interpolation)
Given points (R_i, P_i), i = 0, …, n, sorted by increasing recall, where R is recall and P is precision, the area is: AUC = Σ_{i=1}^{n} (R_i − R_{i−1}) × (P_i + P_{i−1}) / 2.
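A minimal Python sketch of the trapezoidal rule above, assuming the points arrive as two equal-length lists. The function name `trapezoid_pr_auc` is hypothetical and not part of this calculator. It sorts by recall (stably, so tied recalls keep their input order) and sums one trapezoid per consecutive pair, applied here to the example data table.

```python
def trapezoid_pr_auc(recall, precision):
    """Area under a PR curve by the trapezoidal rule (linear interpolation)."""
    if len(recall) != len(precision):
        raise ValueError("recall and precision must have the same length")
    # Sort by increasing recall; Python's sort is stable, so ties keep input order.
    points = sorted(zip(recall, precision), key=lambda pt: pt[0])
    area = 0.0
    # Sum (R_i - R_{i-1}) * (P_i + P_{i-1}) / 2 over consecutive pairs.
    for (r_prev, p_prev), (r, p) in zip(points, points[1:]):
        area += (r - r_prev) * (p + p_prev) / 2.0
    return area


# Points from the example data table.
recall = [0.00, 0.25, 0.50, 0.75, 1.00]
precision = [1.00, 0.92, 0.80, 0.60, 0.42]
print(round(trapezoid_pr_auc(recall, precision), 4))  # 0.7575
```

The four trapezoids contribute 0.24, 0.215, 0.175 and 0.1275.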
2) Average Precision (AP)
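We first compute a precision envelope over the points sorted by increasing recall: P̂_i = max_{j≥i} P_j, i.e., the highest precision reached at recall R_i or beyond. Then: AP = Σ_{i=1}^{n} (R_i − R_{i−1}) × P̂_i. This is the interpolated, step-wise form used in some evaluation protocols (for example, all-point interpolated AP in PASCAL VOC-style detection benchmarks). Other libraries, including scikit-learn's average_precision_score, use the same step-wise sum without the envelope, i.e., with P_i in place of P̂_i.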
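A minimal sketch of envelope-based AP, assuming the same two-list input as before. The name `interpolated_ap` is hypothetical. A right-to-left running maximum produces P̂_i, and each recall step is then weighted by the envelope value at its right end.

```python
def interpolated_ap(recall, precision):
    """Step-wise AP using the precision envelope P_hat_i = max_{j>=i} P_j."""
    if len(recall) != len(precision):
        raise ValueError("recall and precision must have the same length")
    points = sorted(zip(recall, precision), key=lambda pt: pt[0])
    recalls = [r for r, _ in points]
    envelope = [p for _, p in points]
    # Right-to-left running maximum turns raw precision into its envelope.
    for i in range(len(envelope) - 2, -1, -1):
        envelope[i] = max(envelope[i], envelope[i + 1])
    # Sum (R_i - R_{i-1}) * P_hat_i for i = 1..n.
    return sum(
        (recalls[i] - recalls[i - 1]) * envelope[i] for i in range(1, len(recalls))
    )


table_recall = [0.00, 0.25, 0.50, 0.75, 1.00]
table_precision = [1.00, 0.92, 0.80, 0.60, 0.42]
print(round(interpolated_ap(table_recall, table_precision), 4))  # 0.685

# Non-monotonic curve: the envelope lifts precision 0.5 at recall 0.5 up to 0.6.
print(round(interpolated_ap([0.0, 0.5, 1.0], [1.0, 0.5, 0.6]), 4))  # 0.6
```

On the example table precision already decreases with recall, so the envelope changes nothing. On the second curve, dropping the envelope would give 0.5 × 0.5 + 0.5 × 0.6 = 0.55 instead of 0.6.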
How to use this calculator
- Select an input mode: PR pairs, or scores + labels.
- Paste your data or upload a CSV file.
- Optionally enable clipping to keep recall and precision values within [0, 1].
- Press Submit to compute AUC and AP instantly.
- Use the download buttons to export CSV or PDF reports.
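For the PR-pairs mode, the preprocessing these steps imply (parse rows, optionally clip to [0, 1], sort by recall) can be sketched as follows. The two-column `recall,precision` CSV layout and the name `load_pr_pairs` are assumptions for illustration; the calculator's actual file format and validation rules may differ.

```python
import csv
import io


def load_pr_pairs(csv_text, clip=False):
    """Parse 'recall,precision' rows into (recall, precision) tuples sorted by recall."""
    points = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or single-column lines
        try:
            r, p = float(row[0]), float(row[1])
        except ValueError:
            continue  # skip a header row or other non-numeric line
        if clip:
            # Force both values into [0, 1].
            r = min(max(r, 0.0), 1.0)
            p = min(max(p, 0.0), 1.0)
        elif not (0.0 <= r <= 1.0 and 0.0 <= p <= 1.0):
            raise ValueError(f"value outside [0, 1]: {row}")
        points.append((r, p))
    points.sort(key=lambda pt: pt[0])  # stable: tied recalls keep file order
    return points


text = "recall,precision\n0.50,0.80\n0.00,1.00\n1.00,0.42\n1.02,0.40\n"
print(load_pr_pairs(text, clip=True))
# [(0.0, 1.0), (0.5, 0.8), (1.0, 0.42), (1.0, 0.4)]
```

Without `clip=True`, the out-of-range row `1.02,0.40` raises an error instead of being silently adjusted.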
FAQs
1) What is area under the PR curve used for?
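It summarizes precision–recall performance across all thresholds. It is especially useful for imbalanced datasets, where a large number of true negatives keeps the false-positive rate low and can make ROC AUC look optimistic even when false positives outnumber true positives.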
2) What is the difference between AP and trapezoidal AUC?
Trapezoidal AUC connects consecutive PR points with straight lines. Average Precision applies a precision envelope and integrates step-wise; whether it matches a given evaluation library depends on whether that library also applies the envelope.
3) Which one should I report in papers or dashboards?
For a standard summary of ranking performance, AP is the more widely reported choice. Use trapezoidal AUC if you already have a dense, smooth curve and want linear integration; between sparse PR points, linear interpolation can overstate the area.
4) Why does precision sometimes increase after threshold changes?
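Precision, TP / (TP + FP), is not monotonic in the threshold. When you raise the threshold, precision rises if the predictions you drop are, as a group, less precise than the ones you keep (a higher share of false positives), and falls if they are more precise. Recall, by contrast, can only decrease or stay the same.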
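A toy example makes this concrete. This is a minimal sketch assuming four hypothetical scores and labels and the rule "score ≥ threshold means predicted positive"; `precision_recall_at` is an illustrative helper, not part of the calculator.

```python
def precision_recall_at(threshold, scores, labels, pos_label=1):
    """Precision and recall when every score >= threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == pos_label)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y != pos_label)
    n_pos = sum(1 for y in labels if y == pos_label)
    prec = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is predicted positive
    rec = tp / n_pos if n_pos else 0.0
    return prec, rec


scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
for t in (0.6, 0.7, 0.8, 0.9):
    prec, rec = precision_recall_at(t, scores, labels)
    print(f"threshold={t:.1f}  precision={prec:.3f}  recall={rec:.2f}")
# threshold=0.6  precision=0.500  recall=1.00
# threshold=0.7  precision=0.667  recall=1.00
# threshold=0.8  precision=0.500  recall=0.50
# threshold=0.9  precision=1.000  recall=0.50
```

Moving from 0.6 to 0.7 drops only a false positive, so precision rises while recall stays the same; moving from 0.7 to 0.8 drops a true positive, so precision falls again.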
5) Do I need PR points, or can I use raw model scores?
You can do either. Paste PR points if you already computed them, or use scores + labels to have this tool build the curve automatically.
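One common way to turn scores and labels into PR points is to sweep the threshold down through each distinct score. The sketch below assumes that approach and is not necessarily how this calculator builds its curve; the name `pr_curve_from_scores` is hypothetical, and `pos_label` plays the role of the positive label value field.

```python
def pr_curve_from_scores(scores, labels, pos_label=1):
    """Build PR points from raw scores and labels.

    Sweeps thresholds from the highest score down, treating score >= threshold
    as a predicted positive. Returns recall and precision lists in order of
    non-decreasing recall, starting at the (0, 1) convention point.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == pos_label)
    if n_pos == 0:
        raise ValueError("no positive labels, so recall is undefined")
    # Highest scores first; equal scores end up adjacent.
    pairs = sorted(zip(scores, labels), key=lambda sy: sy[0], reverse=True)
    recall, precision = [0.0], [1.0]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == pos_label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item with this score,
        # so tied scores share one threshold.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            recall.append(tp / n_pos)
            precision.append(tp / (tp + fp))
    return recall, precision


recall, precision = pr_curve_from_scores([0.9, 0.8, 0.8, 0.3], [1, 1, 0, 0])
print(recall)     # [0.0, 0.5, 1.0, 1.0]
print(precision)  # [1.0, 1.0, 0.6666666666666666, 0.5]
```

Grouping tied scores matters: the two items scored 0.8 enter together, so no point is emitted between them. The resulting lists can be passed to either integration rule.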
6) What label should be considered positive?
Set the positive label value field to match your data. Many datasets use 1 for positive and 0 for negative, but other encodings are common.
7) Why are results slightly different from another tool?
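Differences usually come from interpolation rules, point ordering, handling of duplicate recall values, or whether a precision envelope is applied. Check which convention the other tool documents and compare like with like, since AP is computed with an envelope in some tools and without one in others.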
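To see how these choices move the number, a minimal sketch compares three rules on the same four hypothetical points: the trapezoidal rule, a non-interpolated step sum (the form scikit-learn documents for average_precision_score), and step-wise AP with the precision envelope. The helper names are illustrative. Both orderings are assumed already sorted by recall and differ only in the order of the two points tied at recall 0.5.

```python
def trapezoid(points):
    # Linear interpolation between consecutive (recall, precision) points.
    return sum((r - r0) * (p + p0) / 2 for (r0, p0), (r, p) in zip(points, points[1:]))


def step_ap(points):
    # Non-interpolated: each recall increment is weighted by the precision at its right end.
    return sum((r - r0) * p for (r0, _), (r, p) in zip(points, points[1:]))


def envelope_ap(points):
    # Same step sum, but precision is first replaced by its right-to-left running max.
    env = [p for _, p in points]
    for i in range(len(env) - 2, -1, -1):
        env[i] = max(env[i], env[i + 1])
    return sum((points[i][0] - points[i - 1][0]) * env[i] for i in range(1, len(points)))


orders = {
    "A": [(0.0, 1.0), (0.5, 1.0), (0.5, 0.5), (1.0, 0.6)],
    "B": [(0.0, 1.0), (0.5, 0.5), (0.5, 1.0), (1.0, 0.6)],
}
for name, pts in orders.items():
    print(f"{name}: trapezoid={trapezoid(pts):.3f} "
          f"step_ap={step_ap(pts):.3f} envelope_ap={envelope_ap(pts):.3f}")
# A: trapezoid=0.775 step_ap=0.800 envelope_ap=0.800
# B: trapezoid=0.775 step_ap=0.550 envelope_ap=0.800
```

In this example the trapezoidal and envelope results are unaffected by swapping the tied points, while the non-interpolated step sum drops from 0.800 to 0.550. This is one reason duplicate recalls and point ordering matter when comparing tools.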