Enter Confusion Counts Per Threshold
Example Data Table
| Threshold | TP | FP | FN | Precision | Recall |
|---|---|---|---|---|---|
| 0.90 | 42 | 8 | 18 | 0.8400 | 0.7000 |
| 0.70 | 63 | 24 | 10 | 0.7241 | 0.8630 |
| 0.50 | 72 | 41 | 5 | 0.6372 | 0.9351 |
Formulas Used
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- F1 = 2 × Precision × Recall / (Precision + Recall)
- Micro average uses totals across all rows.
- Macro average is the mean of per-row metrics.
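The formulas above can be sketched in a few lines of Python, using the (TP, FP, FN) counts from the example table to show how micro averaging pools counts first while macro averaging computes per-row metrics first:

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(p, r):
    return 2 * p * r / (p + r)

# (TP, FP, FN) rows from the example table
rows = [(42, 8, 18), (63, 24, 10), (72, 41, 5)]

# Macro: compute per-row metrics, then take the mean
per_row = [(precision(tp, fp), recall(tp, fn)) for tp, fp, fn in rows]
macro_p = sum(p for p, _ in per_row) / len(per_row)
macro_r = sum(r for _, r in per_row) / len(per_row)

# Micro: pool the counts across rows, then compute once
TP = sum(r[0] for r in rows)
FP = sum(r[1] for r in rows)
FN = sum(r[2] for r in rows)
micro_p = precision(TP, FP)
micro_r = recall(TP, FN)

print(f"macro P={macro_p:.4f} R={macro_r:.4f}")  # macro P=0.7338 R=0.8327
print(f"micro P={micro_p:.4f} R={micro_r:.4f}")  # micro P=0.7080 R=0.8429
```

Note that the two averages already differ on this small table because the rows have different totals.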
How to Use This Calculator
- Decide what each row represents, such as a threshold.
- Enter TP, FP, and FN for every row.
- Click submit to generate the table and curve.
- Review micro and macro metrics for reporting needs.
- Export CSV for analysis or PDF for sharing.
Precision and recall support different operational goals
In screening tasks, false negatives can be expensive, so high recall reduces misses even if precision drops. In moderation or fraud detection, false positives harm users and add cost, so high precision limits unnecessary actions. This calculator lets you compare these priorities across multiple thresholds.
Threshold choice changes the confusion counts predictably
When you raise a decision threshold, fewer items are predicted positive. FP often falls, so precision rises; at the same time, FN can rise, so recall declines. In the example table, the 0.90 threshold gives TP=42 and FP=8, for precision 0.84. Lowering the threshold to 0.50 raises TP to 72 but also raises FP to 41, so precision falls to 0.637 while recall improves from 0.70 to 0.935.
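This threshold sweep can be reproduced directly from the table's counts, computing both metrics at each operating point:

```python
# Operating points from the example table: (threshold, TP, FP, FN)
points = [(0.90, 42, 8, 18), (0.70, 63, 24, 10), (0.50, 72, 41, 5)]

for t, tp, fp, fn in points:
    p = tp / (tp + fp)  # precision falls as the threshold drops
    r = tp / (tp + fn)  # recall rises as the threshold drops
    print(f"threshold {t:.2f}: precision {p:.4f}, recall {r:.4f}")
```

Running this prints the same precision and recall columns shown in the table.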
A precision–recall curve summarizes the trade‑off
Plotting recall on the x-axis and precision on the y-axis creates a curve of operating points. Points closer to the top-right indicate better overall performance. Use the interactive chart to spot thresholds that deliver acceptable recall without severe precision loss.
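Even without a chart, the curve's operating points can be listed in recall order and scanned programmatically. A minimal sketch, using the table's rows and an illustrative recall target of 0.85:

```python
# Rows keyed by threshold: (TP, FP, FN)
rows = {0.90: (42, 8, 18), 0.70: (63, 24, 10), 0.50: (72, 41, 5)}

# Build (recall, precision, threshold) points sorted by recall (x-axis order)
curve = sorted(
    (tp / (tp + fn), tp / (tp + fp), thr)
    for thr, (tp, fp, fn) in rows.items()
)
for r, p, thr in curve:
    print(f"recall={r:.4f} precision={p:.4f} (threshold {thr})")

# Pick the highest-precision point that still meets the recall target
target = 0.85
best = max((p, thr) for r, p, thr in curve if r >= target)
print(f"best: precision {best[0]:.4f} at threshold {best[1]}")
```

On this data, the 0.70 threshold meets the recall target with the least precision loss, which matches what the top-right region of the chart would show.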
Micro and macro averages answer different reporting questions
Micro averaging pools TP, FP, and FN across rows, weighting larger rows more heavily. Macro averaging treats each row equally, highlighting consistency across thresholds or folds. If one segment dominates volume, micro metrics may look stronger than macro metrics.
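A small, hypothetical two-row example makes the divergence concrete: one large segment with strong metrics and one small segment with weak metrics.

```python
# Hypothetical segments: (TP, FP, FN)
rows = [
    (900, 100, 100),  # large segment: precision 0.90
    (5, 15, 15),      # small segment: precision 0.25
]

# Macro: each row counts equally
macro_p = sum(tp / (tp + fp) for tp, fp, fn in rows) / len(rows)

# Micro: pooled counts, so the large segment dominates
TP = sum(r[0] for r in rows)
FP = sum(r[1] for r in rows)
micro_p = TP / (TP + FP)

print(f"macro precision {macro_p:.3f}, micro precision {micro_p:.3f}")
# macro 0.575 vs micro 0.887: micro hides the weak small segment
```

Reporting both numbers together is the simplest way to surface this kind of imbalance.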
Imbalanced datasets can mislead without context
With rare positives, accuracy can remain high even when recall is poor. Precision–recall metrics focus on positive detection quality. Track prevalence alongside TP, FP, and FN so stakeholders understand how many positives are expected.
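To see why accuracy alone misleads, consider hypothetical counts with 10 actual positives in 1,000 examples (TN is needed here, so it is supplied as an assumed value):

```python
# Hypothetical rare-positive scenario: 10 actual positives out of 1,000
tp, fp, fn, tn = 2, 1, 8, 989

accuracy = (tp + tn) / (tp + fp + fn + tn)  # 0.991 -- looks excellent
recall = tp / (tp + fn)                     # 0.20 -- 8 of 10 positives missed
prevalence = (tp + fn) / (tp + fp + fn + tn)

print(f"accuracy {accuracy:.3f}, recall {recall:.2f}, prevalence {prevalence:.3f}")
```

The classifier misses most positives, yet accuracy stays above 99% simply because negatives dominate; recall and prevalence tell the real story.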
Use the table to choose operating points and communicate risk
Identify thresholds that meet a target recall, then check precision and F1 for stability. Document the chosen label, counts, and averages. Exported CSV supports audits, while PDF helps cross‑team reviews and approvals.
FAQs
1) What if TP + FP equals zero?
Precision is undefined when no positives are predicted. The calculator shows a dash and excludes that row from macro precision and macro F1.
2) What if TP + FN equals zero?
Recall is undefined when there are no actual positives in that row. The calculator displays a dash and excludes that row from macro recall and macro F1.
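One way the dash-and-exclude behavior described in the two answers above might be sketched (the `safe_div` helper is illustrative, not the calculator's actual code):

```python
def safe_div(num, den):
    """Return None (rendered as a dash) when the denominator is zero."""
    return num / den if den else None

# First row predicts no positives, so its precision is undefined
rows = [(0, 0, 5), (42, 8, 18)]  # (TP, FP, FN)
precisions = [safe_div(tp, tp + fp) for tp, fp, fn in rows]

# Macro precision averages only the defined values
defined = [p for p in precisions if p is not None]
macro_p = sum(defined) / len(defined)

print(precisions)  # [None, 0.84]
print(macro_p)     # 0.84
```

Excluding undefined rows, rather than treating them as zero, keeps a single empty row from dragging the macro average down.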
3) Why is micro F1 different from macro F1?
Micro F1 is computed from pooled totals, so large rows dominate. Macro F1 averages per-row F1, so every row contributes equally to the final number.
4) Can I use this for k-fold validation?
Yes. Use each row as a fold. Macro metrics help gauge consistency across folds, while micro metrics summarize pooled performance across all examples.
5) Does a higher F1 always mean a better threshold?
Not always. F1 balances precision and recall equally, but your application may favor one. Choose thresholds based on costs, constraints, and required recall or precision targets.
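A common way to encode an unequal preference is the standard F-beta score, where beta > 1 weights recall more heavily and beta < 1 weights precision. A sketch using the threshold-0.50 row from the example table:

```python
def fbeta(p, r, beta):
    """F-beta: beta > 1 favors recall, beta < 1 favors precision."""
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

p, r = 0.6372, 0.9351  # threshold 0.50 from the example table

print(f"F1   = {fbeta(p, r, 1.0):.4f}")
print(f"F2   = {fbeta(p, r, 2.0):.4f}")  # recall-weighted
print(f"F0.5 = {fbeta(p, r, 0.5):.4f}")  # precision-weighted
```

A recall-critical screening task might rank thresholds by F2, while a precision-critical moderation task might use F0.5, and the two can disagree about which threshold is best.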
6) How should I label rows?
Use meaningful labels such as probability thresholds, model versions, segments, or time windows. Labels appear in exports and help stakeholders interpret the curve and table quickly.