Model Accuracy Score Calculator

Turn confusion-matrix counts into a clear score. Compare models quickly with consistent evaluation outputs. Export results to share, audit, and document performance.

Enter confusion counts

Provide counts from a validation set. Counts must be non-negative whole numbers.
True positive (TP): predicted positive and actually positive.
True negative (TN): predicted negative and actually negative.
False positive (FP): predicted positive but actually negative.
False negative (FN): predicted negative but actually positive.
Choose how many decimals to show.
Percent is easier for reporting.
New calculation

Example data table

This sample confusion matrix could come from 1,000 test records.

Scenario TP TN FP FN Total
Binary classifier example 420 510 30 40 1000

Formula used

Accuracy score
Accuracy = (TP + TN) / (TP + TN + FP + FN)
This is the proportion of correct predictions across all samples.
Supporting metrics
  • Precision = TP / (TP + FP)
  • Recall = TP / (TP + FN)
  • Specificity = TN / (TN + FP)
  • F1 = 2·Precision·Recall / (Precision + Recall)
  • Balanced Accuracy = (Recall + Specificity) / 2
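The formulas above can be computed directly from the four counts; here is a minimal Python sketch using the example row (TP=420, TN=510, FP=30, FN=40). It returns None where a denominator is zero, matching the calculator's N/A behavior.

```python
def supporting_metrics(tp, tn, fp, fn):
    """Accuracy and supporting metrics from confusion counts.

    Returns None (shown as N/A) for any metric whose denominator is zero.
    """
    def safe_div(num, den):
        return num / den if den else None

    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    f1 = (safe_div(2 * precision * recall, precision + recall)
          if precision is not None and recall is not None else None)
    balanced = ((recall + specificity) / 2
                if recall is not None and specificity is not None else None)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "balanced_accuracy": balanced}

# Example row from the table above: TP=420, TN=510, FP=30, FN=40
m = supporting_metrics(420, 510, 30, 40)
print(round(m["accuracy"], 4))   # 0.93
```

With the example counts, accuracy is 0.93 and F1 works out to 840/910 ≈ 0.923.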

When classes are imbalanced, report accuracy with recall, specificity, or MCC to avoid misleading conclusions.

How to use this calculator

  1. Evaluate your model on a labeled validation or test dataset.
  2. Count TP, TN, FP, and FN based on a chosen positive class.
  3. Enter the counts above, choose rounding and display format.
  4. Press Calculate to display results above the form.
  5. Use CSV or PDF export to share metrics with stakeholders.

Accuracy as a baseline indicator

Accuracy reports the share of correct predictions across all cases. The calculator uses (TP+TN)/(TP+TN+FP+FN) and also returns the error rate, 1−accuracy. Treat this as a starting checkpoint rather than a final verdict, because identical accuracy can hide very different patterns of false alarms and missed detections, especially when the positive class is uncommon. Build TP, TN, FP, and FN from a held-out split, not reused training records. With larger samples, accuracy stabilizes; with small tests, add confidence intervals to communicate uncertainty clearly to decision makers.
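The accuracy, error rate, and an uncertainty interval for a small test can be sketched as follows. The Wilson score interval shown here is one common choice of confidence interval, not necessarily the one this calculator implements.

```python
import math

def accuracy_with_interval(tp, tn, fp, fn, z=1.96):
    """Accuracy, error rate, and a ~95% Wilson score interval for accuracy."""
    n = tp + tn + fp + fn
    p = (tp + tn) / n                      # accuracy
    err = 1 - p                            # error rate
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, err, (center - half, center + half)

# Example row: 1,000 records, 930 correct
acc, err, (lo, hi) = accuracy_with_interval(420, 510, 30, 40)
```

For n = 1,000 the interval is narrow (a few percentage points); for much smaller test sets it widens noticeably, which is exactly the uncertainty worth communicating.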

Reading the confusion matrix correctly

A confusion matrix separates outcomes into TP, TN, FP, and FN. TP and FN depend on which label you define as “positive,” so document that choice in experiments and dashboards. If the evaluation set changes, keep totals and class proportions visible, since prevalence shifts can change precision and kappa even when recall stays stable. When you use cross-validation, aggregate counts across folds before computing the final rates.
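Pooling counts across folds before computing rates (rather than averaging per-fold rates) can be sketched as below; the fold counts are illustrative, not from a real run.

```python
# Per-fold confusion counts as (tp, tn, fp, fn) — illustrative values.
folds = [(80, 100, 8, 12), (85, 98, 6, 11), (78, 105, 9, 8)]

# Sum the counts first, then compute rates once from the pooled totals.
tp, tn, fp, fn = (sum(c) for c in zip(*folds))
pooled_accuracy = (tp + tn) / (tp + tn + fp + fn)
pooled_recall = tp / (tp + fn)
```

Averaging per-fold rates instead would weight small folds the same as large ones; pooling the raw counts avoids that distortion.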

Complementary rates for decision tradeoffs

Precision, TP/(TP+FP), measures how trustworthy positive predictions are. Recall, TP/(TP+FN), measures coverage of actual positives, while specificity, TN/(TN+FP), measures protection against false positives. Balanced accuracy averages recall and specificity, making it useful for imbalanced samples. Use these rates to explain threshold moves: higher recall often reduces precision. Review FPR and FNR to quantify which error type dominates operationally.
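The threshold tradeoff described above can be demonstrated with hypothetical scores and labels: lowering the threshold predicts positives more often, which raises recall but admits false positives that lower precision.

```python
def counts_at_threshold(scores, labels, threshold):
    """Confusion counts when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return tp, tn, fp, fn

# Illustrative model scores and true labels.
scores = [0.95, 0.85, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

for t in (0.5, 0.3):
    tp, tn, fp, fn = counts_at_threshold(scores, labels, t)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # At t=0.3 recall rises relative to t=0.5, while precision falls.
```

FPR is FP/(FP+TN) and FNR is FN/(FN+TP); both follow directly from the same counts.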

Robust summaries under imbalance

MCC combines all four counts into a single correlation-like score, (TP·TN−FP·FN)/sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It stays informative when one class is rare and discourages gaming by predicting only the majority label. Cohen’s kappa adjusts observed accuracy by chance agreement from marginal rates, supporting fairer comparisons across datasets. If any denominator is zero, the calculator returns N/A to prevent misleading infinities.
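Both summaries, and the zero-denominator guard, can be sketched directly from the formulas above:

```python
import math

def mcc_and_kappa(tp, tn, fp, fn):
    """MCC and Cohen's kappa from confusion counts; None when undefined."""
    n = tp + tn + fp + fn
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(denom) if denom else None

    p_o = (tp + tn) / n                       # observed agreement
    # Chance agreement from the row/column marginals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else None
    return mcc, kappa

# Example row: TP=420, TN=510, FP=30, FN=40
mcc, kappa = mcc_and_kappa(420, 510, 30, 40)
```

With the example counts both values land near 0.86, well above what majority-label guessing could achieve.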

Operational reporting and governance

For stakeholders, report the dataset size, the predicted positive rate, and the cost of FP versus FN alongside accuracy. Track metrics over time by model version and segment, and investigate abrupt changes with the same confusion-matrix inputs. Exporting CSV and PDF outputs supports audits, peer review, and reproducible evaluation records for regulated or high-risk deployments. Add a simple acceptance range, such as minimum recall, to align releases with policy requirements.
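A CSV export of this kind can be sketched with the standard library; the column names and model metadata below are illustrative, not the calculator's exact schema.

```python
import csv
import io

# Illustrative metrics record: counts plus derived rates and metadata.
metrics = {
    "model_version": "v2.1",
    "n": 1000, "tp": 420, "tn": 510, "fp": 30, "fn": 40,
    "accuracy": 0.93, "recall": 0.913, "specificity": 0.944,
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(metrics))
writer.writeheader()
writer.writerow(metrics)
csv_text = buf.getvalue()   # ready to save or attach to an audit record
```

Keeping the raw counts in the export alongside the derived rates lets reviewers recompute every metric independently.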

FAQs

1. What does the accuracy score represent?

It is the proportion of all predictions that are correct: (TP+TN) divided by total cases. It is intuitive, but it can look high when one class dominates, so check recall, specificity, and MCC as well.

2. When should I prefer balanced accuracy?

Use it when classes are imbalanced or when both classes matter equally. It averages sensitivity (recall) and specificity, so a model that ignores the minority class will score poorly even if overall accuracy is high.

3. Why can precision and recall move in opposite directions?

Changing the decision threshold usually increases one at the expense of the other. A more aggressive threshold predicts positives more often, raising recall but adding false positives that lower precision. Evaluate the tradeoff using business costs.

4. What is MCC and why is it helpful?

Matthews correlation coefficient summarizes TP, TN, FP, and FN into a single value between −1 and 1. It behaves well under imbalance and penalizes models that only predict the majority label.

5. What does Cohen’s kappa add beyond accuracy?

Kappa adjusts observed agreement for the agreement expected by chance from the class marginals. It is useful when prevalence varies between datasets, because it discourages inflated scores caused by always predicting the most common class.

6. Why do I see N/A for some metrics?

Some formulas divide by totals such as TP+FP or TN+FN. If that denominator is zero, the metric is undefined. The calculator shows N/A to avoid misleading values; review your counts or class definition.

Related Calculators

Model Fit Score · Regression R Squared · Adjusted Model Fit · Explained Variance Score · Regression Fit Index · Regression Performance Score · R Squared Online · Adjusted R2 Calculator · Model Fit Calculator · Adjusted Fit Score

Important Note: All calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.