Model Validation Metrics Calculator

Test accuracy, error, calibration, and ranking from one workspace. Visualize results clearly for faster reviews. Validate models confidently before deployment audits, monitoring, and reporting.

Calculator Inputs

  • Separate values with commas, semicolons, pipes, or line breaks.
  • Predicted class labels are optional; leave them empty to derive labels from probabilities.
  • Positive probabilities are optional and are used for ROC AUC, log loss, and Brier score.

Classification Notes

This mode supports binary classification. Metrics include confusion counts, threshold-based scores, ranking quality, and calibration-sensitive losses.

  • Enter the same count of actual and predicted values.
  • The predictor count is used for adjusted R squared.

Regression Notes

This mode measures fit, scale of error, directional bias, percentage error, residual spread, and correlation between actual and predicted values.

Example Data Table

Use the class columns for classification mode. Use the value columns for regression mode.

Record | Actual Class | Predicted Class | Positive Probability | Actual Value | Predicted Value
------ | ------------ | --------------- | -------------------- | ------------ | ---------------
1      | 1            | 1               | 0.93                 | 120          | 118
2      | 0            | 0               | 0.12                 | 134          | 131
3      | 1            | 1               | 0.88                 | 128          | 130
4      | 1            | 0               | 0.47                 | 141          | 140
5      | 0            | 0               | 0.21                 | 150          | 149
6      | 1            | 1               | 0.90                 | 162          | 160
7      | 0            | 0               | 0.18                 | 158          | 161
8      | 1            | 1               | 0.79                 | 170          | 168

Formulas Used

Classification Metrics

  • Accuracy = (TP + TN) / N
  • Precision = TP / (TP + FP)
  • Recall = TP / (TP + FN)
  • Specificity = TN / (TN + FP)
  • F1 Score = 2 × Precision × Recall / (Precision + Recall)
  • Balanced Accuracy = (Recall + Specificity) / 2
  • MCC = (TP×TN − FP×FN) / √[(TP+FP)(TP+FN)(TN+FP)(TN+FN)]
  • Log Loss = −mean[y log(p) + (1−y) log(1−p)]
  • Brier Score = mean[(p − y)²]
  • ROC AUC = area under the ROC curve from threshold ranking.
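
As a worked example, the classification formulas above can be computed directly from the class and probability columns of the sample table. This is a minimal Python sketch using only the standard library; a production tool would add input validation and handle division-by-zero edge cases.

```python
import math

# Class and probability columns from the example table above
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
p_pos  = [0.93, 0.12, 0.88, 0.47, 0.21, 0.90, 0.18, 0.79]
n = len(y_true)

# Confusion counts
tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
tn = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 0)
fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)

# Threshold-based scores
accuracy = (tp + tn) / n
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)
balanced_accuracy = (recall + specificity) / 2
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Calibration-sensitive losses from the probabilities
log_loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pos)) / n
brier = sum((p - y) ** 2 for y, p in zip(y_true, p_pos)) / n

# ROC AUC via the rank-sum (Mann-Whitney) formulation:
# the fraction of positive/negative pairs ranked correctly, ties counting half
pos = [p for y, p in zip(y_true, p_pos) if y == 1]
neg = [p for y, p in zip(y_true, p_pos) if y == 0]
auc = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg) / (len(pos) * len(neg))
```

On this sample, accuracy is 7/8 = 0.875, precision is 1.0 (no false positives), recall is 0.8 (one false negative), and every positive record is ranked above every negative one, so ROC AUC is 1.0.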

Regression Metrics

  • Error = Predicted − Actual
  • MAE = mean(|Error|)
  • MSE = mean(Error²)
  • RMSE = √MSE
  • MAPE = mean(|Error / Actual|) × 100%
  • sMAPE = mean[2|Error| / (|Actual| + |Predicted|)] × 100%
  • R² = 1 − SSres / SStot
  • Adjusted R² = 1 − (1−R²)(n−1)/(n−p−1)
  • Explained Variance = 1 − Var(Error) / Var(Actual)
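
The regression formulas above can likewise be checked against the value columns of the sample table. A minimal sketch follows; the predictor count of 1 on the adjusted R² line is an assumed value for illustration.

```python
import math

# Value columns from the example table above
actual    = [120, 134, 128, 141, 150, 162, 158, 170]
predicted = [118, 131, 130, 140, 149, 160, 161, 168]
n = len(actual)

errors = [p - a for a, p in zip(actual, predicted)]   # Error = Predicted - Actual
mae  = sum(abs(e) for e in errors) / n
mse  = sum(e * e for e in errors) / n
rmse = math.sqrt(mse)
bias = sum(errors) / n                                # mean error: directional bias

# MAPE in percent, skipping zero actuals; sMAPE in percent
kept = [(a, e) for a, e in zip(actual, errors) if a != 0]
mape  = sum(abs(e / a) for a, e in kept) * 100 / len(kept)
smape = sum(2 * abs(e) / (abs(a) + abs(p))
            for a, p, e in zip(actual, predicted, errors)) * 100 / n

# R-squared and adjusted R-squared
mean_a = sum(actual) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((a - mean_a) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

p_predictors = 1                                      # assumed single-feature model
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p_predictors - 1)
```

Here MAE is 2.0, RMSE is about 2.12, and the small negative bias (−0.75) shows the model slightly under-predicts on average.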
How to Use This Calculator
  1. Select Classification or Regression mode.
  2. Paste your actual values and predicted outputs in matching order.
  3. For classification, add probabilities to evaluate ranking and calibration metrics.
  4. Set the positive label and decision threshold for binary classification.
  5. For regression, enter the predictor count if you want adjusted R squared.
  6. Click Calculate Metrics to show results above the form.
  7. Review the table and chart, then export the summary as CSV or PDF.
FAQs

1. What does this calculator measure?

It evaluates binary classification and regression models. You can measure discrimination, calibration-sensitive loss, residual error, fit quality, and prediction bias from pasted datasets.

2. Can I use probabilities without predicted labels?

Yes. Leave predicted class labels empty, provide positive probabilities, and set a threshold. The tool will derive predicted classes and still calculate ROC AUC, log loss, and Brier score.
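Label derivation is a simple threshold comparison. A sketch using the sample probabilities; the 0.5 threshold here is an assumed default, since the calculator lets you set your own.

```python
# Positive probabilities from the example table
p_pos = [0.93, 0.12, 0.88, 0.47, 0.21, 0.90, 0.18, 0.79]

# Derive predicted classes: positive when the probability meets the threshold
threshold = 0.5
y_pred = [1 if p >= threshold else 0 for p in p_pos]
```

Raising the threshold trades recall for precision: record 4 (probability 0.47) flips to negative at 0.5 but would be positive at 0.4.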

3. Is this calculator suitable for multiclass tasks?

This version is optimized for binary classification. Multiclass data should be converted into one-vs-rest views or evaluated with a separate macro and micro averaging workflow.

4. When should I trust accuracy less?

Accuracy can mislead when classes are imbalanced. In those cases, use precision, recall, specificity, balanced accuracy, MCC, and probability-based scores for a fuller picture.
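A small hypothetical example of the imbalance effect: a model that always predicts the majority class scores high accuracy while recalling none of the positives.

```python
# Hypothetical imbalanced data: 95 negatives, 5 positives,
# and a degenerate model that predicts negative for everything
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(y == yh for y, yh in zip(y_true, y_pred)) / len(y_true)
tp = sum(y == 1 and yh == 1 for y, yh in zip(y_true, y_pred))
fn = sum(y == 1 and yh == 0 for y, yh in zip(y_true, y_pred))
recall = tp / (tp + fn)   # 0 of 5 positives found
```

Accuracy comes out at 0.95 even though recall is 0.0, which is why the imbalance-robust metrics above matter.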

5. Why does adjusted R squared need predictor count?

Adjusted R squared penalizes unnecessary complexity. It uses the number of predictors so you can compare models with different feature counts more fairly.

6. What if my actual values include zeros?

MAPE becomes unstable around zero. The calculator skips zero actual values for MAPE, while RMSE, MAE, bias, and sMAPE still help you assess performance.
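The skip-zero behaviour can be sketched as follows; the data here is hypothetical, chosen to include a zero actual value.

```python
# Hypothetical data where the first actual value is zero
actual    = [0, 100, 200, 50]
predicted = [5, 110, 190, 55]

# Drop pairs with a zero actual, then average the absolute percentage errors
pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
mape = sum(abs((p - a) / a) for a, p in pairs) * 100 / len(pairs)
# remaining terms: 10/100, 10/200, 5/50 -> mean of 0.10, 0.05, 0.10
```

The zero record contributes nothing to MAPE, but it still counts toward MAE, RMSE, bias, and sMAPE.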

7. Which metric is best for comparing classifiers?

No single metric is always best. Use F1 for balance, MCC for robust class comparison, ROC AUC for ranking quality, and log loss for probability quality.

8. Why export the results?

Exports help with reporting, audit trails, model reviews, and team sharing. They also make it easier to document validation decisions across experiments and deployment checks.

Related Calculators

  • precision recall table
  • fraud detection metrics
  • micro average f1
  • precision recall metrics
  • roc precision recall
  • classifier performance metrics
  • macro average f1

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.