Turn predictions into actionable model quality insight today. Compare regression and classification performance in minutes. Download metrics, datasets, and PDF summaries with one click.
Use these two small sample sets, one for regression (actual vs. predicted values) and one for classification (binary labels with predicted probabilities), to validate your input formatting.
| # | Actual (y) | Predicted (ŷ) |
|---|---|---|
| 1 | 10 | 9.8 |
| 2 | 12 | 11.5 |
| 3 | 9 | 10.2 |
| 4 | 15 | 14.4 |
| 5 | 13 | 13.1 |

| # | Actual (0/1) | Probability (p) |
|---|---|---|
| 1 | 1 | 0.81 |
| 2 | 0 | 0.22 |
| 3 | 1 | 0.73 |
| 4 | 1 | 0.61 |
| 5 | 0 | 0.35 |
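As a quick sanity check before submitting, the sample rows above can be validated in a few lines of plain Python (a minimal sketch; the variable names are illustrative, not the calculator's API):

```python
# Regression sample: aligned lists of actual and predicted values
y_true = [10, 12, 9, 15, 13]
y_pred = [9.8, 11.5, 10.2, 14.4, 13.1]

# Classification sample: binary labels with predicted probabilities
labels = [1, 0, 1, 1, 0]
probs = [0.81, 0.22, 0.73, 0.61, 0.35]

# Lists must be aligned, labels must be 0/1, probabilities must fall in [0, 1]
assert len(y_true) == len(y_pred), "actual/predicted lists must be the same length"
assert all(v in (0, 1) for v in labels), "labels must be binary"
assert all(0.0 <= p <= 1.0 for p in probs), "probabilities must lie in [0, 1]"
print("input format OK")
```

If any assertion fires, fix the offending list before pasting it into the calculator.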
Note: Information criteria depend on your chosen parameter count.
Good fit starts with error size and explained variance. For regression, RMSE highlights large misses, while MAE stays stable under outliers. R² summarizes variance captured relative to a mean-only baseline, and adjusted R² penalizes adding weak predictors. MAPE communicates relative error percentages, and MSLE emphasizes proportional misses when targets are nonnegative. Compare metrics on the same scale and on the same validation split to avoid misleading improvements.
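These regression metrics are straightforward to compute by hand on the sample table above; the following sketch assumes plain Python lists and mirrors the standard definitions:

```python
import math

y_true = [10, 12, 9, 15, 13]
y_pred = [9.8, 11.5, 10.2, 14.4, 13.1]
n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

sse = sum(e * e for e in errors)
rmse = math.sqrt(sse / n)                      # highlights large misses
mae = sum(abs(e) for e in errors) / n          # stable under outliers
mean_y = sum(y_true) / n
sst = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - sse / sst                             # variance explained vs. mean-only baseline
mape = 100 * sum(abs(e) / t for e, t in zip(errors, y_true)) / n
msle = sum((math.log1p(p) - math.log1p(t)) ** 2
           for t, p in zip(y_true, y_pred)) / n  # proportional misses, nonnegative targets

print(round(rmse, 4), round(mae, 2), round(r2, 4), round(mape, 2))
# → 0.6481 0.52 0.9079 4.85
```

Note how RMSE (0.648) exceeds MAE (0.52) here: the single large miss in row 3 is squared before averaging, which is exactly the sensitivity the paragraph describes.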
Residuals should look random when the model matches the data-generating pattern. A funnel shape suggests changing variance, often improved by transformations or weighted loss. Repeating waves can signal missing seasonality or nonlinearity. The Durbin–Watson statistic flags serial correlation in ordered data; values far from two indicate that independent-error assumptions are not satisfied.
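The Durbin–Watson statistic is just the sum of squared successive residual differences divided by the sum of squared residuals. A sketch using the residuals from the sample regression above (assuming the rows are in time order):

```python
# Residuals (actual - predicted) from the sample regression table, in order
residuals = [0.2, 0.5, -1.2, 0.6, -0.1]

num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
den = sum(r * r for r in residuals)
dw = num / den  # ~2 suggests no serial correlation; <2 positive, >2 negative
print(round(dw, 3))
# → 3.195
```

A value of 3.195 on this tiny sample leans toward negative serial correlation, though five points is far too few to draw a conclusion.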
AIC and BIC combine fit with complexity by using the log-likelihood and parameter count. Lower values favor better tradeoffs when comparing models trained on the same dataset. BIC usually penalizes complexity more strongly than AIC, so it often selects simpler models. Use AICc when sample size is small relative to parameters, because it corrects optimistic scoring.
Binary classification converts probabilities into labels using a threshold. Raising the threshold typically increases specificity but reduces recall; lowering it does the opposite. F1 balances precision and recall, balanced accuracy averages sensitivity and specificity, and MCC remains informative when classes are imbalanced. Evaluate the confusion matrix with business costs, not accuracy alone.
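The thresholding step and the metrics built on the confusion matrix can be sketched as follows, using the classification sample above (which happens to be perfectly separable at a 0.5 threshold, so every metric comes out at 1.0):

```python
import math

labels = [1, 0, 1, 1, 0]
probs = [0.81, 0.22, 0.73, 0.61, 0.35]
threshold = 0.5  # raise for specificity, lower for recall
preds = [1 if p >= threshold else 0 for p in probs]

# Confusion matrix cells
tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
specificity = tn / (tn + fp) if tn + fp else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
balanced_acc = (recall + specificity) / 2
mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

print(f1, balanced_acc, mcc)
# → 1.0 1.0 1.0
```

On real data, try sweeping `threshold` over a grid and watching how the matrix cells, and therefore F1 and MCC, trade off.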
ROC AUC measures ranking quality: higher values mean positives receive larger scores than negatives. It does not guarantee well-calibrated probabilities. Log loss rewards confident, correct probabilities and heavily penalizes confident mistakes, while the Brier score tracks mean squared probability error. Calibration plots help you spot overconfidence, underconfidence, and segments that need recalibration. When classes are rare, inspect precision and recall across thresholds too.
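Log loss and the Brier score follow directly from their definitions; this sketch also clips probabilities away from 0 and 1 so that `log(0)` can never occur (the same safeguard the calculator applies internally, per the note below):

```python
import math

labels = [1, 0, 1, 1, 0]
probs = [0.81, 0.22, 0.73, 0.61, 0.35]
eps = 1e-15  # illustrative clipping bound to avoid log(0)

clipped = [min(max(p, eps), 1 - eps) for p in probs]
log_loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, clipped)) / len(labels)
brier = sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

print(round(log_loss, 4), round(brier, 4))
# → 0.3398 0.0864
```

Both scores are nonzero here even though the sample separates perfectly at a 0.5 threshold: ranking quality (AUC) is perfect, but the probabilities themselves are not maximally confident.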
Model fit reporting is strongest when metrics, inputs, and assumptions are reproducible. Exporting the metric table supports peer review, and exporting row-level errors helps debugging and drift tracking. Include the parameter count used for AIC/BIC, your threshold choice, and the evaluation window. Regularly compare current results to baselines to catch silent degradation.
Enter aligned lists of actual values and predicted values. Provide the predictor count k, and optionally include the intercept for parameter totals used in AIC and BIC. The calculator then reports error, fit, and diagnostic metrics.
Adjusted R² reduces the score when additional predictors do not meaningfully improve fit. It helps compare models with different feature counts on the same dataset, discouraging overfitting driven by unnecessary variables.
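The adjustment is a one-line formula; a sketch using the R² from the sample regression (k = 1 predictor is an assumption for illustration):

```python
r2 = 0.9079   # R² from the regression sample above
n, k = 5, 1   # observations and predictor count (illustrative)

# Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 4))
# → 0.8772
```

Adding a second, useless predictor (k = 2) would shrink the denominator and pull adjusted R² down further, even though plain R² could only rise.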
A Gaussian log-likelihood is estimated from SSE and sample size, then AIC = 2p − 2logL and BIC = ln(n)p − 2logL. Use them to compare models evaluated on identical data with the same target.
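Under the Gaussian assumption, logL = −(n/2)(ln 2π + ln(SSE/n) + 1), after which the two criteria are arithmetic. A sketch using the SSE of the sample regression (p = 2 parameters is an illustrative choice; use your model's actual count):

```python
import math

sse, n, p = 2.10, 5, 2  # SSE from the regression sample; p = parameter count (assumed)

log_l = -n / 2 * (math.log(2 * math.pi) + math.log(sse / n) + 1)  # Gaussian log-likelihood
aic = 2 * p - 2 * log_l
bic = math.log(n) * p - 2 * log_l

print(round(aic, 3), round(bic, 3))
# → 13.852 13.071
```

BIC comes out below AIC here only because ln(5) < 2; for n ≥ 8, ln(n) exceeds 2 and BIC's complexity penalty is the stricter one.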
AUC is computed from ranked predicted probabilities using a tie-aware rank-sum method. This estimates the probability that a random positive receives a higher score than a random negative. It is undefined if one class is missing.
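A tie-aware rank-sum AUC can be sketched in plain Python: tied scores share the mean of their rank positions, and the Mann–Whitney identity converts the positive-class rank sum into the AUC. Using the classification sample above:

```python
labels = [1, 0, 1, 1, 0]
probs = [0.81, 0.22, 0.73, 0.61, 0.35]

# Assign 1-based average ranks; tied scores share the mean of their positions
order = sorted(range(len(probs)), key=lambda i: probs[i])
ranks = [0.0] * len(probs)
i = 0
while i < len(order):
    j = i
    while j + 1 < len(order) and probs[order[j + 1]] == probs[order[i]]:
        j += 1
    avg_rank = (i + j) / 2 + 1
    for idx in order[i:j + 1]:
        ranks[idx] = avg_rank
    i = j + 1

n_pos = sum(labels)
n_neg = len(labels) - n_pos
assert n_pos and n_neg, "AUC is undefined when one class is missing"
rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
auc = (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(auc)
# → 1.0
```

The result of 1.0 reflects that every positive in the sample outscores every negative; the guard assertion mirrors the undefined case mentioned above.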
The calculator clips probabilities internally for log loss stability, preventing log(0). Keep your input within 0–1, and consider calibration if predictions are frequently extreme. Extreme values can exaggerate log loss when wrong.
Yes. Download a metrics CSV for summaries, a data CSV for row-level review, and a PDF report for sharing. Exports use the most recent calculation stored in your session, so re-run the calculator if inputs change.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.