Turn predictions into actionable model quality insight today. Compare regression and classification performance in minutes. Download metrics, datasets, and PDF summaries with one click.
Use these two small sample sets, one for regression (actual vs. predicted values) and one for classification (binary labels with predicted probabilities), to validate your input formatting.
| # | Actual (y) | Predicted (ŷ) |
|---|---|---|
| 1 | 10 | 9.8 |
| 2 | 12 | 11.5 |
| 3 | 9 | 10.2 |
| 4 | 15 | 14.4 |
| 5 | 13 | 13.1 |

| # | Actual (0/1) | Probability (p) |
|---|---|---|
| 1 | 1 | 0.81 |
| 2 | 0 | 0.22 |
| 3 | 1 | 0.73 |
| 4 | 1 | 0.61 |
| 5 | 0 | 0.35 |
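As a quick sanity check before submitting, the sample rows above can be validated in a few lines of plain Python (a minimal sketch; the variable names are illustrative, not the calculator's API):

```python
# Regression sample: aligned lists of actual and predicted values
y_true = [10, 12, 9, 15, 13]
y_pred = [9.8, 11.5, 10.2, 14.4, 13.1]

# Classification sample: binary labels with predicted probabilities
labels = [1, 0, 1, 1, 0]
probs = [0.81, 0.22, 0.73, 0.61, 0.35]

# Lists must be aligned, labels must be 0/1, probabilities must fall in [0, 1]
assert len(y_true) == len(y_pred), "actual/predicted lists must be the same length"
assert all(v in (0, 1) for v in labels), "labels must be binary"
assert all(0.0 <= p <= 1.0 for p in probs), "probabilities must lie in [0, 1]"
print("input format OK")
```

If any assertion fires, fix the offending list before pasting it into the calculator.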
Note: Information criteria depend on your chosen parameter count.
Good fit starts with error size and explained variance. For regression, RMSE highlights large misses, while MAE stays stable under outliers. R² summarizes variance captured relative to a mean-only baseline, and adjusted R² penalizes adding weak predictors. MAPE communicates relative error percentages, and MSLE emphasizes proportional misses when targets are nonnegative. Compare metrics on the same scale and on the same validation split to avoid misleading improvements.
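These regression metrics are straightforward to compute by hand on the sample table above; the following sketch assumes plain Python lists and mirrors the standard definitions:

```python
import math

y_true = [10, 12, 9, 15, 13]
y_pred = [9.8, 11.5, 10.2, 14.4, 13.1]
n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

sse = sum(e * e for e in errors)
rmse = math.sqrt(sse / n)                      # highlights large misses
mae = sum(abs(e) for e in errors) / n          # stable under outliers
mean_y = sum(y_true) / n
sst = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - sse / sst                             # variance explained vs. mean-only baseline
mape = 100 * sum(abs(e) / t for e, t in zip(errors, y_true)) / n
msle = sum((math.log1p(p) - math.log1p(t)) ** 2
           for t, p in zip(y_true, y_pred)) / n  # proportional misses, nonnegative targets

print(round(rmse, 4), round(mae, 2), round(r2, 4), round(mape, 2))
# → 0.6481 0.52 0.9079 4.85
```

Note how RMSE (0.648) exceeds MAE (0.52) here: the single large miss in row 3 is squared before averaging, which is exactly the sensitivity the paragraph describes.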
Residuals should look random when the model matches the data-generating pattern. A funnel shape suggests changing variance, often improved by transformations or weighted loss. Repeating waves can signal missing seasonality or nonlinearity. The Durbin–Watson statistic flags serial correlation in ordered data; values far from two indicate that independent-error assumptions are not satisfied.
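The Durbin–Watson statistic is just the sum of squared successive residual differences divided by the sum of squared residuals. A sketch using the residuals from the sample regression above (assuming the rows are in time order):

```python
# Residuals (actual - predicted) from the sample regression table, in order
residuals = [0.2, 0.5, -1.2, 0.6, -0.1]

num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
den = sum(r * r for r in residuals)
dw = num / den  # ~2 suggests no serial correlation; <2 positive, >2 negative
print(round(dw, 3))
# → 3.195
```

A value of 3.195 on this tiny sample leans toward negative serial correlation, though five points is far too few to draw a conclusion.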
AIC and BIC combine fit with complexity by using the log-likelihood and parameter count. Lower values favor better tradeoffs when comparing models trained on the same dataset. BIC usually penalizes complexity more strongly than AIC, so it often selects simpler models. Use AICc when sample size is small relative to parameters, because it corrects optimistic scoring.
Binary classification converts probabilities into labels using a threshold. Raising the threshold typically increases specificity but reduces recall; lowering it does the opposite. F1 balances precision and recall, balanced accuracy averages sensitivity and specificity, and MCC remains informative when classes are imbalanced. Evaluate the confusion matrix with business costs, not accuracy alone.
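The thresholding step and the metrics built on the confusion matrix can be sketched as follows, using the classification sample above (which happens to be perfectly separable at a 0.5 threshold, so every metric comes out at 1.0):

```python
import math

labels = [1, 0, 1, 1, 0]
probs = [0.81, 0.22, 0.73, 0.61, 0.35]
threshold = 0.5  # raise for specificity, lower for recall
preds = [1 if p >= threshold else 0 for p in probs]

# Confusion matrix cells
tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
specificity = tn / (tn + fp) if tn + fp else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
balanced_acc = (recall + specificity) / 2
mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

print(f1, balanced_acc, mcc)
# → 1.0 1.0 1.0
```

On real data, try sweeping `threshold` over a grid and watching how the matrix cells, and therefore F1 and MCC, trade off.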
ROC AUC measures ranking quality: higher values mean positives receive larger scores than negatives. It does not guarantee well-calibrated probabilities. Log loss rewards confident, correct probabilities and heavily penalizes confident mistakes, while the Brier score tracks mean squared probability error. Calibration plots help you spot overconfidence, underconfidence, and segments that need recalibration. When classes are rare, inspect precision and recall across thresholds too.
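Log loss and the Brier score follow directly from their definitions; this sketch also clips probabilities away from 0 and 1 so that `log(0)` can never occur (the same safeguard the calculator applies internally, per the note below):

```python
import math

labels = [1, 0, 1, 1, 0]
probs = [0.81, 0.22, 0.73, 0.61, 0.35]
eps = 1e-15  # illustrative clipping bound to avoid log(0)

clipped = [min(max(p, eps), 1 - eps) for p in probs]
log_loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, clipped)) / len(labels)
brier = sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

print(round(log_loss, 4), round(brier, 4))
# → 0.3398 0.0864
```

Both scores are nonzero here even though the sample separates perfectly at a 0.5 threshold: ranking quality (AUC) is perfect, but the probabilities themselves are not maximally confident.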
Model fit reporting is strongest when metrics, inputs, and assumptions are reproducible. Exporting the metric table supports peer review, and exporting row-level errors helps debugging and drift tracking. Include the parameter count used for AIC/BIC, your threshold choice, and the evaluation window. Regularly compare current results to baselines to catch silent degradation.
Enter aligned lists of actual values and predicted values. Provide the predictor count k, and optionally include the intercept for parameter totals used in AIC and BIC. The calculator then reports error, fit, and diagnostic metrics.
Adjusted R² reduces the score when additional predictors do not meaningfully improve fit. It helps compare models with different feature counts on the same dataset, discouraging overfitting driven by unnecessary variables.
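The adjustment is a one-line formula; a sketch using the R² from the sample regression (k = 1 predictor is an assumption for illustration):

```python
r2 = 0.9079   # R² from the regression sample above
n, k = 5, 1   # observations and predictor count (illustrative)

# Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 4))
# → 0.8772
```

Adding a second, useless predictor (k = 2) would shrink the denominator and pull adjusted R² down further, even though plain R² could only rise.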
A Gaussian log-likelihood is estimated from SSE and sample size, then AIC = 2p − 2logL and BIC = ln(n)p − 2logL. Use them to compare models evaluated on identical data with the same target.
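Under the Gaussian assumption, logL = −(n/2)(ln 2π + ln(SSE/n) + 1), after which the two criteria are arithmetic. A sketch using the SSE of the sample regression (p = 2 parameters is an illustrative choice; use your model's actual count):

```python
import math

sse, n, p = 2.10, 5, 2  # SSE from the regression sample; p = parameter count (assumed)

log_l = -n / 2 * (math.log(2 * math.pi) + math.log(sse / n) + 1)  # Gaussian log-likelihood
aic = 2 * p - 2 * log_l
bic = math.log(n) * p - 2 * log_l

print(round(aic, 3), round(bic, 3))
# → 13.852 13.071
```

BIC comes out below AIC here only because ln(5) < 2; for n ≥ 8, ln(n) exceeds 2 and BIC's complexity penalty is the stricter one.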
AUC is computed from ranked predicted probabilities using a tie-aware rank-sum method. This estimates the probability that a random positive receives a higher score than a random negative. It is undefined if one class is missing.
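A tie-aware rank-sum AUC can be sketched in plain Python: tied scores share the mean of their rank positions, and the Mann–Whitney identity converts the positive-class rank sum into the AUC. Using the classification sample above:

```python
labels = [1, 0, 1, 1, 0]
probs = [0.81, 0.22, 0.73, 0.61, 0.35]

# Assign 1-based average ranks; tied scores share the mean of their positions
order = sorted(range(len(probs)), key=lambda i: probs[i])
ranks = [0.0] * len(probs)
i = 0
while i < len(order):
    j = i
    while j + 1 < len(order) and probs[order[j + 1]] == probs[order[i]]:
        j += 1
    avg_rank = (i + j) / 2 + 1
    for idx in order[i:j + 1]:
        ranks[idx] = avg_rank
    i = j + 1

n_pos = sum(labels)
n_neg = len(labels) - n_pos
assert n_pos and n_neg, "AUC is undefined when one class is missing"
rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
auc = (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(auc)
# → 1.0
```

The result of 1.0 reflects that every positive in the sample outscores every negative; the guard assertion mirrors the undefined case mentioned above.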
The calculator clips probabilities internally for log loss stability, preventing log(0). Keep your input within 0–1, and consider calibration if predictions are frequently extreme. Extreme values can exaggerate log loss when wrong.
Yes. Download a metrics CSV for summaries, a data CSV for row-level review, and a PDF report for sharing. Exports use the most recent calculation stored in your session, so re-run the calculator if inputs change.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.