Turn raw model outputs into a single 0–100 performance score. Enter values or paste arrays for scoring, then download the results as CSV or PDF to share with your team.
| # | Actual (y) | Predicted (ŷ) | Error (ŷ − y) | \|Error\| |
|---|---|---|---|---|
| 1 | 10.00 | 9.50 | -0.50 | 0.50 |
| 2 | 12.00 | 12.60 | 0.60 | 0.60 |
| 3 | 13.00 | 12.20 | -0.80 | 0.80 |
| 4 | 15.00 | 15.40 | 0.40 | 0.40 |
| 5 | 18.00 | 17.10 | -0.90 | 0.90 |
| 6 | 20.00 | 20.90 | 0.90 | 0.90 |
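The headline error metrics for the table above can be reproduced with a few lines of Python, using only the standard library:

```python
import math

y     = [10.00, 12.00, 13.00, 15.00, 18.00, 20.00]  # actual values from the table
y_hat = [ 9.50, 12.60, 12.20, 15.40, 17.10, 20.90]  # predicted values

errors = [p - a for p, a in zip(y_hat, y)]           # signed errors (ŷ − y)

mae  = sum(abs(e) for e in errors) / len(errors)            # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))  # root mean squared error

print(f"MAE  = {mae:.3f}")   # 0.683
print(f"RMSE = {rmse:.3f}")  # 0.711
```

RMSE exceeds MAE here because squaring weights the larger misses (rows 5 and 6) more heavily.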
Many data science teams compare dozens of candidate models per sprint. A single composite score helps triage quickly before deeper residual analysis. In practice, teams set a minimum score threshold for promotion to staging, then require stable scores across validation splits and critical customer segments. For example, if two models differ by only 0.1 in RMSE, the score shows whether that difference matters under your chosen tolerance. This encourages practical evaluation rather than metric chasing during iteration and stakeholder sign-off.
The error scale is the main normalizer for lower-is-better metrics such as MAE, RMSE, median absolute error, and max error. Treat it as the largest acceptable typical error in your measurement unit. For prices, it might be 5 dollars; for energy, 0.5 kWh; for latency, 20 ms. When the scale is too large, weak models look acceptable. When it is too small, most models collapse toward low subscores and become hard to distinguish.
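One common way to turn a lower-is-better metric into a 0–100 subscore is a linear ramp against the chosen error scale. This is an illustrative mapping only, not necessarily the exact formula the calculator uses:

```python
def error_subscore(metric_value, scale):
    """Map a lower-is-better metric to 0-100: full credit at zero error,
    zero credit at or beyond the error scale (hypothetical linear ramp)."""
    if scale <= 0:
        raise ValueError("error scale must be positive")
    return 100.0 * max(0.0, 1.0 - metric_value / scale)

# With a $5 scale, a $1 MAE scores 80; a $6 MAE bottoms out at 0.
print(error_subscore(1.0, 5.0))  # 80.0
print(error_subscore(6.0, 5.0))  # 0.0
```

Note how the scale choice drives discrimination: with a $50 scale, both models above would score near 90 and look indistinguishable, which is the "weak models look acceptable" failure mode described above.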
MAPE and sMAPE provide unit-free views, but they behave differently near zero. MAPE skips tiny actuals to avoid exploding percentages, while sMAPE bounds the denominator with |y|+|ŷ|. Use a realistic percentage cap, such as 10–30%, for the score mapping. If your data includes many small targets, rely more on MAE or RMSE and reduce MAPE weight.
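The two percentage metrics can be sketched as follows; the near-zero cutoff `eps` and the sMAPE variant (2·|error| numerator over |y| + |ŷ|) are illustrative assumptions, since sMAPE has several definitions in the literature:

```python
def mape(y, y_hat, eps=1e-8):
    """Mean absolute percentage error, skipping near-zero actuals."""
    terms = [abs(p - a) / abs(a) for a, p in zip(y, y_hat) if abs(a) > eps]
    return 100.0 * sum(terms) / len(terms)

def smape(y, y_hat):
    """Symmetric MAPE: denominator |y| + |y_hat| bounds each term at 200%."""
    terms = [2.0 * abs(p - a) / (abs(a) + abs(p)) for a, p in zip(y, y_hat)]
    return 100.0 * sum(terms) / len(terms)

# Worked-table data from above
y     = [10.00, 12.00, 13.00, 15.00, 18.00, 20.00]
y_hat = [ 9.50, 12.60, 12.20, 15.40, 17.10, 20.90]
print(f"MAPE  = {mape(y, y_hat):.2f}%")   # 4.72%
print(f"sMAPE = {smape(y, y_hat):.2f}%")  # 4.75%
```

On this well-behaved data the two agree closely; they diverge sharply when actuals approach zero.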
R² and explained variance reward capturing variation rather than predicting the mean. However, they do not guarantee small errors. A model can show strong R² yet still have unacceptable RMSE when the target scale is large. Also watch mean bias error: persistent positive bias suggests systematic overprediction, and negative bias suggests underprediction. If bias matters operationally, keep a nonzero weight on a scale-based error metric.
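Using the worked table again, the fit and bias diagnostics can be computed directly. This sketch uses the standard 1 − SS_res/SS_tot definition of R² and 1 − Var(residual)/Var(y) for explained variance:

```python
y     = [10.00, 12.00, 13.00, 15.00, 18.00, 20.00]
y_hat = [ 9.50, 12.60, 12.20, 15.40, 17.10, 20.90]

n = len(y)
errors = [p - a for p, a in zip(y_hat, y)]
y_mean = sum(y) / n
bias   = sum(errors) / n                            # mean bias error (ŷ − y)

ss_res = sum(e * e for e in errors)                 # residual sum of squares
ss_tot = sum((a - y_mean) ** 2 for a in y)          # total sum of squares
r2 = 1.0 - ss_res / ss_tot

var_err = sum((e - bias) ** 2 for e in errors) / n  # residual variance
var_y   = ss_tot / n
explained_variance = 1.0 - var_err / var_y

print(f"R2  = {r2:.4f}")                  # 0.9575
print(f"EV  = {explained_variance:.4f}")  # 0.9577
print(f"MBE = {bias:+.3f}")               # -0.050, slight underprediction
```

Explained variance is marginally higher than R² here because it ignores the constant bias offset; a model that underpredicts everything by a fixed amount can have perfect explained variance but poor R².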
Use CSV for experiment logs, dashboards, and alerts, and PDF for reviews, approvals, and documentation. Save the exact settings alongside each run: delimiter choice, header flag, error scale, percentage cap, R² floor, and all weights. Consistent settings make month-to-month score comparisons valid and support transparent model governance.
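One lightweight way to follow this advice is to log the settings next to each score as a CSV row. The field names below are illustrative, not a prescribed schema:

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical record of one scoring run: settings plus the result
run = {
    "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    "error_scale": 5.0,   # largest acceptable typical error
    "pct_cap": 20.0,      # MAPE/sMAPE cap (%)
    "r2_floor": 0.5,      # R2 below this scores zero
    "w_mae": 0.4,         # metric weights
    "w_rmse": 0.3,
    "w_r2": 0.3,
    "score": 87.4,        # composite result for this run
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=run.keys())
writer.writeheader()
writer.writerow(run)
print(buf.getvalue())
```

Writing the settings on every row (rather than in a separate config file) keeps each log line self-describing during audits.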
It is a 0–100 weighted blend of metric subscores. Lower error and higher fit increase the score, based on your selected caps, floor, and weights.
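That blend can be sketched as a normalized weighted average of 0–100 subscores. The weights and subscore values below are hypothetical, and this is an illustration of the idea rather than the calculator's exact formula:

```python
def composite_score(subscores, weights):
    """Weighted 0-100 blend; weights are normalized, so they need not sum to 1."""
    total_w = sum(weights.values())
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(subscores[m] * w for m, w in weights.items()) / total_w

subscores = {"mae": 86.0, "rmse": 80.0, "r2": 92.0}  # hypothetical subscores
weights   = {"mae": 0.4,  "rmse": 0.3,  "r2": 0.3}
print(round(composite_score(subscores, weights), 1))
```

Because the result is a convex combination, the composite always lands between the lowest and highest subscore.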
Prioritize MAE for typical error, RMSE for large-miss risk, and R² for trend capture. Add MAPE or sMAPE when stakeholders need percentage interpretation.
Set it near the largest acceptable typical error in your unit. Use domain tolerance, historical baselines, or SLA limits, then keep it consistent across model comparisons.
R² measures relative variance explained, not absolute error size. If the target scale is large, RMSE or MAE can remain high, lowering subscores and the composite result.
Use sMAPE when targets can be near zero or change sign. Its denominator uses |y|+|ŷ|, reducing extreme percentages that can dominate MAPE.
Yes. CSV supports traceable logs, and PDF supports review packs. Include settings, weights, and timestamps so results can be reproduced during audits and retraining.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.