Calculator
Example data
| Scenario | Inputs | Outputs |
|---|---|---|
| Balanced model | R² = 0.82, Validation R² = 0.76, n = 250, p = 12, beta = 2, gamma = 5, blend = 0.6 | Adjusted R² ≈ 0.811; score shrinks for gap and complexity. |
| Overfit risk | R² = 0.92, Validation R² = 0.70, n = 180, p = 40, beta = 3, gamma = 6, blend = 0.7 | Strong gap and complexity penalties; score drops despite high training fit. |
| Lean model | SSE = 1200, SST = 6500, n = 300, p = 6, gamma = 4, beta = 2, blend = 0.5 | R² computed from errors; simplicity keeps penalties mild. |
Formula used
Generalization penalty uses the positive gap between training and validation: gap = max(0, R² − R²_val), G = 1 / (1 + beta · gap)
Complexity penalty uses the feature-to-sample ratio: C = 1 / (1 + gamma · p/n)
The final score is reported on a 0–100 scale after clamping.
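Putting the pieces together, here is a minimal Python sketch of the score. The page does not spell out exactly how the baseline and penalties are combined, so the final line (adjusted R² scaled by the blended penalty w·G + (1 − w)·C) is an assumption for illustration only:

```python
def adjusted_fit_score(r2, r2_val, n, p, beta=2.0, gamma=5.0, w=0.6):
    # Adjusted R² baseline (requires n > p + 1)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    # Generalization penalty G: only a positive train-validation gap counts
    gap = max(0.0, r2 - r2_val)
    g = 1 / (1 + beta * gap)
    # Complexity penalty C from the p/n ratio
    c = 1 / (1 + gamma * p / n)
    # ASSUMPTION: baseline scaled by the blended penalty, clamped to 0-100
    score = adj_r2 * (w * g + (1 - w) * c) * 100
    return max(0.0, min(100.0, score))

# Balanced-model example from the table above
print(round(adjusted_fit_score(0.82, 0.76, 250, 12), 1))  # 69.6 under this assumed combination
```

A different combination rule changes the number, but not the direction of each effect: a larger gap or a larger p/n always pulls the score down.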
How to use
- Choose an input mode: enter R², or compute it from SSE and SST.
- Enter n (observations) and p (predictors).
- Optionally enter validation R² to penalize overfitting risk.
- Tune beta and gamma to match your penalty preference.
- Pick blend weight to balance generalization versus simplicity.
- Press Submit to see results above the form.
- Use the download buttons to export CSV or PDF.
Fit beyond training R²
Training R² can look impressive while real performance lags. The adjusted fit score combines an adjusted R² baseline with penalties that reflect generalization and model size. When n is close to p, adjusted R² often drops sharply, warning that the apparent fit may be driven by degrees of freedom rather than signal. Using a 0–100 scale makes comparisons easier across experiments, feature sets, and time windows.
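The adjusted R² baseline can be checked directly. This sketch reproduces the balanced-model figure and shows how the statistic collapses when n is barely above p:

```python
def adjusted_r2(r2, n, p):
    # Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1); requires n > p + 1
    if n <= p + 1:
        raise ValueError("adjusted R² undefined: need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adjusted_r2(0.82, 250, 12), 3))  # 0.811, matching the balanced-model row
print(round(adjusted_r2(0.82, 15, 12), 3))   # -0.26: the apparent fit is mostly degrees of freedom
```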
Generalization gap signal
If you provide validation R², the calculator measures the positive gap max(0, R² − R²_val). A gap of 0.10 with beta = 2 yields a generalization penalty of 1/(1 + 2·0.10) = 0.833, reducing the score even when training fit stays high. This encourages selecting models that keep training and validation aligned, which is especially important under dataset shift, leakage risk, or aggressive feature engineering.
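The worked number above can be verified in a few lines; note that a validation score above training incurs no penalty, since only the positive gap counts:

```python
def generalization_penalty(r2_train, r2_val, beta):
    # Only a positive train-to-validation gap is penalized
    gap = max(0.0, r2_train - r2_val)
    return 1 / (1 + beta * gap)

print(round(generalization_penalty(0.80, 0.70, 2), 3))  # 0.833: gap of 0.10 with beta 2
print(round(generalization_penalty(0.70, 0.80, 2), 3))  # 1.0: validation above training, no penalty
```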
Complexity control with p/n
The complexity penalty uses p/n to represent how crowded the feature space is relative to data. With gamma 5, p=25 and n=250 gives C=1/(1+0.5)=0.667, while p=10 and n=250 gives C=1/(1+0.2)=0.833. This simple ratio approximates the intuition that larger models require more data to maintain stable estimates and avoid brittle coefficients.
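The two complexity examples can be reproduced the same way:

```python
def complexity_penalty(p, n, gamma):
    # Penalize crowded feature spaces via the p/n ratio
    return 1 / (1 + gamma * p / n)

print(round(complexity_penalty(25, 250, 5), 3))  # 0.667: 25 predictors on 250 rows
print(round(complexity_penalty(10, 250, 5), 3))  # 0.833: a leaner model on the same data
```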
Tuning blend, beta, gamma
Blend weight w sets the emphasis between generalization and simplicity. When w=0.7, the score reacts more to validation gaps; when w=0.3, it reacts more to p/n. Beta and gamma should match your risk tolerance: raise beta for production models where surprises are costly, and raise gamma for interpretable models where feature parsimony matters. Keep defaults consistent to track improvements fairly.
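The effect of the blend weight can be seen with two fixed penalty values. The combination w·G + (1 − w)·C is an assumption for illustration, not a formula stated on this page:

```python
# Sensitivity of the blended penalty to w, assuming the combination
# w*G + (1-w)*C (an illustrative assumption).
G = 0.6  # poor generalization: a large train-validation gap
C = 0.9  # lean model: small p/n
for w in (0.3, 0.7):
    blended = w * G + (1 - w) * C
    print(f"w={w}: blended penalty = {blended:.3f}")
# w=0.3 -> 0.810 (the gap matters less); w=0.7 -> 0.690 (the gap dominates)
```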
Reading the score in practice
Use the grade bands as a quick screen, then inspect components. Two models can share the same score for different reasons: one may have a strong adjusted R² but a large gap, while another may generalize well but be overly complex. Export results to document experiments, attach them to model cards, and compare runs alongside MAE, RMSE, or business KPIs for a complete view. Track the score over releases to detect silent regressions early.
FAQs
What does the adjusted fit score measure?
It rates model fit after accounting for sample size, feature count, and validation consistency, producing a 0–100 number that is easier to compare across experiments than raw R² alone.
When should I supply a validation R²?
Whenever you have cross‑validation or a holdout set. The score penalizes only positive train‑to‑validation gaps, helping you spot overfitting and select models that generalize.
Why is adjusted R² sometimes unavailable?
Adjusted R² requires n greater than p plus one. If the condition fails, the calculator falls back to training R² and notes that the adjusted statistic is undefined for that configuration.
What do beta and gamma control?
Beta strengthens the generalization penalty from validation gaps, while gamma strengthens the complexity penalty from p/n. Keep them consistent within a project to make scores comparable and interpret changes confidently.
Can I use this for classification models?
Yes, if you supply an R²‑like metric from a regression‑style evaluation, such as explained variance on probabilities. For pure classification, consider pairing this score with AUC, log loss, or calibration error.
What does a negative R² mean?
Negative R² means the model is worse than predicting the mean. If flooring is enabled, the score is clamped to zero, making the rating easier to interpret while still showing the underlying R² value.