Calculator
Example Data Table
| Scenario | Inputs | Outputs |
|---|---|---|
| Regression metrics | SSE=120, SST=400, n=50, p=3 | R²=0.7000, Adj R²=0.6804, Explained=70.00% |
| Observed vs predicted | y: 10 12 9 15 14 18; ŷ: 11 11 10 14 13 17 | R²≈0.8929, MAE≈1.0000, RMSE≈1.0000 |
| Component analysis | Eigenvalues: 4.2 1.8 1.0 0.5; Target=90% | C1=56.00%, C2=24.00%, C3=13.33%; Components for target: 3 |
Formulas Used
- Total Sum of Squares (SST): SST = Σ(y − ȳ)²
- Sum of Squared Errors (SSE): SSE = Σ(y − ŷ)²
- Variance explained (R²): R² = 1 − (SSE / SST)
- Explained percent: 100 × R²
- Adjusted R² (optional): 1 − (1 − R²) × (n − 1) / (n − p − 1)
- PCA explained variance ratio: EVRᵢ = λᵢ / Σλ; cumulative percent for the first k components = EVR₁ + … + EVRₖ, as sketched below
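The snippet below is a minimal sketch of these formulas in plain Python. The function names (r_squared, adjusted_r_squared, explained_variance_ratios) are illustrative only and are not the calculator's internal code.

```python
# Minimal sketch of the formulas above; function names are illustrative,
# not the calculator's internal implementation.
from math import isclose

def r_squared(y, y_hat):
    """R² = 1 − SSE/SST from paired observed and predicted values."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))    # sum of squared errors
    return 1 - sse / sst

def adjusted_r_squared(r2, n, p):
    """Penalize R² using sample size n and predictor count p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def explained_variance_ratios(eigenvalues):
    """PCA explained variance ratio for each eigenvalue: λi / Σλ."""
    total = sum(eigenvalues)
    return [lam / total for lam in eigenvalues]

# Quick check against the example table: n=50, p=3, R²=0.70 → adjusted R² ≈ 0.6804
assert isclose(adjusted_r_squared(0.70, n=50, p=3), 0.6804, abs_tol=1e-4)
```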
How to Use This Calculator
- Select a mode based on your available information.
- Enter your inputs using the provided format hints.
- For paired data, keep observed and predicted lengths equal.
- Optionally provide the sample size n and predictor count p to get adjusted R² and the F statistic.
- Click Calculate to show results above the form.
- Use CSV or PDF downloads to save your output.
Connecting SSE and SST to explained variance
Variance explained comes from comparing error dispersion to total dispersion. Using SST = Σ(y − ȳ)² and SSE = Σ(y − ŷ)², the calculator reports R² = 1 − SSE/SST. For example, if SST=400 and SSE=120, R²=0.70 and explained variance is 70%. A smaller SSE or larger SST raises the explained share, but SST must be positive: it is zero when every observed value is identical, which leaves R² undefined. If SSE exceeds SST, R² becomes negative, signaling performance worse than predicting the mean. Validate on holdout data; an R² of 0.75 can drop to 0.55 when overfit. Check residual patterns for systematic bias.
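A small worked check of that arithmetic in plain Python; the 450/400 case is an invented pair used only to show a negative R²:

```python
# R² from summary statistics: SST=400, SSE=120 gives 0.70 (70% explained);
# the 450/400 pair is a made-up illustration of SSE > SST, which yields a negative R².
def r2_from_sums(sse, sst):
    return 1 - sse / sst

print(r2_from_sums(120, 400))   # 0.7
print(r2_from_sums(450, 400))   # -0.125 → worse than predicting the mean
```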
Interpreting R² and explained percent in context
R² is descriptive, not a guarantee of usefulness. In noisy behavioral data, 0.30 can be meaningful; in controlled engineering tests, teams often expect 0.80 or higher. Compare explained percent to baseline variability, measurement error, and the cost of mistakes. Report both R² and an error metric so stakeholders see variance captured and typical deviation.
Adjusted R², F statistic, and model complexity
Adding predictors can inflate R² even when they add little signal. Adjusted R² penalizes complexity using n and p: 1 − (1 − R²)×(n−1)/(n−p−1). With n=50, p=3, and R²=0.70, adjusted R² is about 0.68. The overall F statistic compares explained and unexplained mean squares, F = (R²/p) / ((1 − R²)/(n − p − 1)); a larger F suggests the model improves beyond random noise.
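A short sketch of both quantities for the example values, using the standard regression F statistic written above:

```python
# Adjusted R² and the overall F statistic for the example values n=50, p=3, R²=0.70,
# using F = (R²/p) / ((1 − R²)/(n − p − 1)).
n, p, r2 = 50, 3, 0.70

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))

print(f"adjusted R² = {adj_r2:.4f}")   # 0.6804
print(f"F           = {f_stat:.2f}")   # 35.78
```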
Paired observed–predicted diagnostics
When you paste observed and predicted lists, the tool computes R² from the same sums of squares and adds MAE, RMSE, and optional MAPE. In the sample pair, MAE≈1 indicates typical absolute error near one unit. Use RMSE to emphasize larger misses, and use MAPE only when zeros are rare and the scale is meaningful.
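A minimal sketch of these error metrics on the sample pair; the variable names are illustrative, and MAPE is computed here only because no observed value is zero:

```python
# MAE, RMSE, and MAPE for the sample observed/predicted pair.
from math import sqrt

y     = [10, 12, 9, 15, 14, 18]
y_hat = [11, 11, 10, 14, 13, 17]

errors = [yi - fi for yi, fi in zip(y, y_hat)]
mae  = sum(abs(e) for e in errors) / len(errors)
rmse = sqrt(sum(e * e for e in errors) / len(errors))
mape = 100 * sum(abs(e / yi) for e, yi in zip(errors, y)) / len(errors)  # safe only because no yi is zero

print(mae, rmse, round(mape, 2))   # 1.0 1.0 8.13
```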
Component analysis and cumulative targets
For PCA-style inputs, eigenvalues represent variance captured by each component. The explained variance ratio is λᵢ/Σλ, and cumulative percent shows how quickly information accumulates. If eigenvalues are 4.2, 1.8, 1.0, 0.5, the first component explains 56% and the first three reach about 93.3%. A target like 90% helps choose the smallest component count that meets your fidelity goal.
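A brief sketch of the component arithmetic for the example eigenvalues and the 90% target:

```python
# Explained variance ratios, cumulative percent, and the smallest component count
# that reaches the 90% target, using the example eigenvalues.
from itertools import accumulate

eigenvalues = [4.2, 1.8, 1.0, 0.5]
target = 90.0

total = sum(eigenvalues)
percents = [100 * lam / total for lam in eigenvalues]   # 56.0, 24.0, 13.33, 6.67
cumulative = list(accumulate(percents))                 # 56.0, 80.0, 93.33, 100.0

n_components = next(k + 1 for k, c in enumerate(cumulative) if c >= target)
print(n_components)   # 3
```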
FAQs
What does “variance explained” mean here?
It is the share of total variability in the outcome captured by the model. The tool reports it as R² and as a percentage, based on SSE relative to SST.
Can R² be negative?
Yes. If SSE is larger than SST, the model fits worse than predicting the mean. This often signals poor specification, data leakage during training, or evaluation on different data.
Why include adjusted R²?
Adjusted R² accounts for sample size and number of predictors. It helps compare models with different p values by penalizing unnecessary complexity, especially when n is modest.
Which error metric should I report with R²?
Use MAE for typical absolute error, and RMSE when large misses matter more. MAPE is useful for positive, non‑zero targets on a ratio scale, but it can be unstable near zero.
How is PCA explained variance computed?
Each eigenvalue represents variance captured by a component. The tool divides each eigenvalue by the sum of all eigenvalues to get explained percent, then accumulates those percents for the cumulative curve.
How many components should I keep?
Choose the smallest count that reaches your target cumulative percent, such as 90% or 95%. Combine this with interpretability checks and validation on downstream performance, not variance alone.