Build regression models from your dataset quickly. Track chosen variables, coefficients, errors, and fit gains, and use the clear outputs to make smarter predictor-selection decisions.
Paste a numeric CSV dataset. The first row must contain headers. Set the target column, choose a selection rule, and run the stepwise search.
This sample dataset is preloaded in the form. Use it to test variable entry order, coefficient estimates, and prediction accuracy.
| Y | X1 | X2 | X3 | X4 | X5 |
|---|---|---|---|---|---|
| 24 | 2 | 5 | 1 | 8 | 3 |
| 31 | 3 | 6 | 2 | 9 | 4 |
| 35 | 4 | 5 | 3 | 11 | 5 |
| 43 | 5 | 7 | 4 | 12 | 4 |
| 46 | 6 | 8 | 4 | 13 | 5 |
| 54 | 7 | 8 | 5 | 15 | 6 |
| 57 | 8 | 9 | 6 | 14 | 7 |
| 66 | 9 | 10 | 7 | 16 | 6 |
| 70 | 10 | 11 | 7 | 18 | 7 |
| 77 | 11 | 12 | 8 | 19 | 8 |
| 80 | 12 | 12 | 9 | 20 | 9 |
| 89 | 13 | 13 | 10 | 22 | 8 |
Forward selection begins with an intercept-only model and tests each unused predictor one at a time. At every step, the page adds the variable that gives the largest improvement in the chosen criterion.
At each step the page fits the candidate model by ordinary least squares and scores it with the chosen criterion:

- Coefficients: β = (X′X)⁻¹X′Y
- Prediction: ŷ = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ
- Residuals: e = y − ŷ
- R²: 1 − SSE / SST
- Adjusted R²: 1 − (1 − R²)(n − 1) / (n − p − 1)
- RMSE: √(SSE / n)
- AIC: n ln(SSE / n) + 2k
- BIC: n ln(SSE / n) + k ln(n)

Here, n is the number of rows, p is the count of selected predictors, and k is the number of estimated coefficients including the intercept.
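The criteria above can be expressed as small helper functions. This is a minimal sketch in Python; the function names are ours, not the page's, and SSE/SST are assumed to be precomputed sums of squares.

```python
import math

def r2(sse, sst):
    """Coefficient of determination: 1 - SSE / SST."""
    return 1 - sse / sst

def adjusted_r2(sse, sst, n, p):
    """R-squared with a penalty for the number of predictors p."""
    return 1 - (1 - r2(sse, sst)) * (n - 1) / (n - p - 1)

def rmse(sse, n):
    """Root mean squared error on the fitted sample."""
    return math.sqrt(sse / n)

def aic(sse, n, k):
    """Akaike information criterion; k counts the intercept too."""
    return n * math.log(sse / n) + 2 * k

def bic(sse, n, k):
    """Bayesian information criterion: heavier penalty via ln(n)."""
    return n * math.log(sse / n) + math.log(n) * k
```

Lower AIC, BIC, and RMSE are better; higher adjusted R² is better, which is why the selection loop compares them with opposite signs.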
It starts with an intercept-only model, then adds one predictor at a time. Each step keeps the variable that most improves the chosen fit criterion, letting you build a smaller regression model without manually testing every possible combination.
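The loop described above can be sketched as follows. This is an illustrative implementation, not the page's actual code: it assumes adjusted R² as the criterion, and names like `forward_select` are our own.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept column; returns (beta, SSE)."""
    Xi = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ beta
    return beta, float(resid @ resid)

def adj_r2(sse, sst, n, p):
    r2 = 1 - sse / sst
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def forward_select(X, y, names):
    """Greedy forward search: add the best remaining predictor each step."""
    n = len(y)
    sst = float(((y - y.mean()) ** 2).sum())
    chosen, best_score = [], -np.inf
    while len(chosen) < X.shape[1]:
        step_best = None
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            _, sse = fit_ols(X[:, cols], y)
            score = adj_r2(sse, sst, n, len(cols))
            if step_best is None or score > step_best[1]:
                step_best = (j, score)
        if step_best is None or step_best[1] <= best_score:
            break  # no remaining candidate improves the criterion
        chosen.append(step_best[0])
        best_score = step_best[1]
    return [names[j] for j in chosen], best_score
```

Running this on the preloaded sample yields a high adjusted R², since the sample target is close to linear in its predictors.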
Use Adjusted R² when you want higher explained variance with a size penalty. Use AIC or BIC when you prefer penalized model comparison. Use RMSE when predictive error on the current sample matters most.
Yes. The first row must contain unique column names. The target variable field should match one of those headers, and every remaining numeric column becomes a possible predictor during the forward search.
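The header and target checks described above can be sketched with the standard `csv` module. This is an assumed parsing flow, not the page's actual code, and the error messages are ours.

```python
import csv
import io

def parse_dataset(text, target):
    """Parse pasted CSV text into (y, X, predictor_names)."""
    rows = list(csv.reader(io.StringIO(text.strip())))
    headers = rows[0]
    if len(set(headers)) != len(headers):
        raise ValueError("column names must be unique")
    if target not in headers:
        raise ValueError(f"target column {target!r} not found in headers")
    # every cell below the header row must be numeric
    data = [[float(v) for v in row] for row in rows[1:]]
    t = headers.index(target)
    y = [row[t] for row in data]
    predictors = [h for h in headers if h != target]
    X = [[row[i] for i, h in enumerate(headers) if h != target] for row in data]
    return y, X, predictors
```

Every numeric column other than the target then enters the candidate pool for the forward search.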
Perfect or near-perfect collinearity can make the matrix inversion unstable. When that happens, the page skips singular candidate models and keeps only models that can be estimated reliably with the available numeric precision.
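One way to skip singular candidates, sketched here under the assumption of a condition-number check (the threshold `1e12` is our choice, not the page's):

```python
import numpy as np

def try_fit(X, y, cond_limit=1e12):
    """Return coefficients if the design is well-conditioned, else None."""
    Xi = np.column_stack([np.ones(len(y)), X])
    # a huge condition number signals (near-)perfect collinearity
    if np.linalg.cond(Xi) > cond_limit:
        return None  # skip this candidate model
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    return beta
```

A candidate set containing a column that is an exact multiple of another would be rejected here, while well-separated predictors fit normally.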
This page uses a lightweight normal-tail approximation for coefficient significance. It is practical for quick analysis, but dedicated statistical software can provide more exact small-sample inference using full t-distribution calculations.
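The normal-tail approximation mentioned above can be sketched as follows: compute standard errors from the diagonal of (X′X)⁻¹, form z-statistics, and take two-sided normal tails via `erfc`. This is our illustrative version, not the page's actual code; a t-distribution with n − k degrees of freedom would replace the last step for exact small-sample inference.

```python
import math
import numpy as np

def coef_p_values(X, y):
    """OLS coefficients, standard errors, and normal-approximation p-values."""
    n = len(y)
    Xi = np.column_stack([np.ones(n), X])
    k = Xi.shape[1]
    XtX_inv = np.linalg.inv(Xi.T @ Xi)
    beta = XtX_inv @ Xi.T @ y
    resid = y - Xi @ beta
    sigma2 = float(resid @ resid) / (n - k)      # residual variance
    se = np.sqrt(sigma2 * np.diag(XtX_inv))      # coefficient standard errors
    z = beta / se
    # two-sided tail probability of the standard normal
    p = np.array([math.erfc(abs(zi) / math.sqrt(2)) for zi in z])
    return beta, se, p
```

On the sample data, regressing Y on X1 alone gives a slope near 5.76 with a vanishingly small p-value, as the near-linear table suggests.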
Yes, for moderately sized pasted datasets, but very large tables may feel slow in a single browser page. For large production workloads, importing data from files or connecting to a database would be a better approach.
The selection path graph tracks model quality across forward steps. The actual-versus-predicted graph compares fitted values against observed values, helping you judge how closely the final model follows the target column.
Stop when the next variable barely improves the chosen criterion, when interpretability matters more than extra complexity, or when domain knowledge suggests the current predictor set already explains the outcome well enough.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.