Example data table
| x | y | Context |
|---|---|---|
| 1 | 2.10 | Low input with a small response. |
| 2 | 2.90 | Near-linear growth begins. |
| 3 | 3.70 | Trend continues steadily. |
| 4 | 4.20 | Mild deviation from the line. |
| 5 | 5.10 | Another point along the pattern. |
| 6 | 6.00 | Higher input with expected response. |
Paste only the first two columns (x and y) into the dataset box; the Context column here is explanation only.
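For reference, the x and y columns above would be pasted as plain two-column rows like the ones below; the exact delimiter the dataset box accepts (tabs, spaces, or commas) depends on the calculator's input parser, so adjust to what it expects.

```
1 2.10
2 2.90
3 3.70
4 4.20
5 5.10
6 6.00
```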
Formula used
This calculator runs k-fold cross validation for a simple linear model: ŷ = b₀ + b₁x. For each fold, it fits the model on the training rows and evaluates predictions on the held-out rows.
b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
b₀ = ȳ − b₁ x̄
MAE = (1/n) Σ|y − ŷ|
MSE = (1/n) Σ(y − ŷ)²
RMSE = √MSE
R² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)²
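To make these formulas concrete, here is a minimal Python sketch that applies them to the example table above. It is evaluated in-sample purely for illustration and is not the calculator's internal code; the calculator scores held-out folds instead.

```python
# Minimal sketch of the formulas above, applied to the example table (not the calculator's code).
x = [1, 2, 3, 4, 5, 6]
y = [2.10, 2.90, 3.70, 4.20, 5.10, 6.00]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least-squares slope and intercept.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar  # b0 = 1.34, b1 = 0.76 for these rows

# In-sample error metrics (the calculator computes these on held-out folds instead).
y_hat = [b0 + b1 * xi for xi in x]
mse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / n
rmse = mse ** 0.5
r2 = 1 - sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / sum((yi - y_bar) ** 2 for yi in y)
print(f"b0={b0:.2f}, b1={b1:.2f}, MSE={mse:.4f}, RMSE={rmse:.4f}, R²={r2:.4f}")
```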
How to use this calculator
- Paste your dataset as two columns: x then y.
- Choose the number of folds (k) and the metric to evaluate.
- Enable shuffling to reduce order bias.
- Press Submit to compute fold models and scores.
- Review the fold table and prediction table above.
- Use Download CSV or PDF for reporting.
Why cross validation reduces selection bias
Cross validation estimates how a model will generalize by repeating training and testing on different partitions of the same dataset. Instead of trusting one split, k-fold validation rotates the holdout set, so every row becomes test data once. This rotation stabilizes performance estimates when samples are limited and helps prevent overfitting to a lucky split.
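The rotation is easy to see with indices alone. Below is a minimal sketch assuming contiguous, unshuffled folds; the function name and layout are illustrative, not the calculator's implementation.

```python
# Sketch: rotate the held-out block so every row is test data exactly once.
def kfold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_rows) if i not in set(test_idx)]
        yield train_idx, test_idx
        start += size

for fold, (train_idx, test_idx) in enumerate(kfold_indices(6, 3), start=1):
    print(f"fold {fold}: train={train_idx}, test={test_idx}")
# fold 1: train=[2, 3, 4, 5], test=[0, 1]
# fold 2: train=[0, 1, 4, 5], test=[2, 3]
# fold 3: train=[0, 1, 2, 3], test=[4, 5]
```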
Choosing k with practical tradeoffs
Smaller k, such as 5, runs faster and produces test folds with more rows, which can lower variance in the fold score. Larger k, such as 10, uses more training data per fold, often improving the model fit, but increases compute and may raise variability if each test fold becomes tiny. For very small datasets, leave-one-out can be informative but noisy.
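To see the arithmetic behind this tradeoff, the short illustration below assumes a hypothetical dataset of 100 rows and prints how many model fits each choice requires and how the rows split per fold.

```python
# Rough fold-size arithmetic for a hypothetical dataset of 100 rows (illustration only).
n = 100
for k in (5, 10, n):  # k == n is leave-one-out
    test_rows = n // k
    train_rows = n - test_rows
    label = "leave-one-out" if k == n else f"k={k}"
    print(f"{label}: {k} model fits, about {train_rows} training rows and {test_rows} test rows per fold")
```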
Interpreting MAE, MSE, RMSE, and R²
MAE reports the typical absolute error in the original units, making it intuitive for stakeholders. MSE and RMSE penalize larger mistakes more heavily; RMSE converts the penalty back to the original units and is widely used to compare experiments. R² reports the share of variance captured by the model, but it can be undefined when the test fold has near-zero variance in y. Compare metrics consistently across folds.
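Here is a hedged sketch of how these four metrics could be computed for one test fold. The eps threshold and the None return for an undefined R² are assumptions made for illustration, not a description of the calculator's internals.

```python
import math

# Hedged metric sketch; y and y_hat are equal-length lists of actual and predicted values
# for one test fold. The eps threshold is an assumption for illustration.
def mae(y, y_hat):
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    return sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    return math.sqrt(mse(y, y_hat))

def r2(y, y_hat, eps=1e-12):
    y_bar = sum(y) / len(y)
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    if ss_tot < eps:  # near-constant y in the fold: R² is undefined (the N/A case)
        return None
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    return 1 - ss_res / ss_tot
```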
Reading fold coefficients and residual patterns
Each fold produces an intercept and slope from least squares fitting. Large swings in slope across folds can indicate sensitivity to outliers, leverage points, or nonlinearity in the relationship. Residuals near zero suggest a good fit on that fold, while systematic positive or negative residuals imply bias. Use the prediction table to spot regions where errors cluster.
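As an illustration of per-fold diagnostics, the sketch below refits each training fold of the example table (k = 3, no shuffling) and prints that fold's coefficients and held-out residuals. It mirrors the idea rather than the calculator's own code.

```python
# Fit each training fold of the example table and inspect per-fold slopes and residuals.
x = [1, 2, 3, 4, 5, 6]
y = [2.10, 2.90, 3.70, 4.20, 5.10, 6.00]
k = 3

def fit_line(xs, ys):
    """Closed-form least squares: returns (b0, b1)."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / sum((a - x_bar) ** 2 for a in xs)
    return y_bar - b1 * x_bar, b1

n = len(x)
for fold in range(k):
    test = list(range(fold * n // k, (fold + 1) * n // k))
    train = [i for i in range(n) if i not in test]
    b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
    residuals = [y[i] - (b0 + b1 * x[i]) for i in test]  # held-out errors for this fold
    print(f"fold {fold + 1}: b0={b0:.3f}, b1={b1:.3f}, residuals={[round(r, 3) for r in residuals]}")
```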
Using exports for decisions and documentation
Exporting fold tables to CSV supports quick plotting, filtering, and aggregation in spreadsheets or notebooks. The PDF summary is useful for reviews, audit trails, and handoffs. Record k, the metric, and the random seed so results remain reproducible. When comparing models, keep identical folds and metrics to ensure fair, apples-to-apples evaluation. For datasets with noise, report both average and variability; a low mean error with high spread signals instability. If you suspect outliers, review residual extremes and consider robust preprocessing. Cross validation is descriptive, not causal; pair it with domain checks before deployment.
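A reporting sketch along these lines might look like the following; the settings, fold scores, and file name are placeholders rather than real output from the calculator.

```python
import csv
import statistics

# Illustrative reporting sketch: record the run settings alongside the fold scores so the
# run can be reproduced. Scores and file name below are placeholders, not real results.
settings = {"k": 5, "metric": "RMSE", "seed": 42, "shuffle": True}
fold_scores = [0.41, 0.38, 0.55, 0.40, 0.47]  # placeholder values for illustration

with open("cv_report.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["setting", "value"])
    for key, value in settings.items():
        writer.writerow([key, value])
    writer.writerow(["fold", "score"])
    for i, score in enumerate(fold_scores, start=1):
        writer.writerow([i, score])
    # Report both the average and the spread, as suggested above.
    writer.writerow(["mean", round(statistics.mean(fold_scores), 4)])
    writer.writerow(["stdev", round(statistics.stdev(fold_scores), 4)])
```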
FAQs
1) What does this calculator validate?
It validates a simple linear predictor, fitting ŷ = b₀ + b₁x on training folds and scoring predictions on the held-out fold using your selected metric.
2) Should I enable shuffling?
Yes, if your rows are ordered by time, category, or magnitude. Shuffling reduces order bias and makes folds more representative of the overall dataset.
3) Why can R² show N/A?
R² depends on variance in the test fold’s y values. If the fold’s y values are nearly constant, the denominator becomes zero and R² is undefined.
4) How many rows do I need?
More is better, but you can start with a small dataset. Ensure each fold has enough training rows to fit the line and enough test rows to score reliably.
5) What k value should I use?
Common choices are 5 or 10. Use 5 for speed and larger test folds, and 10 when you want slightly more training data per fold.
6) How do I compare two models fairly?
Use the same dataset, k, metric, shuffle option, and seed for both runs. Then compare mean scores and the spread across folds.
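For example, one way to guarantee identical folds across two runs is to derive the fold assignment from a fixed seed and reuse it for both models. The function and parameter names below are illustrative, not part of this calculator.

```python
import random

# Sketch of the "identical folds" idea: derive the fold assignment once from a fixed
# seed and reuse it for every model you compare. Names here are illustrative.
def fold_assignment(n_rows, k, seed):
    rng = random.Random(seed)  # fixed seed -> the same shuffle on every run
    order = list(range(n_rows))
    rng.shuffle(order)
    return {row: i % k for i, row in enumerate(order)}

run_a = fold_assignment(n_rows=30, k=5, seed=7)
run_b = fold_assignment(n_rows=30, k=5, seed=7)
assert run_a == run_b  # both runs score their models on exactly the same partitions
```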