Leave-One-Out Error Calculator

Estimate model error quickly and reliably using leave-one-out validation. Compare metrics, inspect residuals, and export results. Upload data, choose a model, and get clear guidance.

Calculator inputs

Provide numeric columns, with at least one feature column and one target column.

  • Target column: use 0 to select the last column as the target.
  • Degree: used only for polynomial regression.
  • Method: the fast analytic method applies to linear-in-parameters regression.

Example data table

This sample has two predictors (x1, x2) and a target (y). You can paste it into the dataset box.
x1, x2, y
1, 2, 10
2, 1, 9
3, 4, 19
4, 3, 18
5, 5, 25
6, 4, 24
7, 6, 31
8, 7, 35
Try: Model = Linear regression, Target column = 3, and Fast analytic method.

Formula used

Leave-one-out cross-validation (LOOCV) evaluates a model by training on n−1 rows and predicting the remaining row, repeating for every row. For each observation i, the LOOCV error is eᵢ = yᵢ − ŷᵢ(−i).

  • LOO MSE: MSE = (1/n) Σ (yᵢ − ŷᵢ(−i))²
  • LOO RMSE: RMSE = √MSE
  • LOO MAE: MAE = (1/n) Σ |yᵢ − ŷᵢ(−i)|

For least squares regression (and polynomial regression, which is linear in parameters), the fast analytic method uses the hat matrix diagonal hᵢᵢ. With residual eᵢ = yᵢ − ŷᵢ, the LOOCV residual becomes eᵢ/(1−hᵢᵢ), so: ŷᵢ(−i) = yᵢ − eᵢ/(1−hᵢᵢ).
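The analytic shortcut above can be sketched in a few lines of NumPy. This is an illustration of the formula, not the calculator's internal code; the design matrix `X` is assumed to include an intercept column, and the toy data is made up.

```python
import numpy as np

def loocv_analytic(X, y):
    """Analytic LOOCV residuals for least squares: e_i / (1 - h_ii)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    # diagonal of the hat matrix H = X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                  # ordinary residuals
    return e / (1.0 - h), h           # LOOCV residuals, leverages

# toy data: roughly y = 2 + 3x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])   # intercept + slope
y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])
loo_resid, h = loocv_analytic(X, y)
loo_mse = np.mean(loo_resid ** 2)
```

A useful sanity check: the leverages always sum to the number of fitted coefficients (here 2), since that sum is the trace of the hat matrix.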

How to use this calculator

  1. Paste your dataset into the text box (one row per line).
  2. Select the delimiter and whether the first row is a header.
  3. Set the target column number (use 0 for last column).
  4. Choose a model: linear regression, polynomial regression, or mean baseline.
  5. Pick a LOOCV method. Use fast analytic for speed; exact refit for verification.
  6. Press Submit to view metrics and per-row errors above the form.
  7. Use the download buttons to export results as CSV or PDF.

Why leave-one-out matters

Leave-one-out cross-validation (LOOCV) estimates generalization error when data is limited. With n observations, it trains n models (or one analytic equivalent) and tests each left-out point. That makes it useful for pilot experiments where holding out 20% would waste signal.

Interpreting LOO metrics

The table reports each y, the LOOCV prediction ŷ(−i), and the error y − ŷ(−i). MSE emphasizes large misses, RMSE keeps the units of y, and MAE is robust to occasional spikes. MAPE is easy to read, but it can explode when y is near zero; sMAPE stabilizes that by scaling with |y| + |ŷ|. LOOCV R² can be negative if predictions are worse than simply using the mean. For example, MAE = 0.8 means predictions miss by about 0.8 units on average.
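The metrics above can be computed directly from the LOOCV predictions. A minimal sketch, assuming NumPy and using the common sMAPE variant 2|e| / (|y| + |ŷ|); the sample values are invented:

```python
import numpy as np

def loo_metrics(y, y_loo):
    """Summarize LOOCV errors; y_loo holds the leave-one-out predictions."""
    e = y - y_loo
    mse = np.mean(e ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(e)),
        # sMAPE scales by |y| + |yhat|, so near-zero targets cannot blow up
        "sMAPE": np.mean(2.0 * np.abs(e) / (np.abs(y) + np.abs(y_loo))),
        # LOOCV R^2: compares LOOCV SSE to variance around the mean
        "R2": 1.0 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2),
    }

y = np.array([10.0, 9.0, 19.0, 18.0, 25.0])
m = loo_metrics(y, np.array([11.0, 8.5, 18.0, 19.0, 24.0]))
```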

Fast versus exact computation

For least squares, LOOCV can be computed from a single fit using the leverage hᵢᵢ. The analytic LOOCV residual is eᵢ/(1−hᵢᵢ), so extreme leverage can inflate error dramatically. Use Exact refit to double-check when hᵢᵢ approaches 1 or when the design is nearly singular. Analytic LOOCV is typically much faster once the initial fit is done.
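The two methods should agree exactly for ordinary least squares, which makes the refit loop a good verification tool. A sketch of the comparison, with NumPy assumed and made-up data:

```python
import numpy as np

def loocv_exact(X, y):
    """Exact LOOCV: refit least squares n times, each time dropping one row."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        preds[i] = X[i] @ beta
    return y - preds

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([3.0, 5.1, 6.9, 9.2, 11.0, 12.8])

exact = loocv_exact(X, y)
# analytic equivalent from one fit: e_i / (1 - h_ii)
H = X @ np.linalg.inv(X.T @ X) @ X.T
analytic = (y - H @ y) / (1.0 - np.diag(H))
```

For n rows, the exact loop solves n least-squares problems, while the analytic route needs just one; the results match up to floating-point error.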

Diagnosing influential points

High leverage and large residuals together indicate influential observations. A common rule flags leverage above about 2k/n, where k is the number of fitted coefficients. If a row shows hᵢᵢ = 0.62 and its |error| is four times larger than the median, investigate measurement issues, transcription errors, or nonlinearity. Residual plots should look patternless; curvature suggests missing terms. Compare the worst 5% of errors to domain tolerances.
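The 2k/n rule of thumb is easy to apply once the leverages are available. A small illustration (NumPy assumed, toy data invented): one row with an x-value far from the others is the only one flagged.

```python
import numpy as np

def flag_high_leverage(X, factor=2.0):
    """Return indices of rows whose leverage exceeds factor * k / n."""
    n, k = X.shape
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return np.where(h > factor * k / n)[0], h

# the last x-value sits far from the cluster, giving it high leverage
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
X = np.column_stack([np.ones_like(x), x])
flagged, h = flag_high_leverage(X)   # threshold here is 2*2/5 = 0.8
```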

Choosing model complexity

Polynomial degree increases flexibility but also variance. A degree-5 curve may fit training data tightly while LOOCV RMSE worsens. Prefer the simplest model that keeps LOOCV RMSE stable across reasonable settings. If adding a predictor reduces RMSE by only 1–2%, it may not justify extra complexity. Watch for collinearity: correlated predictors can raise leverage and numerical sensitivity.

Reporting results responsibly

Report n, chosen model, and the LOOCV metric you optimized. Include a short distribution summary: median |error| and the 90th percentile. When comparing models, use the same target scaling and preprocessing. Small LOOCV differences (e.g., RMSE 2.10 vs 2.05) are often practically negligible. Pair the metrics with the plots: actual-versus-predicted should cluster near the diagonal, and residuals should center near zero.

FAQs

1) What is leave-one-out error?

It is the prediction error computed when each row is left out once, the model is trained on the remaining rows, and the left-out value is predicted. Errors are then summarized with metrics like MSE, RMSE, or MAE.

2) When should I use the fast analytic method?

Use it for ordinary least squares (including polynomial regression in parameters) when the matrix inversion is stable. It is much faster than refitting n times. If leverage values are extreme or results look unstable, use exact refit.

3) Why can LOOCV R² be negative?

LOOCV R² compares LOOCV SSE to the variance of y around its mean. If cross-validated predictions are worse than predicting the mean for each left-out point, the ratio exceeds 1 and R² becomes negative.
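A tiny numeric illustration of that definition (NumPy assumed, values invented): predictions close to y give an R² near 1, while predictions worse than the mean push it below zero.

```python
import numpy as np

def loocv_r2(y, y_loo):
    """LOOCV R^2 = 1 - SSE_loo / SST; negative when LOOCV predictions
    do worse than always predicting the mean of y."""
    sse = np.sum((y - y_loo) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - sse / sst

y = np.array([1.0, 2.0, 3.0, 4.0])
good = loocv_r2(y, np.array([1.1, 2.1, 2.9, 3.9]))  # close predictions
bad = loocv_r2(y, np.array([4.0, 1.0, 5.0, 0.0]))   # worse than the mean
```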

4) What does leverage hii tell me?

Leverage measures how unusual a row’s predictor values are relative to the rest. High leverage points can strongly change fitted coefficients. When high leverage also has a large residual, it can dominate error and deserves investigation.

5) Can I use this calculator for classification?

This tool is designed for numeric targets and regression-style errors. For classification, you typically use metrics like accuracy, log loss, or AUC and different validation code. You can still approximate by coding classes numerically, but interpret carefully.

6) How should I format my dataset?

Paste delimited rows with numeric columns. Select the correct delimiter, indicate whether the first row is a header, and set the target column number. Missing or non-numeric rows are skipped, so clean your data for best results.

Related Calculators

  • KL divergence calculator
  • Squared loss calculator
  • Expected risk calculator
  • Growth function calculator
  • Hoeffding inequality calculator
  • Confidence bound calculator
  • Hinge loss calculator

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.