Enter Fold Performance Data
Use the controls below to calculate the mean cross-validation error, its spread, a confidence interval, and model stability indicators.
Example Data Table
This example shows a five-fold cross-validation workflow using mean squared error values from a supervised learning model.
| Fold | Training Error | Validation Samples | Validation Error | Generalization Gap |
|---|---|---|---|---|
| 1 | 0.1210 | 120 | 0.1640 | 0.0430 |
| 2 | 0.1180 | 115 | 0.1580 | 0.0400 |
| 3 | 0.1250 | 118 | 0.1710 | 0.0460 |
| 4 | 0.1190 | 122 | 0.1600 | 0.0410 |
| 5 | 0.1230 | 119 | 0.1660 | 0.0430 |
Formula Used
Simple mean cross-validation error: CV Error = (e₁ + e₂ + … + eₖ) / k
Weighted cross-validation error: CV Error = Σ(wᵢeᵢ) / Σwᵢ
Sample standard deviation: s = √[Σ(eᵢ − ē)² / (k − 1)]
Confidence interval: CV Error ± z × (s / √k)
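As a sketch, the four formulas can be computed in Python from the validation errors and sample counts in the example table (z = 1.96 assumes a 95% confidence level):

```python
import math

# Validation errors and sample counts from the example five-fold table
errors = [0.1640, 0.1580, 0.1710, 0.1600, 0.1660]
weights = [120, 115, 118, 122, 119]  # validation sample counts per fold
k = len(errors)

# Simple mean cross-validation error
cv_mean = sum(errors) / k  # ≈ 0.1638

# Weighted cross-validation error (larger folds get more influence)
cv_weighted = sum(w * e for w, e in zip(weights, errors)) / sum(weights)

# Sample standard deviation of the fold errors
s = math.sqrt(sum((e - cv_mean) ** 2 for e in errors) / (k - 1))

# 95% confidence interval around the mean (z = 1.96)
z = 1.96
half_width = z * s / math.sqrt(k)
ci = (cv_mean - half_width, cv_mean + half_width)
```

Because the fold sizes here are nearly equal, the weighted and simple means agree to four decimal places; they diverge when fold sizes differ substantially.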
The calculator treats each fold's validation score as an estimate of out-of-sample loss. The mean summarizes expected generalization error, while the spread shows how sensitive performance is to fold composition. Weighted aggregation is useful when validation fold sizes differ.
When training errors are added, the calculator also estimates the average generalization gap. A larger gap may indicate overfitting, data leakage risk, or unstable preprocessing between folds.
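The gap estimate is a simple per-fold difference; a sketch using the training and validation errors from the example table:

```python
# Training and validation errors from the example five-fold table
train = [0.1210, 0.1180, 0.1250, 0.1190, 0.1230]
val = [0.1640, 0.1580, 0.1710, 0.1600, 0.1660]

# Per-fold generalization gap: validation error minus training error
gaps = [v - t for v, t in zip(val, train)]

# Average generalization gap across folds
avg_gap = sum(gaps) / len(gaps)  # ≈ 0.0426
```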
How to Use This Calculator
- Enter your model name, dataset size, fold count, validation strategy, loss metric, confidence level, and preferred decimal precision.
- For each fold, enter the validation sample count and validation error. Add training error values when you want a generalization gap estimate.
- Choose weighted aggregation if fold sizes differ. Use simple mean when all folds are equally sized or intentionally balanced.
- Click Calculate Error to show results above the form. Review the mean, interval, dispersion, and fold detail table.
- Use the CSV or PDF buttons to export the current summary for reporting, peer review, or experiment documentation.
Frequently Asked Questions
1. What does cross-validation error measure?
It estimates how well a model performs on unseen data by averaging validation loss across multiple train-test splits. Lower values generally indicate better expected generalization.
2. Why might weighted and simple averages differ?
They differ when folds contain unequal validation sample counts. Weighted averaging gives larger folds more influence, which can better reflect the full dataset’s overall error.
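A small illustration with hypothetical fold sizes: one large fold with a higher error pulls the weighted average well above the simple mean.

```python
# Hypothetical fold errors and validation sample counts (unequal folds)
errors = [0.10, 0.10, 0.20]
sizes = [50, 50, 400]

# Simple mean treats every fold equally
simple = sum(errors) / len(errors)  # ≈ 0.1333

# Weighted mean lets the 400-sample fold dominate
weighted = sum(n * e for n, e in zip(sizes, errors)) / sum(sizes)  # 0.18
```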
3. Should I use RMSE, MAE, or log loss?
Use the loss that matches your modeling objective. RMSE emphasizes larger misses, MAE is more robust to outliers, and log loss suits probabilistic classification.
4. What does the confidence interval tell me?
It gives a rough uncertainty band around the average validation error. A narrower interval suggests more stable fold-to-fold performance.
5. Why include training error values?
Training error lets you compare in-sample and validation performance. A consistently large gap can indicate overfitting or weak generalization control.
6. Can this help compare two models?
Yes. Run the calculator separately for each model using the same folds and metric. Compare average error, variability, and confidence interval overlap.
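A hedged sketch of that comparison, using two hypothetical sets of fold errors and a simple check of whether the 95% intervals overlap (non-overlap is only a rough signal, not a formal hypothesis test):

```python
import math

def mean_and_ci(errors, z=1.96):
    """Return the mean fold error and a z-based confidence interval."""
    k = len(errors)
    m = sum(errors) / k
    s = math.sqrt(sum((e - m) ** 2 for e in errors) / (k - 1))
    half = z * s / math.sqrt(k)
    return m, m - half, m + half

# Hypothetical fold errors for two candidate models on the same folds
model_a = [0.164, 0.158, 0.171, 0.160, 0.166]
model_b = [0.148, 0.155, 0.147, 0.160, 0.153]

ma, a_lo, a_hi = mean_and_ci(model_a)
mb, b_lo, b_hi = mean_and_ci(model_b)

# Intervals overlap unless one lies entirely above the other
overlap = a_lo <= b_hi and b_lo <= a_hi
```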
7. What if one fold performs much worse?
Check the range, quartiles, and generalization gap. One weak fold may reveal class imbalance, temporal drift, leakage prevention issues, or noisy validation data.
8. Does a lower error always mean the best model?
Not always. You should also consider variance, interpretability, inference cost, fairness, calibration, and business constraints before selecting a final model.