Nested Cross Validation Calculator

Analyze outer folds, inner searches, and stability clearly. Measure variance, confidence bounds, and training optimism. Choose better models using fairer validation evidence across experiments.

Calculator Inputs

Enter one score per outer fold for training, inner best tuning, and outer validation. Keep fold counts aligned with the outer fold setting.

Use 1.96 for an approximate 95% interval.
Comma separated. Keep the values aligned with the outer folds.
One numeric score per outer fold.
Best inner search result for each outer loop.
Training score for the selected model in each fold.
Enter one chosen setting per line, in outer fold order.
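The input rules above can be checked before any statistic is computed. The following is a minimal sketch, not the calculator's own code: the helper names `parse_scores` and `check_alignment` are hypothetical, and the values are taken from the example data table.

```python
def parse_scores(text):
    """Parse a comma-separated string into a list of floats."""
    return [float(token) for token in text.split(",") if token.strip()]


def check_alignment(outer_folds, **fields):
    """Raise ValueError if any field lacks exactly one value per outer fold."""
    for name, values in fields.items():
        if len(values) != outer_folds:
            raise ValueError(
                f"{name}: expected {outer_folds} values, got {len(values)}"
            )


# Values copied from the example data table (five outer folds).
outer = parse_scores("0.8420, 0.8570, 0.8510, 0.8390, 0.8640")
inner = parse_scores("0.8820, 0.8910, 0.8860, 0.8790, 0.8940")
train = parse_scores("0.9080, 0.9140, 0.9110, 0.9060, 0.9180")
sizes = parse_scores("200, 200, 200, 200, 200")

# Hyperparameters are entered one setting per line, in outer fold order.
params = (
    "max_depth=6, eta=0.05\n"
    "max_depth=5, eta=0.05\n"
    "max_depth=6, eta=0.04\n"
    "max_depth=5, eta=0.06\n"
    "max_depth=6, eta=0.05"
).splitlines()

check_alignment(5, outer=outer, inner=inner, train=train, sizes=sizes, params=params)
print("all inputs aligned with 5 outer folds")
```

A mismatch, such as four outer scores for five folds, raises a `ValueError` that names the offending field.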

Example Data Table

Fold | Fold Size | Training Score | Inner Best Score | Outer Validation Score | Selected Hyperparameters
1 | 200 | 0.9080 | 0.8820 | 0.8420 | max_depth=6, eta=0.05
2 | 200 | 0.9140 | 0.8910 | 0.8570 | max_depth=5, eta=0.05
3 | 200 | 0.9110 | 0.8860 | 0.8510 | max_depth=6, eta=0.04
4 | 200 | 0.9060 | 0.8790 | 0.8390 | max_depth=5, eta=0.06
5 | 200 | 0.9180 | 0.8940 | 0.8640 | max_depth=6, eta=0.05

Formula Used

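Outer nested estimate, where \( s_i \) is the outer validation score of fold \( i \) and \( K \) is the number of outer folds
\( \hat{\theta}_{NCV} = \frac{1}{K} \sum_{i=1}^{K} s_i \)
Weighted outer estimate, where \( n_i \) is the size of outer fold \( i \)
\( \hat{\theta}_{weighted} = \frac{\sum_{i=1}^{K} n_i s_i}{\sum_{i=1}^{K} n_i} \)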
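As an illustration of the two estimates, here is a short sketch using the outer validation scores and fold sizes from the example data table. The function names are hypothetical, and the unequal fold sizes in the second call are made up purely to show when the weighted and unweighted values differ.

```python
outer_scores = [0.8420, 0.8570, 0.8510, 0.8390, 0.8640]
fold_sizes = [200, 200, 200, 200, 200]


def outer_mean(scores):
    """Unweighted mean of the outer validation scores."""
    return sum(scores) / len(scores)


def weighted_mean(scores, sizes):
    """Mean of the outer scores weighted by fold size."""
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    return sum(n * s for n, s in zip(sizes, scores)) / sum(sizes)


theta_ncv = outer_mean(outer_scores)
theta_weighted = weighted_mean(outer_scores, fold_sizes)
print(round(theta_ncv, 4), round(theta_weighted, 4))

# Hypothetical unequal fold sizes: larger folds now pull the estimate.
theta_unequal = weighted_mean(outer_scores, [100, 200, 200, 200, 300])
print(round(theta_unequal, 4))
```

With equal fold sizes both estimates come to about 0.8506. With the hypothetical unequal sizes, the weighted estimate moves to about 0.8528 because the largest fold also has the highest score.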
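Sample variance and standard deviation of the outer scores, where \( \bar{s} \) is the unweighted outer mean \( \hat{\theta}_{NCV} \)
\( s^2 = \frac{\sum_{i=1}^{K} (s_i - \bar{s})^2}{K - 1} \), \( SD = \sqrt{s^2} \)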
Standard error and confidence interval
\( SE = \frac{SD}{\sqrt{K}} \), \( CI = \bar{s} \pm z \cdot SE \)
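The spread and interval formulas can be sketched with Python's standard `statistics` module, whose `stdev` function uses the same \( K - 1 \) denominator. The function name `confidence_interval` is hypothetical, and the scores are the outer validation scores from the example data table.

```python
import math
import statistics

outer_scores = [0.8420, 0.8570, 0.8510, 0.8390, 0.8640]


def confidence_interval(scores, z=1.96):
    """Return (mean, SD, SE, (lower, upper)) for a list of outer scores."""
    k = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample SD, divides by K - 1
    se = sd / math.sqrt(k)
    return mean, sd, se, (mean - z * se, mean + z * se)


mean, sd, se, (lower, upper) = confidence_interval(outer_scores)
print(f"mean={mean:.4f} SD={sd:.4f} SE={se:.4f} CI=({lower:.4f}, {upper:.4f})")
```

For the example data this gives a mean of about 0.8506, an SD of about 0.0104, an SE of about 0.0046, and an approximate 95% interval of roughly 0.8415 to 0.8597.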
Tuning optimism gap
For higher-better metrics: \( \bar{s}_{inner} - \bar{s}_{outer} \).
For lower-better metrics: \( \bar{s}_{outer} - \bar{s}_{inner} \).
Train validation gap
For higher-better metrics: \( \bar{s}_{train} - \bar{s}_{outer} \).
For lower-better metrics: \( \bar{s}_{outer} - \bar{s}_{train} \).
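Both gaps follow the same pattern, so one small function can cover them. This is a sketch under the sign convention stated above; the name `optimism_gap` and the `higher_is_better` flag are hypothetical, and the RMSE values in the last call are invented for illustration only.

```python
train_scores = [0.9080, 0.9140, 0.9110, 0.9060, 0.9180]
inner_scores = [0.8820, 0.8910, 0.8860, 0.8790, 0.8940]
outer_scores = [0.8420, 0.8570, 0.8510, 0.8390, 0.8640]


def mean(values):
    return sum(values) / len(values)


def optimism_gap(reference_scores, outer, higher_is_better=True):
    """Gap between a reference mean (inner or training) and the outer mean.

    A positive result always means the reference looked better than
    the outer validation result, whatever the metric direction.
    """
    diff = mean(reference_scores) - mean(outer)
    return diff if higher_is_better else -diff


tuning_gap = optimism_gap(inner_scores, outer_scores)
train_gap = optimism_gap(train_scores, outer_scores)
print(round(tuning_gap, 4), round(train_gap, 4))

# Hypothetical lower-better metric (RMSE): the outer error is higher.
rmse_gap = optimism_gap([2.0, 2.2], [2.5, 2.7], higher_is_better=False)
print(round(rmse_gap, 4))
```

On the example data the tuning optimism gap is about 0.0358 and the train validation gap is about 0.0608.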

How to Use This Calculator

  1. Choose the problem type and metric that matches your experiment.
  2. Set the outer and inner fold counts from your workflow.
  3. Paste one training, inner, and outer score for each outer fold.
  4. Enter fold sizes to calculate the weighted estimate correctly.
  5. Add one chosen hyperparameter setting per outer fold line.
  6. Submit the form to view the summary, gaps, tables, and graph.
  7. Use CSV or PDF export to save the current evaluation report.
  8. Review the confidence interval and gaps before model selection.

Frequently Asked Questions

1. What does nested cross validation measure?

It estimates model performance while separating tuning from final evaluation. The outer loop tests generalization, and the inner loop selects settings within each training partition.
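To make the two loops concrete, here is a self-contained sketch of nested cross validation using only the standard library. Everything in it is an assumption chosen for illustration: the synthetic one-dimensional data, the k-nearest-neighbour classifier, the grid of `k` values, and all function names. A real workflow would normally rely on an established machine learning library.

```python
import random


def make_data(n=120, seed=0):
    """Synthetic data: label is 1 when x > 0, with about 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        y = int(x > 0)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (k is odd)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return int(2 * sum(y for _, y in nearest) > k)


def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def k_folds(data, n_folds):
    """Yield (train, test) pairs; fold i holds items with index % n_folds == i."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        yield train, test


def nested_cv(data, outer_folds=5, inner_folds=3, grid=(1, 3, 5, 7)):
    rows = []
    for outer_train, outer_test in k_folds(data, outer_folds):
        # Inner loop: tune k using only the outer training partition.
        inner_mean = {
            k: sum(accuracy(tr, va, k)
                   for tr, va in k_folds(outer_train, inner_folds)) / inner_folds
            for k in grid
        }
        best_k = max(inner_mean, key=inner_mean.get)
        # Outer loop: score the tuned setting on data the inner loop never saw.
        rows.append({
            "fold_size": len(outer_test),
            "train_score": accuracy(outer_train, outer_train, best_k),
            "inner_best_score": inner_mean[best_k],
            "outer_score": accuracy(outer_train, outer_test, best_k),
            "selected": f"k={best_k}",
        })
    return rows


rows = nested_cv(make_data())
for row in rows:
    print(row)
```

Each printed row has the same shape as a line of the example data table: a fold size, a training score, an inner best score, an outer validation score, and the selected setting.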

2. Why is the outer mean important?

The average outer validation score is the least biased summary in a nested workflow. It reflects performance on data never seen during inner tuning.

3. What is the tuning optimism gap?

It compares the average inner best score with the average outer score. A larger positive value means tuning looked better than final held-out performance.

4. Can I use regression metrics here?

Yes. Choose a regression metric such as RMSE, MAE, MSE, or R². Set the score direction correctly so the gaps are interpreted properly: RMSE, MAE, and MSE are lower-better, while R² is higher-better.

5. Why do I need one value per outer fold?

Each outer split contributes one final held-out score. Matching one score per fold preserves the correct nested summary and variability estimates.

6. What does a wide confidence interval mean?

It usually means fold outcomes vary a lot or you have few outer folds. Wider intervals suggest less certainty around the expected generalization score.
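The effect of the fold count can be seen directly from the interval formula, whose half-width is \( z \cdot SD / \sqrt{K} \). The sketch below holds the standard deviation fixed at a hypothetical 0.0104, close to the example data, and varies only the number of outer folds. In practice the SD itself also changes with the fold count, so this shows the trend only.

```python
import math


def half_width(sd, k, z=1.96):
    """Half-width of the interval mean +/- z * SD / sqrt(K)."""
    return z * sd / math.sqrt(k)


sd = 0.0104  # hypothetical fixed SD, close to the example data
widths = {k: half_width(sd, k) for k in (3, 5, 10, 20)}
for k, w in widths.items():
    print(f"K={k:2d} half-width={w:.4f}")
```

Doubling the number of outer folds shrinks the half-width by a factor of \( \sqrt{2} \), not by half, so a noticeably narrower interval needs many more folds.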

7. Should I rely on the best inner score?

Not for final reporting. Inner best scores help choose settings, but the outer results provide the fairer estimate of how the tuned pipeline should perform.

8. What does the train validation gap show?

It shows how much better the selected models perform on training data than on outer validation data. Larger gaps can indicate stronger overfitting risk.

Important Note: All calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of the results. Please consult other sources as well.