Repeated K Fold Calculator

Test model stability across repeated folds quickly. Review scores, uncertainty, sample splits, and total runtime. Plan fairer experiments using transparent metrics and visual summaries.

Calculator Inputs

If you provide raw repeated fold scores, the calculator derives mean, standard deviation, standard error, and confidence interval from them.

Example Data Table

This example shows a small repeated evaluation log for a model tested with 3 folds and 2 repeats on 900 samples.

Repeat | Fold | Training Samples | Validation Samples | Accuracy | Fit Time (min)
1      | 1    | 600              | 300                | 0.842    | 2.7
1      | 2    | 600              | 300                | 0.851    | 2.6
1      | 3    | 600              | 300                | 0.838    | 2.5
2      | 1    | 600              | 300                | 0.847    | 2.7
2      | 2    | 600              | 300                | 0.856    | 2.8
2      | 3    | 600              | 300                | 0.844    | 2.6
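The statistics the calculator derives from raw scores can be sketched directly from the six accuracy values in the table above, using only the standard library (a 95% interval is assumed, so z = 1.96):

```python
import math

# Accuracy scores from the example table (3 folds x 2 repeats on 900 samples)
scores = [0.842, 0.851, 0.838, 0.847, 0.856, 0.844]

m = len(scores)
mean = sum(scores) / m
# Sample standard deviation: divide the squared deviations by (m - 1)
std = math.sqrt(sum((s - mean) ** 2 for s in scores) / (m - 1))
# Standard error of the mean
se = std / math.sqrt(m)
# 95% confidence interval with z = 1.96
z = 1.96
ci = (mean - z * se, mean + z * se)

print(f"mean={mean:.4f} std={std:.4f} SE={se:.4f}")
print(f"95% CI: [{ci[0]:.4f}, {ci[1]:.4f}]")
```

With these six values the mean lands near 0.846 with a fairly tight interval, which is what a stable model across repeats should look like.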

Formula Used

Total fits
total fits = k × repeats
Average validation size per fit
validation size = dataset size ÷ k
Average training size per fit
training size = dataset size − validation size
Mean score
mean = (sum of repeated fold scores) ÷ number of scores
Sample standard deviation
std = √[ Σ(score − mean)² ÷ (m − 1) ]
Standard error
SE = std ÷ √m
Confidence interval
CI = mean ± z × SE
Estimated runtime
runtime = total fits × (training time per fit + scoring time per fit)

When dataset size is not perfectly divisible by k, the calculator reports average fold sizes. Real folds may differ by one sample.
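One common convention (also used by scikit-learn's KFold) is to give the first `n mod k` folds one extra sample, so real folds differ by at most one. A minimal sketch for 1000 samples and 3 folds:

```python
# Actual fold sizes when n is not divisible by k:
# the first (n mod k) folds each receive one extra sample.
n, k = 1000, 3
base, extra = divmod(n, k)  # base size per fold, leftover samples
fold_sizes = [base + 1 if i < extra else base for i in range(k)]

print(fold_sizes)            # folds differ by at most one sample
print(sum(fold_sizes) == n)  # every sample is assigned exactly once
```

The calculator's reported average (n ÷ k ≈ 333.3 here) sits between the actual sizes of 334 and 333.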

How to Use This Calculator

  1. Enter the dataset size, number of folds, repeats, and class count.
  2. Choose the evaluation label and metric name you want to track.
  3. Either enter mean and standard deviation, or paste raw repeated fold scores.
  4. Add timing inputs to estimate total experiment runtime, then submit the form.

Frequently Asked Questions

1) What does repeated k fold measure?

It measures model performance stability by running k fold cross validation several times with different data shuffles. This reduces luck from a single split and gives a stronger estimate of generalization.
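The reshuffle-then-split procedure can be sketched in pure Python; the helper name `repeated_kfold_indices` and the fixed seed are illustrative choices, not part of the calculator:

```python
import random

def repeated_kfold_indices(n, k, repeats, seed=0):
    """Yield (repeat, fold, train_idx, val_idx) for repeated k fold CV.

    Each repeat reshuffles the sample indices before cutting them
    into k folds, so every repeat sees a different partition.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        base, extra = divmod(n, k)
        start = 0
        for f in range(k):
            size = base + (1 if f < extra else 0)
            val = idx[start:start + size]
            train = idx[:start] + idx[start + size:]
            yield r, f, train, val
            start += size

# 900 samples, 3 folds, 2 repeats -> 6 fits of 600 train / 300 validation,
# matching the example table above
for r, f, train, val in repeated_kfold_indices(900, 3, 2):
    print(r, f, len(train), len(val))
```

In practice a library routine such as scikit-learn's RepeatedKFold does the same job; the sketch just makes the shuffle-per-repeat logic explicit.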

2) Why repeat the folds?

Repeating the folds exposes the model to many train and validation arrangements. That usually lowers dependence on one favorable split and makes score uncertainty easier to quantify.

3) When should I paste raw score values?

Paste raw scores when you already have fold results from an experiment log. The calculator then derives the mean, standard deviation, standard error, and confidence interval directly from observed values.

4) What is a good number of folds?

Five or ten folds are common. Smaller datasets often benefit from higher k, but runtime grows because each additional fold means more model fits.

5) What does the confidence interval tell me?

It shows the uncertainty around the average validation score. A narrower interval suggests the repeated estimate is more precise and less sensitive to sample partitioning.

6) Does repeated k fold replace a final test set?

No. It improves model selection and performance estimation, but a clean untouched test set is still valuable for a final unbiased check after tuning.

7) Why does runtime increase quickly?

Because the total number of fits equals folds multiplied by repeats. A 10-fold setup repeated 8 times requires 80 separate training and scoring cycles.
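The multiplication is easy to verify with hypothetical per-fit timings (the 2.5 and 0.2 minute figures below are illustrative, not measured):

```python
# Runtime estimate for a 10-fold setup repeated 8 times
k, repeats = 10, 8
fit_minutes, score_minutes = 2.5, 0.2  # hypothetical per-fit timings

total_fits = k * repeats
runtime = total_fits * (fit_minutes + score_minutes)

print(f"total fits: {total_fits}, estimated runtime: {runtime:.1f} min")
```

Doubling either the fold count or the repeat count doubles the total fits, which is why runtime grows quickly.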

8) Can this work for metrics other than accuracy?

Yes. You can label the score as accuracy, F1, AUC, recall, precision, or any other metric, as long as the values represent comparable repeated fold results.

Related Calculators

stratified split, nested cross validation, train set size, cross validation split, k fold split, train validation split, blocked cross validation, bootstrap split, test set size

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.