Analyze fold composition, validation exposure, training workload, and confidence intervals using customizable class proportions and settings. Build reliable evaluation plans before running expensive models.
The table below shows three example datasets with class proportions commonly used to test repeated stratified evaluation behavior.
| Dataset | Total Samples | Classes | Class Split | Folds | Repeats | Mean Score | Std Dev |
|---|---|---|---|---|---|---|---|
| Customer Churn Model | 1200 | 3 | 50%, 30%, 20% | 5 | 10 | 0.842 | 0.031 |
| Fraud Screening Model | 5000 | 2 | 92%, 8% | 5 | 8 | 0.914 | 0.024 |
| Medical Triage Model | 2400 | 4 | 40%, 25%, 20%, 15% | 6 | 5 | 0.801 | 0.041 |
1. Total model fits
Total Fits = K Folds × Repeats
2. Validation fraction per fit
Validation Fraction = 1 ÷ K Folds
3. Training fraction per fit
Training Fraction = (K Folds − 1) ÷ K Folds
4. Approximate class count
Class Count = Total Samples × (Class Proportion ÷ 100)
5. Validation exposures across all repeats
Total Validation Exposures = Total Samples × Repeats
6. Training exposures across all repeats
Total Training Exposures = Total Samples × (K Folds − 1) × Repeats
7. Standard error of the mean score
SE = Score Standard Deviation ÷ √(Total Fits)
8. Confidence interval
Confidence Interval = Mean Score ± Z × SE
9. Runtime estimate
Runtime Minutes = Total Fits × Average Training Minutes per Fit
10. Stability score
Stability Score = (1 − |Standard Deviation ÷ Mean Score|) × 100
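The formulas above can be sketched in Python using the Customer Churn row from the example table (1200 samples, 5 folds, 10 repeats, mean 0.842, standard deviation 0.031). The Z value of 1.96 for a 95% interval and the 2-minute training time per fit are illustrative assumptions, since the page lets you choose both:

```python
import math

# Inputs taken from the Customer Churn example row
total_samples = 1200
k_folds = 5
repeats = 10
mean_score = 0.842
std_dev = 0.031
z = 1.96                     # assumed 95% confidence level
avg_minutes_per_fit = 2.0    # hypothetical training time per fit

total_fits = k_folds * repeats                          # 1. total model fits
val_fraction = 1 / k_folds                              # 2. validation fraction per fit
train_fraction = (k_folds - 1) / k_folds                # 3. training fraction per fit
total_val_exposures = total_samples * repeats           # 5. validation exposures
total_train_exposures = total_samples * (k_folds - 1) * repeats  # 6. training exposures
se = std_dev / math.sqrt(total_fits)                    # 7. standard error of the mean
ci_low = mean_score - z * se                            # 8. confidence interval bounds
ci_high = mean_score + z * se
runtime_minutes = total_fits * avg_minutes_per_fit      # 9. runtime estimate
stability = (1 - abs(std_dev / mean_score)) * 100       # 10. stability score

print(total_fits, round(se, 4), round(stability, 1))    # → 50 0.0044 96.3
```

With these inputs, 50 fits cost roughly 100 minutes, each sample is validated 10 times and trained on 40 times, and the 95% interval spans about ±0.009 around the mean.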
The calculator distributes each class across folds as evenly as possible. Extra samples are assigned one by one to the earliest folds.
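A minimal sketch of that distribution rule, using the 1200-sample, 50% / 30% / 20% example (which happens to split evenly across 5 folds) plus a hypothetical uneven count to show the remainder handling:

```python
def distribute_class(count: int, k_folds: int) -> list[int]:
    """Split one class's sample count across folds as evenly as possible,
    assigning leftover samples one by one to the earliest folds."""
    base, extra = divmod(count, k_folds)
    return [base + 1 if i < extra else base for i in range(k_folds)]

# Approximate class counts for 1200 samples at 50% / 30% / 20%
counts = [round(1200 * p / 100) for p in (50, 30, 20)]   # [600, 360, 240]
per_fold = [distribute_class(c, 5) for c in counts]
# All three divide evenly by 5: [120]*5, [72]*5, [48]*5

distribute_class(247, 5)   # hypothetical uneven count → [50, 50, 49, 49, 49]
```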
Enter the total dataset size and the number of classes first. Add class labels and class proportions as comma-separated values.
Choose the number of folds and repeats. Larger repeat counts improve stability estimates but increase total model fits and runtime.
Provide a score mean and standard deviation from previous experiments or pilot runs. These values drive the confidence interval and stability score.
Optionally enter a random seed and average training minutes per fit. The calculator uses them for planning reproducibility and compute time.
Press the calculate button. The result section appears below the header and above the form, showing fold composition, exposures, runtime, interval estimates, and charts.
Use the CSV button to export tables for documentation. Use the PDF button to save a print-friendly version of the page.
Repeated stratified k-fold cross validation measures model performance across many balanced train and validation splits. Stratification preserves class proportions, while repeats reduce dependence on any one random partition.
Stratification keeps minority classes represented in every fold. That makes validation more reliable, especially for fraud, diagnosis, churn, and other skewed classification tasks.
Use more repeats when scores vary strongly between splits or when datasets are small. Common settings are 5 to 10 repeats, but expensive models may need fewer.
Not directly. Repeated stratified k-fold is designed for classification because it preserves the label distribution. Standard repeated k-fold is usually better for regression tasks.
The stability score is a quick planning indicator derived from the relative score spread. Higher values suggest more consistent cross-validation results, though it should not replace full statistical analysis.
Fold sizes can differ slightly because class counts are whole numbers, so perfectly equal splits are not always possible. The calculator distributes leftover samples across folds as evenly as possible.
No. The confidence interval only summarizes uncertainty around the supplied mean score using repeated-split variability. External validation and careful experiment design are still essential.
If any class has fewer samples than the number of folds, repeated stratified k-fold becomes invalid, because each fold must receive at least one sample from every class. Reduce the number of folds or gather more data.
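That constraint can be checked up front with a small helper (a sketch; the class counts below are illustrative):

```python
def stratified_folds_feasible(class_counts: list[int], k_folds: int) -> bool:
    """Stratified k-fold requires every class to place at least one sample
    in each fold, i.e. each class count must be at least k_folds."""
    return all(count >= k_folds for count in class_counts)

stratified_folds_feasible([600, 360, 240], 5)   # True: every class covers 5 folds
stratified_folds_feasible([46, 4], 5)           # False: minority class of 4 < 5 folds
```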
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.