Calculator Inputs
Example Data Table
| Scenario | Total Samples | Folds | Repeats | Holdout % | Positive Rate % | Avg. Validation Size | Avg. Training Size |
|---|---|---|---|---|---|---|---|
| Binary Classifier A | 1,200 | 5 | 2 | 10 | 35 | 216 | 864 |
| Imbalanced Dataset B | 8,000 | 10 | 1 | 5 | 8 | 760 | 6,840 |
| Small Medical Study | 240 | 4 | 3 | 0 | 22 | 60 | 180 |
Formula Used
Holdout Samples = Total Samples × Holdout Percentage ÷ 100
Development Samples = Total Samples − Holdout Samples
Base Validation Size = floor(Development Samples ÷ Folds)
Remainder = Development Samples mod Folds
Fold Validation Size = Base Validation Size, plus one extra sample for each of the first Remainder folds
Fold Training Size = Development Samples − Fold Validation Size
Total Model Fits = Folds × Repeats
Expected Positive Cases in a Fold = Fold Validation Size × Positive Class Rate ÷ 100
Recommended Minimum Total = Required Development Samples ÷ (1 − Holdout Percentage ÷ 100), so that enough data remains after the holdout is reserved
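The formulas above can be sketched as a small function. This is an illustrative implementation, not the calculator's own code; the function name `plan_folds` and the dictionary layout are assumptions for the example.

```python
def plan_folds(total, folds, repeats, holdout_pct, positive_rate_pct):
    """Estimate fold sizes from the formulas above.
    Percentages (holdout, positive rate) are given as 0-100 values."""
    holdout = round(total * holdout_pct / 100)
    dev = total - holdout                      # development samples
    base, remainder = divmod(dev, folds)       # base validation size + leftover
    # The first `remainder` folds receive one extra validation sample
    val = [base + 1 if i < remainder else base for i in range(folds)]
    train = [dev - v for v in val]             # each fold trains on the rest
    return {
        "holdout": holdout,
        "development": dev,
        "validation_sizes": val,
        "training_sizes": train,
        "total_fits": folds * repeats,
        "expected_positives": [v * positive_rate_pct / 100 for v in val],
    }

# Binary Classifier A from the table: 1,200 samples, 5 folds, 2 repeats,
# 10% holdout, 35% positive rate
plan = plan_folds(1200, 5, 2, 10, 35)
```

For that scenario the function reproduces the table row: 120 holdout samples, 1,080 development samples, and five folds of 216 validation / 864 training samples.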
This calculator estimates how many samples each cross validation fold receives for training and validation. It also checks class coverage, repeated resampling burden, and a buffered planning target for safer experiment design.
How to Use This Calculator
- Enter the total number of observations available for modeling.
- Select the number of folds and the number of repeats.
- Add any external holdout percentage if you want a final untouched test set.
- Provide the estimated positive class rate for classification planning.
- Set minimum validation and minimum positive case goals per fold.
- Add a stability buffer to build extra margin into the final sample target.
- Press the calculate button to review fold sizes, exposures, recommendations, and the chart.
- Download the fold table as CSV or export the results summary as PDF.
Frequently Asked Questions
1. What does this calculator estimate?
It estimates fold-by-fold training and validation sample sizes, total model fits, class coverage, and suggested total sample targets for stronger cross validation planning.
2. Why include an external holdout set?
An external holdout set gives you a final untouched evaluation set. It improves final performance assessment, but reduces the development data available for cross validation.
3. Why does the calculator ask for positive class rate?
Class rate matters when datasets are imbalanced. Very small positive counts inside validation folds can make metric estimates unstable and model comparisons less reliable.
4. What does repeated cross validation change?
Repeats increase the number of model fits and validation assignments. They can improve stability of estimates, but require more computation and time.
5. Why are some folds one sample larger?
When development samples are not perfectly divisible by the number of folds, the remainder is distributed across early folds to keep splits as balanced as possible.
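The distribution step can be illustrated with a small hypothetical split, here 22 development samples over 4 folds:

```python
# 22 development samples into 4 folds: base size 5, remainder 2
dev_samples, folds = 22, 4
base, remainder = divmod(dev_samples, folds)
sizes = [base + 1 if i < remainder else base for i in range(folds)]
# → [6, 6, 5, 5]; the two extra samples go to the first two folds
```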
6. What is the buffered stability target?
It is a planning target that adds extra margin over the minimum estimated requirement. This helps protect against data loss, exclusions, and unstable fold composition.
7. Can I use this for regression problems?
Yes. For regression, the positive class rate field becomes less important, but fold size, repeats, holdout allocation, and validation coverage remain useful planning measures.
8. Does this replace formal power analysis?
No. This tool supports resampling design decisions. Formal power analysis, domain risk, and model complexity should still guide final sample size decisions.