Calculator Inputs
Example Data Table
| Scenario | Total Samples | Folds | Repeats | Holdout % | Positive Rate % | Avg. Validation Size | Avg. Training Size |
|---|---|---|---|---|---|---|---|
| Binary Classifier A | 1,200 | 5 | 2 | 10 | 35 | 216 | 864 |
| Imbalanced Dataset B | 8,000 | 10 | 1 | 5 | 8 | 760 | 6,840 |
| Small Medical Study | 240 | 4 | 3 | 0 | 22 | 60 | 180 |
Formula Used
Holdout Samples = Total Samples × Holdout Percentage ÷ 100
Development Samples = Total Samples − Holdout Samples
Base Validation Size = floor(Development Samples ÷ Folds)
Remainder = Development Samples mod Folds
Fold Validation Size = Base Validation Size, plus one extra sample for each of the first Remainder folds
Fold Training Size = Development Samples − Fold Validation Size
Total Model Fits = Folds × Repeats
Expected Positive Cases in a Fold = Fold Validation Size × Positive Class Rate ÷ 100
Recommended Minimum Total = Required Development Samples ÷ (1 − Holdout Percentage ÷ 100), so that enough data remains after the holdout is reserved
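The formulas above can be sketched as a small function. This is an illustrative implementation, not the calculator's own code; the function name `plan_folds` and the dictionary layout are assumptions for the example.

```python
def plan_folds(total, folds, repeats, holdout_pct, positive_rate_pct):
    """Estimate fold sizes from the formulas above.
    Percentages (holdout, positive rate) are given as 0-100 values."""
    holdout = round(total * holdout_pct / 100)
    dev = total - holdout                      # development samples
    base, remainder = divmod(dev, folds)       # base validation size + leftover
    # The first `remainder` folds receive one extra validation sample
    val = [base + 1 if i < remainder else base for i in range(folds)]
    train = [dev - v for v in val]             # each fold trains on the rest
    return {
        "holdout": holdout,
        "development": dev,
        "validation_sizes": val,
        "training_sizes": train,
        "total_fits": folds * repeats,
        "expected_positives": [v * positive_rate_pct / 100 for v in val],
    }

# Binary Classifier A from the table: 1,200 samples, 5 folds, 2 repeats,
# 10% holdout, 35% positive rate
plan = plan_folds(1200, 5, 2, 10, 35)
```

For that scenario the function reproduces the table row: 120 holdout samples, 1,080 development samples, and five folds of 216 validation / 864 training samples.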
This calculator estimates how many samples each cross validation fold receives for training and validation. It also checks class coverage, repeated resampling burden, and a buffered planning target for safer experiment design.
How to Use This Calculator
- Enter the total number of observations available for modeling.
- Select the number of folds and the number of repeats.
- Add any external holdout percentage if you want a final untouched test set.
- Provide the estimated positive class rate for classification planning.
- Set minimum validation and minimum positive case goals per fold.
- Add a stability buffer to build extra margin into the final sample target.
- Press the calculate button to review fold sizes, exposures, recommendations, and the chart.
- Download the fold table as CSV or export the results summary as PDF.
Frequently Asked Questions
1. What does this calculator estimate?
It estimates fold-by-fold training and validation sample sizes, total model fits, class coverage, and suggested total sample targets for stronger cross validation planning.
2. Why include an external holdout set?
An external holdout set gives you a final untouched evaluation set. It improves final performance assessment, but reduces the development data available for cross validation.
3. Why does the calculator ask for positive class rate?
Class rate matters when datasets are imbalanced. Very small positive counts inside validation folds can make metric estimates unstable and model comparisons less reliable.
4. What does repeated cross validation change?
Repeats increase the number of model fits and validation assignments. They can improve stability of estimates, but require more computation and time.
5. Why are some folds one sample larger?
When development samples are not perfectly divisible by the number of folds, the remainder is distributed across early folds to keep splits as balanced as possible.
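The distribution step can be illustrated with a small hypothetical split, here 22 development samples over 4 folds:

```python
# 22 development samples into 4 folds: base size 5, remainder 2
dev_samples, folds = 22, 4
base, remainder = divmod(dev_samples, folds)
sizes = [base + 1 if i < remainder else base for i in range(folds)]
# → [6, 6, 5, 5]; the two extra samples go to the first two folds
```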
6. What is the buffered stability target?
It is a planning target that adds extra margin over the minimum estimated requirement. This helps protect against data loss, exclusions, and unstable fold composition.
7. Can I use this for regression problems?
Yes. For regression, the positive class rate field becomes less important, but fold size, repeats, holdout allocation, and validation coverage remain useful planning measures.
8. Does this replace formal power analysis?
No. This tool supports resampling design decisions. Formal power analysis, domain risk, and model complexity should still guide final sample size decisions.