Calculator Inputs
Use the form below to estimate bootstrap resampling behavior, expected out-of-bag records, and baseline holdout split counts.
Example Data Table
| Scenario | Dataset Size | Sample Ratio | Runs | Train / Val / Test | Positive Rate |
|---|---|---|---|---|---|
| Binary fraud model | 12,000 | 100% | 250 | 70 / 15 / 15 | 6% |
| Churn classifier | 8,500 | 100% | 200 | 75 / 10 / 15 | 18% |
| Medical screening | 2,400 | 120% | 500 | 60 / 20 / 20 | 22% |
| Ad click prediction | 50,000 | 80% | 100 | 80 / 10 / 10 | 3.5% |
| Customer intent model | 15,000 | 100% | 300 | 70 / 15 / 15 | 14% |
Formula Used
1) Bootstrap sample size per run
m = N × (sample ratio ÷ 100)
2) Probability a record is never selected in one run
P(not selected) = ((N - 1) / N)^m
3) Expected unique records in one bootstrap resample
Expected unique = N × (1 - ((N - 1) / N)^m)
4) Expected out-of-bag records
Expected OOB = N - Expected unique
5) Expected duplicate draws
Expected duplicates = m - Expected unique
6) Multi-run expected coverage
Coverage after B runs = N × (1 - ((N - 1) / N)^(m × B))
7) Expected positive class records
Positive records = record count × positive class rate (apply the rate to whichever count you are inspecting: the resample, the expected unique records, or the out-of-bag portion)
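The seven formulas above can be sketched as one Python function. The function name is illustrative, and reporting expected positives against the unique in-bag count (rather than the resample or out-of-bag count) is a choice made here for the example:

```python
def bootstrap_stats(n, sample_ratio=100.0, runs=1, positive_rate=0.0):
    """Expected-value bootstrap diagnostics; names are illustrative."""
    m = n * sample_ratio / 100.0                      # 1) resample size per run
    p_not = ((n - 1) / n) ** m                        # 2) P(record never drawn)
    unique = n * (1 - p_not)                          # 3) expected distinct records
    oob = n - unique                                  # 4) expected out-of-bag
    dups = m - unique                                 # 5) expected duplicate draws
    coverage = n * (1 - ((n - 1) / n) ** (m * runs))  # 6) coverage over all runs
    positives = unique * positive_rate                # 7) applied to unique here
    return {"m": m, "unique": unique, "oob": oob,
            "duplicates": dups, "coverage": coverage, "positives": positives}

# Medical screening row: N = 2,400, ratio 120%, 500 runs, 22% positive rate
stats = bootstrap_stats(2400, sample_ratio=120, runs=500, positive_rate=0.22)
print({k: round(v, 1) for k, v in stats.items()})
```

With a 120% ratio each run draws 2,880 records, yet only about 1,677 distinct records are expected, matching formulas 3 through 5.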
How to Use This Calculator
- Enter the total number of observations in your dataset.
- Set the bootstrap sample ratio. Use 100% for classic bootstrap.
- Choose the number of bootstrap runs you plan to execute.
- Enter train, validation, and test percentages that total 100.
- Add the positive class rate to inspect minority-class exposure.
- Select decimal precision for cleaner reporting output.
- Press Calculate Bootstrap Split to display the results above the form.
- Use the CSV and PDF buttons to export the calculated report.
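The train/validation/test step above reduces to simple percentage arithmetic. A minimal sketch, assuming the calculator assigns any rounding remainder to the test split (the function name is hypothetical):

```python
def holdout_counts(n, train_pct, val_pct, test_pct):
    # Percentages must total 100, matching the form's input rule.
    assert train_pct + val_pct + test_pct == 100
    train = round(n * train_pct / 100)
    val = round(n * val_pct / 100)
    test = n - train - val          # remainder keeps the total exact
    return train, val, test

# Binary fraud model row: 12,000 records split 70 / 15 / 15
print(holdout_counts(12000, 70, 15, 15))  # → (8400, 1800, 1800)
```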
Frequently Asked Questions
1) What does this calculator estimate?
It estimates resample size, expected unique observations, duplicate draws, out-of-bag records, class counts, split counts, and multi-run coverage for validation planning.
2) Why is the number of unique records smaller than the bootstrap sample size?
Bootstrap resampling draws with replacement. Some rows appear multiple times, so the expected number of distinct observations stays below the total number of draws.
3) Why is the out-of-bag rate often near 36.8%?
When the resample size equals the dataset size, each record has a probability of about 1/e ≈ 0.368 of never being selected in one run, leaving roughly 36.8% of records out of bag.
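The 1/e limit is easy to verify numerically for a large dataset with the classic setting m = N:

```python
import math

n = 100_000                      # large N, classic bootstrap with m = N
p_not = ((n - 1) / n) ** n       # P(a record is never selected in one run)
print(round(p_not, 4), round(1 / math.e, 4))  # both ≈ 0.3679
```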
4) When should I prefer bootstrap validation?
Bootstrap validation is useful when data is limited, variance matters, or you want repeated resampling to study model stability instead of relying on one fixed split.
5) Can this help with imbalanced classification?
Yes. Enter the positive class rate to estimate expected minority-class counts in the resample and out-of-bag portion before training starts.
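A minimal sketch of that minority-class check, using the churn classifier row from the table (18% positive rate) and applying the rate to both the expected unique in-bag records and the out-of-bag portion:

```python
n, ratio, pos_rate = 8500, 100, 0.18   # churn-classifier row
m = n * ratio / 100
unique = n * (1 - ((n - 1) / n) ** m)  # expected distinct in-bag records
oob = n - unique                       # expected out-of-bag records
print(round(unique * pos_rate), round(oob * pos_rate))
```

If either count comes out too small to train or evaluate on, that is a signal to rethink the split before running anything.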
6) Can the bootstrap sample ratio exceed 100%?
Yes. Larger m-out-of-n settings can draw more records than the original dataset size, which raises duplicates and usually lowers the out-of-bag portion.
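The effect of the sample ratio on the out-of-bag share can be sketched directly from formula 2 (the function name is illustrative):

```python
def oob_fraction(n, ratio):
    m = n * ratio / 100
    return ((n - 1) / n) ** m    # expected out-of-bag share

n = 10_000
for ratio in (80, 100, 120):
    # Shares fall as the ratio rises: ~0.45, ~0.37, ~0.30
    print(ratio, round(oob_fraction(n, ratio), 3))
```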
7) Are these outputs exact or simulated?
These outputs are expected values from bootstrap probability formulas. A real random draw may differ slightly, especially with smaller datasets.
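The gap between the expected value and a real random draw can be checked with a small simulation; the dataset size, run count, and seed below are arbitrary choices for the example:

```python
import random

n, runs = 2000, 200
expected_oob = n * ((n - 1) / n) ** n             # formula value for one run
rng = random.Random(0)                            # fixed seed for repeatability
simulated = []
for _ in range(runs):
    drawn = {rng.randrange(n) for _ in range(n)}  # one bootstrap resample
    simulated.append(n - len(drawn))              # actual out-of-bag count
print(round(expected_oob, 1), sum(simulated) / runs)
```

Individual runs scatter around the expectation, and the scatter is wider for smaller datasets, which is why the calculator reports expected values rather than a single draw.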
8) Why show train, validation, and test counts too?
Those counts let you compare ordinary holdout planning with bootstrap diagnostics in one place, which helps design a more defensible evaluation workflow.