Calculator Inputs
Use the form below to estimate bootstrap resampling behavior, expected out-of-bag records, and baseline holdout split counts.
Example Data Table
| Scenario | Dataset Size | Sample Ratio | Runs | Train / Val / Test | Positive Rate |
|---|---|---|---|---|---|
| Binary fraud model | 12,000 | 100% | 250 | 70 / 15 / 15 | 6% |
| Churn classifier | 8,500 | 100% | 200 | 75 / 10 / 15 | 18% |
| Medical screening | 2,400 | 120% | 500 | 60 / 20 / 20 | 22% |
| Ad click prediction | 50,000 | 80% | 100 | 80 / 10 / 10 | 3.5% |
| Customer intent model | 15,000 | 100% | 300 | 70 / 15 / 15 | 14% |
Formula Used
1) Bootstrap sample size per run
m = N × (sample ratio ÷ 100)
2) Probability a record is never selected in one run
P(not selected) = ((N - 1) / N)^m
3) Expected unique records in one bootstrap resample
Expected unique = N × (1 - ((N - 1) / N)^m)
4) Expected out-of-bag records
Expected OOB = N - Expected unique
5) Expected duplicate draws
Expected duplicates = m - Expected unique
6) Multi-run expected coverage
Coverage after B runs = N × (1 - ((N - 1) / N)^(m × B))
7) Expected positive class records
Positive records = record count × positive class rate (apply the rate to whichever count you are inspecting: the resample, the expected unique records, or the out-of-bag portion)
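The seven formulas above can be sketched as one Python function. The function name is illustrative, and reporting expected positives against the unique in-bag count (rather than the resample or out-of-bag count) is a choice made here for the example:

```python
def bootstrap_stats(n, sample_ratio=100.0, runs=1, positive_rate=0.0):
    """Expected-value bootstrap diagnostics; names are illustrative."""
    m = n * sample_ratio / 100.0                      # 1) resample size per run
    p_not = ((n - 1) / n) ** m                        # 2) P(record never drawn)
    unique = n * (1 - p_not)                          # 3) expected distinct records
    oob = n - unique                                  # 4) expected out-of-bag
    dups = m - unique                                 # 5) expected duplicate draws
    coverage = n * (1 - ((n - 1) / n) ** (m * runs))  # 6) coverage over all runs
    positives = unique * positive_rate                # 7) applied to unique here
    return {"m": m, "unique": unique, "oob": oob,
            "duplicates": dups, "coverage": coverage, "positives": positives}

# Medical screening row: N = 2,400, ratio 120%, 500 runs, 22% positive rate
stats = bootstrap_stats(2400, sample_ratio=120, runs=500, positive_rate=0.22)
print({k: round(v, 1) for k, v in stats.items()})
```

With a 120% ratio each run draws 2,880 records, yet only about 1,677 distinct records are expected, matching formulas 3 through 5.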
How to Use This Calculator
- Enter the total number of observations in your dataset.
- Set the bootstrap sample ratio. Use 100% for classic bootstrap.
- Choose the number of bootstrap runs you plan to execute.
- Enter train, validation, and test percentages that total 100.
- Add the positive class rate to inspect minority-class exposure.
- Select decimal precision for cleaner reporting output.
- Press Calculate Bootstrap Split to display the results above the form.
- Use the CSV and PDF buttons to export the calculated report.
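The train/validation/test step above reduces to simple percentage arithmetic. A minimal sketch, assuming the calculator assigns any rounding remainder to the test split (the function name is hypothetical):

```python
def holdout_counts(n, train_pct, val_pct, test_pct):
    # Percentages must total 100, matching the form's input rule.
    assert train_pct + val_pct + test_pct == 100
    train = round(n * train_pct / 100)
    val = round(n * val_pct / 100)
    test = n - train - val          # remainder keeps the total exact
    return train, val, test

# Binary fraud model row: 12,000 records split 70 / 15 / 15
print(holdout_counts(12000, 70, 15, 15))  # → (8400, 1800, 1800)
```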
Frequently Asked Questions
1) What does this calculator estimate?
It estimates resample size, expected unique observations, duplicate draws, out-of-bag records, class counts, split counts, and multi-run coverage for validation planning.
2) Why is the number of unique records smaller than the bootstrap sample size?
Bootstrap resampling draws with replacement. Some rows appear multiple times, so the expected number of distinct observations stays below the total number of draws.
3) Why is the out-of-bag rate often near 36.8%?
When the resample size equals the dataset size, each record has a probability of about 1/e ≈ 0.368 of never being selected in one run, leaving roughly 36.8% of records out of bag.
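The 1/e limit is easy to verify numerically for a large dataset with the classic setting m = N:

```python
import math

n = 100_000                      # large N, classic bootstrap with m = N
p_not = ((n - 1) / n) ** n       # P(a record is never selected in one run)
print(round(p_not, 4), round(1 / math.e, 4))  # both ≈ 0.3679
```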
4) When should I prefer bootstrap validation?
Bootstrap validation is useful when data is limited, variance matters, or you want repeated resampling to study model stability instead of relying on one fixed split.
5) Can this help with imbalanced classification?
Yes. Enter the positive class rate to estimate expected minority-class counts in the resample and out-of-bag portion before training starts.
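A minimal sketch of that minority-class check, using the churn classifier row from the table (18% positive rate) and applying the rate to both the expected unique in-bag records and the out-of-bag portion:

```python
n, ratio, pos_rate = 8500, 100, 0.18   # churn-classifier row
m = n * ratio / 100
unique = n * (1 - ((n - 1) / n) ** m)  # expected distinct in-bag records
oob = n - unique                       # expected out-of-bag records
print(round(unique * pos_rate), round(oob * pos_rate))
```

If either count comes out too small to train or evaluate on, that is a signal to rethink the split before running anything.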
6) Can the bootstrap sample ratio exceed 100%?
Yes. Larger m-out-of-n settings can draw more records than the original dataset size, which raises duplicates and usually lowers the out-of-bag portion.
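The effect of the sample ratio on the out-of-bag share can be sketched directly from formula 2 (the function name is illustrative):

```python
def oob_fraction(n, ratio):
    m = n * ratio / 100
    return ((n - 1) / n) ** m    # expected out-of-bag share

n = 10_000
for ratio in (80, 100, 120):
    # Shares fall as the ratio rises: ~0.45, ~0.37, ~0.30
    print(ratio, round(oob_fraction(n, ratio), 3))
```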
7) Are these outputs exact or simulated?
These outputs are expected values from bootstrap probability formulas. A real random draw may differ slightly, especially with smaller datasets.
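The gap between the expected value and a real random draw can be checked with a small simulation; the dataset size, run count, and seed below are arbitrary choices for the example:

```python
import random

n, runs = 2000, 200
expected_oob = n * ((n - 1) / n) ** n             # formula value for one run
rng = random.Random(0)                            # fixed seed for repeatability
simulated = []
for _ in range(runs):
    drawn = {rng.randrange(n) for _ in range(n)}  # one bootstrap resample
    simulated.append(n - len(drawn))              # actual out-of-bag count
print(round(expected_oob, 1), sum(simulated) / runs)
```

Individual runs scatter around the expectation, and the scatter is wider for smaller datasets, which is why the calculator reports expected values rather than a single draw.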
8) Why show train, validation, and test counts too?
Those counts let you compare ordinary holdout planning with bootstrap diagnostics in one place, which helps design a more defensible evaluation workflow.