Bootstrap Split Calculator

Analyze resampling behavior before training costly models. Compare expected unique coverage, duplicate draws, and holdout split counts, and make validation decisions with clearer dataset-level expectations.

Calculator Inputs

Use the form below to estimate bootstrap resampling behavior, expected out-of-bag records, and baseline holdout split counts.

Tip: Classic bootstrap often uses a 100% sample ratio. That usually leaves about 36.8% of records out of bag per run.
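The 36.8% figure in the tip can be checked numerically; a minimal sketch, where the dataset size N = 12,000 is an arbitrary example:

```python
import math

N = 12_000  # hypothetical dataset size
p_never = (1 - 1 / N) ** N          # chance a record is skipped in all N draws
print(round(p_never, 4), round(math.exp(-1), 4))  # both ≈ 0.3679
```

For any reasonably large N, (1 - 1/N)^N converges to e^-1 ≈ 0.368, which is where the out-of-bag rate comes from.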

Example Data Table

Scenario              | Dataset Size | Sample Ratio | Runs | Train / Val / Test | Positive Rate
Binary fraud model    | 12,000       | 100%         | 250  | 70 / 15 / 15       | 6%
Churn classifier      | 8,500        | 100%         | 200  | 75 / 10 / 15       | 18%
Medical screening     | 2,400        | 120%         | 500  | 60 / 20 / 20       | 22%
Ad click prediction   | 50,000       | 80%          | 100  | 80 / 10 / 10       | 3.5%
Customer intent model | 15,000       | 100%         | 300  | 70 / 15 / 15       | 14%

Formula Used

1) Bootstrap sample size per run

m = N × (sample ratio ÷ 100)

2) Probability a record is never selected in one run

P(not selected) = ((N - 1) / N)^m

3) Expected unique records in one bootstrap resample

Expected unique = N × (1 - ((N - 1) / N)^m)

4) Expected out-of-bag records

Expected OOB = N - Expected unique

5) Expected duplicate draws

Expected duplicates = m - Expected unique

6) Multi-run expected coverage

Coverage after B runs = N × (1 - ((N - 1) / N)^(m × B))

7) Expected positive class records

Positive records = count × positive class rate
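The seven formulas above can be combined into a single helper; a sketch, where the function name and the example values (taken from the fraud scenario in the table) are illustrative:

```python
def bootstrap_split_stats(n, sample_ratio=100.0, runs=1, positive_rate=0.0):
    """Expected-value bootstrap diagnostics, mirroring formulas 1-7 above."""
    m = n * sample_ratio / 100.0                       # 1) draws per run
    p_not = ((n - 1) / n) ** m                         # 2) P(record never selected)
    unique = n * (1 - p_not)                           # 3) expected distinct records
    oob = n - unique                                   # 4) expected out-of-bag
    duplicates = m - unique                            # 5) expected duplicate draws
    coverage = n * (1 - ((n - 1) / n) ** (m * runs))   # 6) coverage after all runs
    return {
        "sample_size": m,
        "expected_unique": unique,
        "expected_oob": oob,
        "expected_duplicates": duplicates,
        "multi_run_coverage": coverage,
        "expected_positives": unique * positive_rate,  # 7) positives among unique records
    }

# The binary fraud scenario from the example table
stats = bootstrap_split_stats(12_000, sample_ratio=100, runs=250, positive_rate=0.06)
print(f"unique ≈ {stats['expected_unique']:.0f}, OOB ≈ {stats['expected_oob']:.0f}")
```

With a 100% sample ratio, expected duplicates equal expected out-of-bag records, since m = N makes formulas 4 and 5 coincide.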

How to Use This Calculator

  1. Enter the total number of observations in your dataset.
  2. Set the bootstrap sample ratio. Use 100% for classic bootstrap.
  3. Choose the number of bootstrap runs you plan to execute.
  4. Enter train, validation, and test percentages that total 100.
  5. Add the positive class rate to inspect minority-class exposure.
  6. Select decimal precision for cleaner reporting output.
  7. Press Calculate Bootstrap Split to display the results above the form.
  8. Use the CSV and PDF buttons to export the calculated report.

Frequently Asked Questions

1) What does this calculator estimate?

It estimates resample size, expected unique observations, duplicate draws, out-of-bag records, class counts, split counts, and multi-run coverage for validation planning.

2) Why are unique records smaller than the bootstrap sample size?

Bootstrap resampling draws with replacement. Some rows appear multiple times, so the expected number of distinct observations stays below the total number of draws.

3) Why is the out-of-bag rate often near 36.8%?

When the resample size equals the dataset size, each record has a probability of about e^-1 ≈ 0.368 of never being selected in one run, leaving roughly 36.8% out of bag.

4) When should I prefer bootstrap validation?

Bootstrap validation is useful when data is limited, variance matters, or you want repeated resampling to study model stability instead of relying on one fixed split.

5) Can this help with imbalanced classification?

Yes. Enter the positive class rate to estimate expected minority-class counts in the resample and out-of-bag portion before training starts.

6) Can the bootstrap sample ratio exceed 100%?

Yes. Larger m-out-of-n settings can draw more records than the original dataset size, which raises duplicates and usually lowers the out-of-bag portion.
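A quick sketch of how the sample ratio shifts the expected out-of-bag fraction; the dataset size N = 10,000 and the ratio values are arbitrary examples:

```python
# How the m-out-of-n sample ratio shifts the expected out-of-bag fraction
N = 10_000  # arbitrary example dataset size
for ratio in (80, 100, 120):
    m = N * ratio / 100
    oob_fraction = ((N - 1) / N) ** m
    print(f"{ratio:>3}% ratio -> expected OOB ≈ {oob_fraction:.1%}")
```

Raising the ratio from 80% to 120% drops the expected out-of-bag share from roughly 44.9% to about 30.1%.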

7) Are these outputs exact or simulated?

These outputs are expected values from bootstrap probability formulas. A real random draw may differ slightly, especially with smaller datasets.
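One way to see the gap between expected values and a real random draw is a small Monte Carlo check; a sketch, where the dataset size, trial count, and seed are arbitrary:

```python
import random

random.seed(0)
N, trials = 1_000, 200
oob_sizes = []
for _ in range(trials):
    drawn = {random.randrange(N) for _ in range(N)}  # one bootstrap resample of size N
    oob_sizes.append(N - len(drawn))                 # records never drawn this run

simulated = sum(oob_sizes) / trials
expected = N * ((N - 1) / N) ** N
print(f"simulated OOB ≈ {simulated:.1f}, expected OOB ≈ {expected:.1f}")
```

Individual runs scatter around the expected value, and the scatter is more noticeable at small N, which is the point made above.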

8) Why show train, validation, and test counts too?

Those counts let you compare ordinary holdout planning with bootstrap diagnostics in one place, which helps design a more defensible evaluation workflow.

Related Calculators

stratified split, nested cross validation, train set size, cross validation split, repeated k fold, k fold split, train validation split, blocked cross validation, test set size

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.