Stratified Split Calculator

Plan class-aware train, validation, and test splits with flexible ratios and dataset totals. Review each per-class allocation before training begins, then download the results to document every dataset split.

Split Results

After calculation, results appear above the form. The summary reports total samples, classes used, max split drift, and a quality check. Each of the train, validation, and test splits shows its sample count and percentage, followed by the applied settings and any warnings and notes.

The per-class table has these columns: Class, Original Count, Original %, Train, Validation, Test, Train Drift, Validation Drift, Test Drift.

Calculator Inputs

Class Distribution

Add each class and its original count. Empty labels auto-generate names.

Example Data Table

This sample allocates 1,000 samples with a 70/15/15 train, validation, and test ratio. Because every ideal count is a whole number, each class keeps exactly its original share in every split.

| Class | Original Count | Original % | Train | Validation | Test |
|---|---|---|---|---|---|
| Class A | 500 | 50.00% | 350 | 75 | 75 |
| Class B | 300 | 30.00% | 210 | 45 | 45 |
| Class C | 200 | 20.00% | 140 | 30 | 30 |

Formula Used

Stratified splitting keeps each class close to its original share inside every output split.

Original Class Share = Class Count / Total Dataset Count

Normalized Split Ratio = Split Ratio / (Train + Validation + Test)

Ideal Split Count for a Class = Class Count × Normalized Split Ratio

Actual Split Count = Rounded Ideal Counts, adjusted so class totals remain exact

Split Drift = Actual Split Share − Original Class Share

The calculator reports the largest absolute drift among train, validation, and test so you can judge how much rounding changed class balance.
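The first three formulas can be sketched in a few lines of Python. This is only an illustration, not the calculator's own code: the names `class_counts`, `split_ratios`, and `ideal_counts` are hypothetical, and the 70/15/15 ratio is assumed to match the example table.

```python
# Hypothetical inputs: counts from the example table, assumed 70/15/15 ratio.
class_counts = {"Class A": 500, "Class B": 300, "Class C": 200}
split_ratios = {"train": 70, "validation": 15, "test": 15}

total = sum(class_counts.values())
ratio_sum = sum(split_ratios.values())


def ideal_counts(count):
    """Ideal (possibly fractional) count of one class in each split."""
    return {name: count * ratio / ratio_sum for name, ratio in split_ratios.items()}


# Original Class Share = Class Count / Total Dataset Count
shares = {label: count / total for label, count in class_counts.items()}
# Ideal Split Count = Class Count x Normalized Split Ratio
ideal = {label: ideal_counts(count) for label, count in class_counts.items()}

for label in class_counts:
    print(label, f"{shares[label]:.2%}", ideal[label])
```

With these inputs every ideal count is already a whole number, so no rounding adjustment is needed.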

How to Use This Calculator

  1. Enter a dataset name, then set train, validation, and test percentages.
  2. Choose a rounding method that fits your workflow.
  3. Set the minority alert threshold to flag very small classes.
  4. Add each class and enter its sample count.
  5. Click the calculate button to generate split totals and drift values.
  6. Review warnings, especially when tiny classes cannot support every split.
  7. Download the results as CSV or PDF for project records.
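If you also want to keep split records inside your own project, the per-class table can be written to CSV with the standard library. The sketch below is an assumption about a convenient layout, not the calculator's export format, and `results_to_csv` is a hypothetical helper name.

```python
import csv
import io


def results_to_csv(allocation):
    """Render {class: (train, validation, test)} as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Class", "Original Count", "Train", "Validation", "Test"])
    for label, (train, validation, test) in allocation.items():
        # The original count is recovered as the sum of the three splits.
        writer.writerow([label, train + validation + test, train, validation, test])
    return buffer.getvalue()


csv_text = results_to_csv({
    "Class A": (350, 75, 75),
    "Class B": (210, 45, 45),
    "Class C": (140, 30, 30),
})
print(csv_text)
```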

Frequently Asked Questions

1. What does stratified splitting do?

It divides a dataset into train, validation, and test sets while keeping each class close to its original percentage. This is helpful when rare classes would otherwise vanish from smaller splits.

2. Why do the split counts sometimes differ from exact decimals?

Sample counts must be whole numbers. The calculator first computes ideal decimal values, then rounds and rebalances them so each class still sums back to its original total.

3. When should I use largest remainder rounding?

Use it when you want a fair, proportion-focused allocation. It floors each ideal count and then assigns leftover samples to the largest fractional remainders.
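A minimal sketch of largest remainder rounding for a single class, assuming ties go to the earlier split (the calculator's own tie-breaking rule is not documented here). The function name `largest_remainder` is hypothetical.

```python
import math


def largest_remainder(class_count, ratios):
    """Split class_count into whole numbers proportional to ratios.

    Floors each ideal count, then gives the leftover samples one at a time
    to the splits with the largest fractional remainders.
    """
    ratio_sum = sum(ratios)
    ideal = [class_count * r / ratio_sum for r in ratios]
    counts = [math.floor(x) for x in ideal]
    leftover = class_count - sum(counts)
    # Rank split positions by fractional remainder, largest first.
    # The sort is stable, so ties keep the earlier split (an assumption).
    order = sorted(range(len(ratios)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


print(largest_remainder(7, [70, 15, 15]))  # [5, 1, 1]
```

For a class of 7 samples the ideal counts are 4.9, 1.05, and 1.05. Flooring gives 4, 1, and 1, and the single leftover sample goes to train because 0.9 is the largest remainder.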

4. Why might a warning appear for tiny classes?

A class with very few samples may be too small to place at least one item in every active split. The warning helps you spot when manual adjustments may be smarter.
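One way such a check could work is sketched below. It assumes a split is "active" when its ratio is greater than zero and that a class needs at least one sample per active split. The calculator's exact rule may differ, and `tiny_class_warnings` is a hypothetical name.

```python
def tiny_class_warnings(class_counts, split_ratios):
    """Return a warning for each class too small to reach every active split."""
    # Assumption: a split counts as active when its ratio is positive.
    active = [name for name, ratio in split_ratios.items() if ratio > 0]
    warnings = []
    for label, count in class_counts.items():
        if count < len(active):
            warnings.append(
                f"{label}: {count} sample(s) cannot cover {len(active)} active splits"
            )
    return warnings


print(tiny_class_warnings(
    {"Common": 500, "Rare": 2},
    {"train": 70, "validation": 15, "test": 15},
))
```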

5. Does the calculator generate actual row indexes?

No. It computes class counts for each split. Use those counts as planning targets before applying a real split function inside your machine learning pipeline.
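To turn planned counts into actual row indexes, a pipeline needs its own split step. The standard-library sketch below shuffles the rows of each class and slices them according to the target counts. The function `assign_rows` and its `targets` layout are hypothetical, and in practice a library's stratified splitter may be the better choice.

```python
import random
from collections import defaultdict


def assign_rows(labels, targets, seed=0):
    """Return {split: sorted row indexes} that honor per-class target counts.

    labels  -- one class label per row
    targets -- {class: {split: count}}, for example copied from the results table
    """
    rows_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        rows_by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = defaultdict(list)
    for label, rows in rows_by_class.items():
        if sum(targets[label].values()) != len(rows):
            raise ValueError(f"targets for {label!r} do not match its row count")
        rng.shuffle(rows)
        start = 0
        for split, count in targets[label].items():
            splits[split].extend(rows[start:start + count])
            start += count
    return {split: sorted(rows) for split, rows in splits.items()}


labels = ["A"] * 10 + ["B"] * 5
targets = {
    "A": {"train": 7, "validation": 2, "test": 1},
    "B": {"train": 3, "validation": 1, "test": 1},
}
result = assign_rows(labels, targets, seed=1)
print({split: len(rows) for split, rows in result.items()})
```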

6. What is split drift?

Split drift measures how far a class percentage inside a split moves from the original dataset percentage. Lower drift means the split better preserves the class distribution.
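As a rough illustration, drift can be computed from a table of per-class split counts. The sketch assumes drift is expressed in percentage points and that a class's share inside a split is its count divided by that split's total. `split_drift` is a hypothetical name.

```python
def split_drift(allocation):
    """Map {class: {split: count}} to {class: {split: drift in percentage points}}."""
    class_totals = {label: sum(parts.values()) for label, parts in allocation.items()}
    dataset_total = sum(class_totals.values())
    split_totals = {}
    for parts in allocation.values():
        for split, count in parts.items():
            split_totals[split] = split_totals.get(split, 0) + count

    drift = {}
    for label, parts in allocation.items():
        original_share = class_totals[label] / dataset_total
        drift[label] = {
            # An empty split has no shares, so its drift is reported as 0 here.
            split: (count / split_totals[split] - original_share) * 100
            if split_totals[split] else 0.0
            for split, count in parts.items()
        }
    return drift


small = {
    "A": {"train": 5, "validation": 1, "test": 1},
    "B": {"train": 2, "validation": 0, "test": 1},
}
drift = split_drift(small)
max_drift = max(abs(d) for parts in drift.values() for d in parts.values())
print(f"Max split drift: {max_drift:.2f}%")
```

In this 10-sample dataset the validation split holds a single sample from class A, so class A's share there is 100% against an original 70%, a drift of 30 percentage points.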

7. Should my ratios always sum to 100?

That is best for readability, but the calculator can normalize other positive totals. For example, ratios of 8, 1, and 1 become the same as 80%, 10%, and 10%.
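The normalization step can be sketched as below. The function name `normalize_ratios` and its validation rule (non-negative ratios with a positive total) are assumptions for illustration.

```python
def normalize_ratios(train, validation, test):
    """Scale three split ratios so they sum to 100."""
    total = train + validation + test
    if min(train, validation, test) < 0 or total <= 0:
        raise ValueError("ratios must be non-negative with a positive total")
    return tuple(100 * r / total for r in (train, validation, test))


print(normalize_ratios(8, 1, 1))  # (80.0, 10.0, 10.0)
```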

8. Can I use this for imbalanced datasets?

Yes. It is especially useful there. The minority threshold setting helps highlight rare classes that may need oversampling, class weighting, or a different evaluation strategy.


Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.