Example Data Table
This sample uses a 70/15/15 split and shows how class proportions remain close to the original dataset after allocation.
| Class | Original Count | Original % | Train | Validation | Test |
|---|---|---|---|---|---|
| Class A | 500 | 50.00% | 350 | 75 | 75 |
| Class B | 300 | 30.00% | 210 | 45 | 45 |
| Class C | 200 | 20.00% | 140 | 30 | 30 |
Formula Used
Stratified splitting keeps each class close to its original share inside every output split.
Original Class Share = Class Count / Total Dataset Count
Normalized Split Ratio = Split Ratio / (Train + Validation + Test)
Ideal Split Count for a Class = Class Count × Normalized Split Ratio
Actual Split Count = Rounded Ideal Counts, adjusted so class totals remain exact
Split Drift = Actual Split Share − Original Class Share

The calculator reports the largest absolute drift among train, validation, and test so you can judge how much rounding changed class balance.
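The steps above can be sketched in Python. This is a minimal illustration assuming largest remainder rounding; the function name and signature are ours, not the calculator's actual code:

```python
from math import floor

def stratified_counts(class_counts, ratios):
    """Allocate each class across splits with largest-remainder rounding.

    class_counts: mapping of class name -> sample count
    ratios: per-split ratios, e.g. (70, 15, 15); normalized internally
    Returns a mapping of class name -> tuple of per-split counts.
    """
    total_ratio = sum(ratios)
    result = {}
    for cls, count in class_counts.items():
        ideal = [count * r / total_ratio for r in ratios]  # ideal decimal counts
        actual = [floor(x) for x in ideal]                 # floor each split
        leftover = count - sum(actual)                     # samples still unassigned
        # hand leftovers to the splits with the largest fractional remainders
        by_remainder = sorted(range(len(ratios)),
                              key=lambda i: ideal[i] - actual[i],
                              reverse=True)
        for i in by_remainder[:leftover]:
            actual[i] += 1
        result[cls] = tuple(actual)
    return result
```

Running this on the example table's classes with a 70/15/15 ratio reproduces the counts shown there, and each class total is preserved exactly because leftovers are reassigned rather than dropped.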
How to Use This Calculator
- Enter a dataset name, then set train, validation, and test percentages.
- Choose a rounding method that fits your workflow.
- Set the minority alert threshold to flag very small classes.
- Add each class and enter its sample count.
- Click the calculate button to generate split totals and drift values.
- Review warnings, especially when tiny classes cannot support every split.
- Download the results as CSV or PDF for project records.
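If you prefer to keep records in your own scripts, the result table is easy to reproduce with Python's standard `csv` module. The column names below mirror the example table; they are an illustration, not the calculator's actual export schema:

```python
import csv
import io

# Per-class split counts, shaped like the example table (illustrative data).
rows = [
    {"Class": "Class A", "Train": 350, "Validation": 75, "Test": 75},
    {"Class": "Class B", "Train": 210, "Validation": 45, "Test": 45},
    {"Class": "Class C", "Train": 140, "Validation": 30, "Test": 30},
]

# Write to an in-memory buffer; swap in open("split_results.csv", "w", newline="")
# to produce a real file for project records.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Class", "Train", "Validation", "Test"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```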
Frequently Asked Questions
1. What does stratified splitting do?
It divides a dataset into train, validation, and test sets while keeping each class close to its original percentage. This is helpful when rare classes would otherwise vanish from smaller splits.
2. Why do the split counts sometimes differ from exact decimals?
Sample counts must be whole numbers. The calculator first computes ideal decimal values, then rounds and rebalances them so each class still sums back to its original total.
3. When should I use largest remainder rounding?
Use it when you want a fair, proportion-focused allocation. It floors each ideal count and then assigns leftover samples to the largest fractional remainders.
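A compact sketch of largest remainder rounding for one class, assuming the ideal decimal counts sum to a whole number (the helper name is ours):

```python
from math import floor

def largest_remainder(ideal):
    """Round ideal decimal counts so their whole-number total is preserved.

    Floors every value, then hands the leftover units to the entries
    with the largest fractional remainders.
    """
    floors = [floor(x) for x in ideal]
    leftover = round(sum(ideal)) - sum(floors)
    order = sorted(range(len(ideal)),
                   key=lambda i: ideal[i] - floors[i],
                   reverse=True)
    for i in order[:leftover]:
        floors[i] += 1
    return floors
```

For example, ideal counts of 6.6, 2.2, and 1.2 floor to 6, 2, and 1, and the single leftover sample goes to the first entry because 0.6 is the largest remainder.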
4. Why might a warning appear for tiny classes?
A class with very few samples may be too small to place at least one item in every active split. The warning helps you spot cases where manual adjustment may be the better choice.
5. Does the calculator generate actual row indexes?
No. It computes class counts for each split. Use those counts as planning targets before applying a real split function inside your machine learning pipeline.
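Once you have the per-class counts, turning them into actual row indexes is straightforward. A minimal sketch under our own assumptions (the helper name, argument shapes, and fixed seed are illustrative, not part of the calculator):

```python
import random

def stratified_indexes(labels, counts_per_class, seed=0):
    """Turn per-class split counts into actual row index lists.

    labels: list of class labels, one per row
    counts_per_class: mapping class -> (train, validation, test) counts
    Returns three lists of row indexes.
    """
    rng = random.Random(seed)
    # Group row indexes by class label.
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    train, val, test = [], [], []
    for lab, rows in by_class.items():
        rng.shuffle(rows)  # shuffle within the class before slicing
        n_tr, n_va, n_te = counts_per_class[lab]
        train += rows[:n_tr]
        val += rows[n_tr:n_tr + n_va]
        test += rows[n_tr + n_va:n_tr + n_va + n_te]
    return train, val, test
```

In practice you would more likely reach for your framework's own stratified split utility; this sketch just shows how the planning counts map onto concrete rows.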
6. What is split drift?
Split drift measures how far a class percentage inside a split moves from the original dataset percentage. Lower drift means the split better preserves the class distribution.
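The drift for one split can be computed directly from the counts; this small sketch (function name ours) returns the largest absolute drift across classes:

```python
def max_split_drift(original_counts, split_counts):
    """Largest absolute drift between a split's class shares and the original.

    original_counts: mapping class -> count in the full dataset
    split_counts: mapping class -> count in one split
    """
    total = sum(original_counts.values())
    split_total = sum(split_counts.values())
    return max(abs(split_counts[c] / split_total - original_counts[c] / total)
               for c in original_counts)
```

The example table's train split has zero drift: 350/700, 210/700, and 140/700 match the original 50%, 30%, and 20% shares exactly.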
7. Should my ratios always sum to 100?
That is best for readability, but the calculator can normalize any positive total. For example, ratios of 8, 1, and 1 become the same as 80%, 10%, and 10%.
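Normalization is a one-line division; a sketch (helper name ours):

```python
def normalize_ratios(ratios):
    """Scale any list of positive ratios so it sums to 100."""
    total = sum(ratios)
    return [100 * r / total for r in ratios]
```

So `normalize_ratios([8, 1, 1])` yields the same 80/10/10 split the text describes.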
8. Can I use this for imbalanced datasets?
Yes. It is especially useful there. The minority threshold setting helps highlight rare classes that may need oversampling, class weighting, or a different evaluation strategy.