Enter class counts and the desired test percentage easily. Review rounded allocations across every label instantly. Export split summaries, tables, and charts for later reuse.
The calculator preserves class proportions while dividing data into training and test sets.
1. Total samples = sum of all class counts.
2. Target test samples = round(total samples × test percentage ÷ 100).
3. Expected test count for each class = class count ÷ total samples × target test samples (usually fractional).
4. Allocated test counts are rounded with the largest remainder method so the final class totals still match the target test size.
5. Train count for each class = original class count − allocated test count.
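The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation; the function name and the dict-based interface are assumptions.

```python
# Sketch of the split procedure: proportional allocation with
# largest-remainder rounding, so per-class test counts sum to the target.
from math import floor

def stratified_split(class_counts, test_pct):
    total = sum(class_counts.values())                       # step 1
    target_test = round(total * test_pct / 100)              # step 2
    # Step 3: expected (fractional) test count per class.
    expected = {c: n / total * target_test for c, n in class_counts.items()}
    # Step 4: start from the floor of each expectation, then hand the
    # remaining slots to the classes with the largest fractional remainders.
    test = {c: floor(e) for c, e in expected.items()}
    shortfall = target_test - sum(test.values())
    by_remainder = sorted(expected, key=lambda c: expected[c] - test[c], reverse=True)
    for c in by_remainder[:shortfall]:
        test[c] += 1
    # Step 5: train count is whatever is left in each class.
    train = {c: class_counts[c] - test[c] for c in class_counts}
    return test, train
```

Running this on the example table below (Cat 500, Dog 300, Bird 150, Rabbit 50 at 20%) reproduces the allocated test counts of 100, 60, 30, and 10.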
| Class | Original Count | Test % | Allocated Test | Allocated Train |
|---|---|---|---|---|
| Cat | 500 | 20% | 100 | 400 |
| Dog | 300 | 20% | 60 | 240 |
| Bird | 150 | 20% | 30 | 120 |
| Rabbit | 50 | 20% | 10 | 40 |
This example keeps the same class distribution in both splits because each label receives the same proportional share.
Class imbalance can distort evaluation when a random split accidentally underrepresents a minority label. A stratified split keeps each class share close to the original dataset, which improves comparison between training and test performance. It is especially useful for classification tasks, fraud detection, medical datasets, sentiment analysis, and any problem with uneven label counts.
When you plan a split manually, rounding creates hidden issues. You may target 20 percent for testing, but class counts are whole numbers, not fractions. This tool handles those rounding steps and still matches the nearest practical test size. It also reports distribution deviation so you can spot whether tiny classes are being squeezed too hard.
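The distribution deviation mentioned here can be read as the gap, in percentage points, between each label's share of the test set and its share of the full dataset. A minimal sketch of that calculation, assuming the function name and dict inputs:

```python
def distribution_deviation(class_counts, test_counts):
    """Per-class gap (in percentage points) between a label's share of
    the test set and its share of the original dataset."""
    total = sum(class_counts.values())
    test_total = sum(test_counts.values())
    return {
        c: abs(test_counts[c] / test_total - class_counts[c] / total) * 100
        for c in class_counts
    }
```

A large value for a small class is the signal that it is being squeezed by rounding.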
Use the result table to check each label’s original share, expected test count, final allocated test count, and final train count. The grouped chart gives a quick visual check before you build a data pipeline, a notebook workflow, or a model validation report. Exporting the output helps document dataset preparation decisions for future experiments.
A stratified split divides data while preserving each class proportion. It helps training and test sets resemble the original label distribution, which is useful for classification tasks with balanced or imbalanced classes.
A random split can overrepresent large classes and underrepresent small ones. Stratified splitting reduces that risk and produces more stable evaluation data, especially when labels are uneven.
It first computes each class’s expected fractional test count. Then it uses the largest remainder method so rounded class counts still add up to the target test size.
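In the example table earlier the expected counts happen to be whole numbers, so a small worked example (illustrative numbers, not from the tool) shows where the remainders actually matter:

```python
from math import floor

# 3 classes, 30% test split: the target is 3 test samples.
counts = {"A": 5, "B": 3, "C": 2}
total = sum(counts.values())                                     # 10
target_test = round(total * 30 / 100)                            # 3
expected = {c: n / total * target_test for c, n in counts.items()}  # A:1.5 B:0.9 C:0.6
test = {c: floor(e) for c, e in expected.items()}                # A:1 B:0 C:0 (sum 1)
shortfall = target_test - sum(test.values())                     # 2 slots still open
# B (remainder 0.9) and C (0.6) beat A (0.5), so they each gain one slot.
for c in sorted(expected, key=lambda c: expected[c] - test[c], reverse=True)[:shortfall]:
    test[c] += 1
print(test)  # {'A': 1, 'B': 1, 'C': 1}
```

Without the remainder step, plain flooring would yield only 1 test sample instead of the targeted 3.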
Yes. Very small classes may round to zero when the test size is small. The calculator warns you when that happens so you can raise the test percentage or collect more data.
Yes. Add as many classes as needed. The calculator works for binary and multi-class classification as long as you provide valid label counts.
Common choices are 10%, 20%, or 30%. The best value depends on dataset size, model complexity, and how much evaluation data you need for reliable performance measurement.
It helps plan one train and test split. For cross-validation, you can still use it to inspect class proportions before creating repeated folds elsewhere.
CSV is useful for analysis, documentation, and sharing structured counts. PDF is useful for reports, reviews, and keeping a printable record of the chosen split.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.