Set dataset size, ratios, and rounding preferences. Review counts, percentages, leftovers, and optional test allocation. Build cleaner validation plans for trustworthy machine learning experiments.
The page uses a single stacked layout, while the calculator inputs switch to three columns on large screens, two on smaller screens, and one on mobile.
1. Exact Split Count
Exact Count = Dataset Size × (Split Percentage ÷ 100)
2. Normalized Percentage
Normalized Percentage = User Percentage ÷ Total Entered Percentage × 100
3. Allocated Integer Count
Allocated Count = Rounded Exact Count, adjusted so all split counts sum to the full dataset size.
4. Actual Share
Actual Share = Allocated Count ÷ Dataset Size × 100
The largest remainder method first assigns floor values, then distributes leftover records to splits with the biggest fractional remainders.
| Dataset Size | Train % | Validation % | Test % | Train Count | Validation Count | Test Count |
|---|---|---|---|---|---|---|
| 1,000 | 70 | 20 | 10 | 700 | 200 | 100 |
| 1,250 | 80 | 20 | 0 | 1,000 | 250 | 0 |
| 987 | 75 | 15 | 10 | 740 | 148 | 99 |
| 5,432 | 72.5 | 17.5 | 10 | 3,938 | 951 | 543 |
A validation split helps tune hyperparameters, compare models, and detect overfitting before touching the final test set.
Include a test set when you need an untouched benchmark for final reporting, model comparison, or production readiness checks.
It floors exact counts first, then assigns leftover records to the splits with the highest decimal remainders. This preserves the dataset total cleanly.
Yes in strict mode. Auto normalize is useful when you enter relative weights instead of finished percentages.
Stratified sampling tries to keep class proportions similar across splits. It is especially helpful for imbalanced classification datasets.
Disable shuffle for time series data, ordered experiments, or workflows where record order carries important signal.
No. It plans the counts and ratios only. Use your preferred machine learning library to apply the real split afterward.
A common starting point is 70 20 10 or 80 20 0, depending on dataset size and whether you need a separate test set.
Important Note: All the Calculators listed in this site are for educational purpose only and we do not guarentee the accuracy of results. Please do consult with other sources as well.