Model setup
Tip: This page trains models entirely in your browser session, so keep datasets small for best performance.
Example data table
This sample matches the default CSV above.
| StudyHours | AttendanceRate | PriorScore | SupportLevel | Passed |
|---|---|---|---|---|
| 6 | 0.88 | 72 | High | Yes |
| 2 | 0.60 | 55 | Low | No |
| 4 | 0.75 | 63 | Medium | Yes |
| 1 | 0.50 | 40 | Low | No |
| 8 | 0.92 | 85 | High | Yes |
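For reference, the sample table above can be parsed programmatically. A minimal sketch using Python's `csv` module (the column names match the table; this is illustrative, not the calculator's internal loader):

```python
import csv
import io

# The sample dataset from the table above, as CSV text with a header row.
SAMPLE_CSV = """StudyHours,AttendanceRate,PriorScore,SupportLevel,Passed
6,0.88,72,High,Yes
2,0.60,55,Low,No
4,0.75,63,Medium,Yes
1,0.50,40,Low,No
8,0.92,85,High,Yes
"""

# Parse into a list of dicts, converting the numeric columns.
rows = []
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    row["StudyHours"] = float(row["StudyHours"])
    row["AttendanceRate"] = float(row["AttendanceRate"])
    row["PriorScore"] = float(row["PriorScore"])
    rows.append(row)
```

After parsing, `rows[0]["SupportLevel"]` is the string `"High"` while `rows[0]["StudyHours"]` is the number `6.0`, matching how the calculator treats categorical versus numeric columns.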
Formula used
- Classification: predicted class = majority vote across trees.
- Regression: predicted value = average of tree predictions.
- Vote share: class votes ÷ number of trees.
- Gini: 1 − Σ p(k)², where p(k) is the proportion of class k in the node.
- Entropy: −Σ p(k) log₂ p(k).
- MSE: variance of targets in a node (lower is better).
- Gain: parent impurity − weighted child impurity.
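The formulas above can be written out in plain Python. A minimal sketch using only the standard library (function names are illustrative, not the calculator's internals):

```python
import math
from collections import Counter
from statistics import mean, pvariance

def gini(labels):
    """Gini impurity: 1 - sum of p(k) squared over the classes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy: -sum of p(k) * log2(p(k)) over the classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def node_mse(targets):
    """Node MSE: the variance of the targets in the node."""
    return pvariance(targets)

def gain(parent, children, impurity=gini):
    """Gain: parent impurity minus the size-weighted child impurity."""
    n = len(parent)
    weighted = sum(len(c) / n * impurity(c) for c in children)
    return impurity(parent) - weighted

def forest_classify(tree_votes):
    """Majority vote across trees, plus the winning class's vote share."""
    counts = Counter(tree_votes)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(tree_votes)

def forest_regress(tree_values):
    """Regression prediction: the average of the tree predictions."""
    return mean(tree_values)
```

For a perfectly split node, e.g. `gain(["Yes", "Yes", "No", "No"], [["Yes", "Yes"], ["No", "No"]])`, the children are pure, so the gain equals the parent impurity of 0.5.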
How to use this calculator
- Choose Classification for categorical labels, or Regression for numeric targets.
- Paste or upload a CSV dataset with headers in the first row.
- Set the Target column name (your label/value column).
- Tune trees, depth, sampling, and feature subset options.
- Enter the new observation feature values to predict.
- Press Submit & Predict to view results above the form.
- Use the download buttons to export a CSV or PDF report.
For best results, include at least 50 rows and balanced classes.
FAQs
1) What does the prediction represent?
It is the forest’s combined output for your new observation. Classification returns the most-voted label. Regression returns the mean of all tree outputs.
2) Why do results change when I change the seed?
Random forests rely on randomness for bootstrap samples and feature subsets. A different seed changes those random choices, so the forest structure and output may shift.
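A minimal sketch of how the seed drives bootstrap sampling, using Python's `random` module (illustrative only, not the calculator's actual internals):

```python
import random

def bootstrap_indices(n_rows, seed):
    """Draw n_rows row indices with replacement: one bootstrap sample."""
    rng = random.Random(seed)
    return [rng.randrange(n_rows) for _ in range(n_rows)]

# The same seed always reproduces the same sample, so results are
# repeatable; a different seed draws a different sample, so every tree
# grown from it (and therefore the forest's output) can change.
sample_a = bootstrap_indices(10, seed=1)
sample_b = bootstrap_indices(10, seed=2)
```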
3) What is out-of-bag accuracy or error?
Each tree leaves out some training rows during bootstrapping. Those left-out rows can be predicted by that tree, giving a built-in estimate without a separate validation set.
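The out-of-bag rows for one tree can be identified directly from its bootstrap sample. A minimal sketch, assuming the same stdlib-based sampling as above (not the calculator's internals):

```python
import random

def oob_indices(n_rows, seed):
    """Indices of rows NOT drawn into one tree's bootstrap sample."""
    rng = random.Random(seed)
    in_bag = {rng.randrange(n_rows) for _ in range(n_rows)}
    return [i for i in range(n_rows) if i not in in_bag]

# For large n, a row is out of bag with probability (1 - 1/n)^n ≈ 0.368,
# so roughly a third of the rows can score each tree "for free".
oob = oob_indices(100, seed=7)
```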
4) How should I set the number of trees?
More trees usually stabilize predictions but require more time. Start around 100–200. Increase if your vote shares or metrics fluctuate, then stop when improvements flatten.
5) What does feature subset per split do?
It limits how many features each split is allowed to consider. This increases tree diversity and often improves generalization. A common default for classification is sqrt(p), the square root of the total feature count.
6) Can I use text categories like “High” or “Low”?
Yes. Categorical columns are split using one-vs-rest rules. When predicting, type the exact category spelling that appears in your dataset for consistent behavior.
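A one-vs-rest split on a categorical column can be sketched in a few lines; the rows and column names below are illustrative, not the calculator's internals:

```python
def one_vs_rest_split(rows, column, category):
    """Split rows into (column == category) vs. everything else."""
    left = [r for r in rows if r[column] == category]
    right = [r for r in rows if r[column] != category]
    return left, right

# Splitting on SupportLevel == "High" vs. the rest; note the exact
# spelling "High" must match the dataset, or the row lands on the right.
rows = [
    {"SupportLevel": "High", "Passed": "Yes"},
    {"SupportLevel": "Low", "Passed": "No"},
    {"SupportLevel": "Medium", "Passed": "Yes"},
]
left, right = one_vs_rest_split(rows, "SupportLevel", "High")
```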
7) Why is my model accuracy low?
Low accuracy can come from noisy data, weak features, class imbalance, or too small a dataset. Try adding better predictors, collecting more rows, or adjusting depth and leaf sizes.