Model setup
Tip: This page trains models entirely in your browser session, so keep datasets small for best performance.
Example data table
This sample matches the default CSV above.
| StudyHours | AttendanceRate | PriorScore | SupportLevel | Passed |
|---|---|---|---|---|
| 6 | 0.88 | 72 | High | Yes |
| 2 | 0.60 | 55 | Low | No |
| 4 | 0.75 | 63 | Medium | Yes |
| 1 | 0.50 | 40 | Low | No |
| 8 | 0.92 | 85 | High | Yes |
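For reference, the sample table above can be parsed programmatically. A minimal sketch using Python's `csv` module (the column names match the table; this is illustrative, not the calculator's internal loader):

```python
import csv
import io

# The sample dataset from the table above, as CSV text with a header row.
SAMPLE_CSV = """StudyHours,AttendanceRate,PriorScore,SupportLevel,Passed
6,0.88,72,High,Yes
2,0.60,55,Low,No
4,0.75,63,Medium,Yes
1,0.50,40,Low,No
8,0.92,85,High,Yes
"""

# Parse into a list of dicts, converting the numeric columns.
rows = []
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    row["StudyHours"] = float(row["StudyHours"])
    row["AttendanceRate"] = float(row["AttendanceRate"])
    row["PriorScore"] = float(row["PriorScore"])
    rows.append(row)
```

After parsing, `rows[0]["SupportLevel"]` is the string `"High"` while `rows[0]["StudyHours"]` is the number `6.0`, matching how the calculator treats categorical versus numeric columns.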
Formula used
- Classification: predicted class = majority vote across trees.
- Regression: predicted value = average of tree predictions.
- Vote share: class votes ÷ number of trees.
- Gini: 1 − Σ p(k)², where p(k) is the proportion of class k in the node.
- Entropy: −Σ p(k) log₂ p(k).
- MSE: variance of targets in a node (lower is better).
- Gain: parent impurity − weighted child impurity.
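The formulas above can be written out in plain Python. A minimal sketch using only the standard library (function names are illustrative, not the calculator's internals):

```python
import math
from collections import Counter
from statistics import mean, pvariance

def gini(labels):
    """Gini impurity: 1 - sum of p(k) squared over the classes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy: -sum of p(k) * log2(p(k)) over the classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def node_mse(targets):
    """Node MSE: the variance of the targets in the node."""
    return pvariance(targets)

def gain(parent, children, impurity=gini):
    """Gain: parent impurity minus the size-weighted child impurity."""
    n = len(parent)
    weighted = sum(len(c) / n * impurity(c) for c in children)
    return impurity(parent) - weighted

def forest_classify(tree_votes):
    """Majority vote across trees, plus the winning class's vote share."""
    counts = Counter(tree_votes)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(tree_votes)

def forest_regress(tree_values):
    """Regression prediction: the average of the tree predictions."""
    return mean(tree_values)
```

For a perfectly split node, e.g. `gain(["Yes", "Yes", "No", "No"], [["Yes", "Yes"], ["No", "No"]])`, the children are pure, so the gain equals the parent impurity of 0.5.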
How to use this calculator
- Choose Classification for categorical labels, or Regression for numeric targets.
- Paste or upload a CSV dataset with headers in the first row.
- Set the Target column name (your label/value column).
- Tune trees, depth, sampling, and feature subset options.
- Enter the new observation feature values to predict.
- Press Submit & Predict to view results above the form.
- Use the download buttons to export a CSV or PDF report.
For best results, include at least 50 rows and balanced classes.
FAQs
1) What does the prediction represent?
It is the forest’s combined output for your new observation. Classification returns the most-voted label. Regression returns the mean of all tree outputs.
2) Why do results change when I change the seed?
Random forests rely on randomness for bootstrap samples and feature subsets. A different seed changes those random choices, so the forest structure and output may shift.
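A minimal sketch of how the seed drives bootstrap sampling, using Python's `random` module (illustrative only, not the calculator's actual internals):

```python
import random

def bootstrap_indices(n_rows, seed):
    """Draw n_rows row indices with replacement: one bootstrap sample."""
    rng = random.Random(seed)
    return [rng.randrange(n_rows) for _ in range(n_rows)]

# The same seed always reproduces the same sample, so results are
# repeatable; a different seed draws a different sample, so every tree
# grown from it (and therefore the forest's output) can change.
sample_a = bootstrap_indices(10, seed=1)
sample_b = bootstrap_indices(10, seed=2)
```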
3) What is out-of-bag accuracy or error?
Each tree leaves out some training rows during bootstrapping. Those left-out rows can be predicted by that tree, giving a built-in estimate without a separate validation set.
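The out-of-bag rows for one tree can be identified directly from its bootstrap sample. A minimal sketch, assuming the same stdlib-based sampling as above (not the calculator's internals):

```python
import random

def oob_indices(n_rows, seed):
    """Indices of rows NOT drawn into one tree's bootstrap sample."""
    rng = random.Random(seed)
    in_bag = {rng.randrange(n_rows) for _ in range(n_rows)}
    return [i for i in range(n_rows) if i not in in_bag]

# For large n, a row is out of bag with probability (1 - 1/n)^n ≈ 0.368,
# so roughly a third of the rows can score each tree "for free".
oob = oob_indices(100, seed=7)
```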
4) How should I set the number of trees?
More trees usually stabilize predictions but require more time. Start around 100–200. Increase if your vote shares or metrics fluctuate, then stop when improvements flatten.
5) What does feature subset per split do?
It limits how many features each split is allowed to consider. This increases tree diversity and often improves generalization. A common default for classification is sqrt(p), the square root of the total feature count.
6) Can I use text categories like “High” or “Low”?
Yes. Categorical columns are split using one-vs-rest rules. When predicting, type the exact category spelling that appears in your dataset for consistent behavior.
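A one-vs-rest split on a categorical column can be sketched in a few lines; the rows and column names below are illustrative, not the calculator's internals:

```python
def one_vs_rest_split(rows, column, category):
    """Split rows into (column == category) vs. everything else."""
    left = [r for r in rows if r[column] == category]
    right = [r for r in rows if r[column] != category]
    return left, right

# Splitting on SupportLevel == "High" vs. the rest; note the exact
# spelling "High" must match the dataset, or the row lands on the right.
rows = [
    {"SupportLevel": "High", "Passed": "Yes"},
    {"SupportLevel": "Low", "Passed": "No"},
    {"SupportLevel": "Medium", "Passed": "Yes"},
]
left, right = one_vs_rest_split(rows, "SupportLevel", "High")
```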
7) Why is my model accuracy low?
Low accuracy can come from noisy data, weak features, class imbalance, or too small a dataset. Try adding better predictors, collecting more rows, or adjusting depth and leaf sizes.