Measure out-of-bag mistakes quickly with practical ensemble diagnostics. Track error, accuracy, vote coverage, and the generalization gap, and use the results to guide tuning and validation decisions.
Use the formulas below to estimate out-of-bag error, accuracy, uncertainty, sampling coverage, and the generalization gap for a random forest model.
OOB Error = Misclassified OOB Predictions / Total OOB Predictions
OOB Accuracy = 1 - OOB Error
Expected OOB Share ≈ e^(-Bootstrap Ratio)
Average OOB Votes = Number of Trees × Expected OOB Share
Generalization Gap = Training Accuracy - OOB Accuracy
CI = p ± 1.96 × sqrt(p × (1 - p) / n)
where p is the OOB error as a proportion and n is the total number of OOB predictions.
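To make the arithmetic concrete, here is a minimal sketch of the formulas above in plain Python. The input numbers mirror the "More Trees" run from the table below; everything else is straight substitution.

```python
import math

# Inputs taken from the "More Trees" run in the table below.
trees = 500
oob_predictions = 12_000
misclassified_oob = 948
training_accuracy = 0.964
bootstrap_ratio = 1.0          # standard bootstrap: draw n of n with replacement

oob_error = misclassified_oob / oob_predictions          # 0.079
oob_accuracy = 1 - oob_error                             # 0.921
expected_oob_share = math.exp(-bootstrap_ratio)          # ~0.368
average_oob_votes = trees * expected_oob_share           # ~184 trees vote per sample
generalization_gap = training_accuracy - oob_accuracy    # 0.043

# 95% normal-approximation confidence interval on the OOB error.
p, n = oob_error, oob_predictions
half_width = 1.96 * math.sqrt(p * (1 - p) / n)

print(f"OOB error: {oob_error:.3f}")
print(f"95% CI: ({p - half_width:.4f}, {p + half_width:.4f})")
```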
OOB error is a built-in validation estimate for bagged tree ensembles. It approximates how the model will perform on unseen data without requiring a separate validation split.
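In scikit-learn, this built-in estimate is exposed through the `oob_score` flag. The sketch below uses a synthetic dataset purely for illustration; the sizes and settings are assumptions, not the runs from the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for any feature matrix X and label vector y.
X, y = make_classification(n_samples=12_000, n_features=20, random_state=0)

clf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
clf.fit(X, y)

print(f"OOB accuracy: {clf.oob_score_:.3f}")       # aggregated OOB estimate
print(f"OOB error:    {1 - clf.oob_score_:.3f}")   # no separate holdout needed
print(f"Training acc: {clf.score(X, y):.3f}")      # typically higher -> generalization gap
```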
| Run | Trees | OOB Predictions | Misclassified OOB | OOB Error | OOB Accuracy | Training Accuracy | Validation Accuracy |
|---|---|---|---|---|---|---|---|
| Baseline | 200 | 12,000 | 1,260 | 10.50% | 89.50% | 94.40% | 88.90% |
| Tuned Depth | 300 | 12,000 | 1,080 | 9.00% | 91.00% | 95.10% | 90.70% |
| More Trees | 500 | 12,000 | 948 | 7.90% | 92.10% | 96.40% | 91.70% |
| Class Weighting | 500 | 12,000 | 912 | 7.60% | 92.40% | 95.80% | 92.00% |
Random forests leave some records out of each bootstrap draw. Those left-out records become out-of-bag observations for the corresponding trees. Aggregating their predictions provides a practical internal estimate of real-world error, often close to a holdout validation score when the dataset is representative.
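The bootstrap mechanics are easy to verify directly. The simulation below (pure numpy, with illustrative sizes) counts, for each sample, how many trees left it out of their draw and therefore get to cast an OOB vote on it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_trees = 1_000, 200

oob_votes = np.zeros(n_samples, dtype=int)
for _ in range(n_trees):
    drawn = rng.integers(0, n_samples, size=n_samples)   # one bootstrap draw
    in_bag = np.zeros(n_samples, dtype=bool)
    in_bag[drawn] = True
    oob_votes[~in_bag] += 1    # this tree never saw these samples, so it votes OOB

print(f"Mean OOB share per tree: {oob_votes.mean() / n_trees:.3f}")  # ~0.368
print(f"Average OOB votes per sample: {oob_votes.mean():.1f}")       # ~ n_trees * e^-1
```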
**What is OOB error?**
OOB error is the fraction of wrong predictions made on samples not included in each tree's bootstrap draw. It acts like built-in validation for bagged ensembles.

**Why is OOB error considered reliable?**
It often gives a reliable internal estimate because each sample is predicted only by trees that never saw it during fitting. It still helps to keep an external test set for final confirmation.

**Can OOB accuracy guide model tuning?**
Usually yes, but not alone. You should also inspect class balance, precision, recall, vote confidence, and the differences between training, OOB, and holdout metrics.

**What does a large generalization gap mean?**
A large positive gap can indicate overfitting, data leakage, or trees that memorize noisy patterns. Review tree depth, feature quality, and duplicate records.

**How many trees are enough?**
There is no universal number, but stability improves as the tree count grows. Many practical models settle between a few hundred and one thousand trees.
**What does the expected OOB share measure?**
It estimates how often a sample stays outside a bootstrap draw. With standard sampling, this is close to 36.8% per tree and helps approximate OOB vote coverage.
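The 36.8% figure falls out of the bootstrap arithmetic: with a standard n-of-n draw, the chance a given sample is missed by every pick is (1 - 1/n)^n, which converges to e^-1. A quick check:

```python
import math

# (1 - 1/n)^n approaches e^-1 as n grows.
for n in (10, 100, 10_000):
    print(n, round((1 - 1 / n) ** n, 4))
print("e^-1 =", round(math.exp(-1), 4))   # ~0.3679
```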
**Can OOB accuracy hide class imbalance problems?**
Yes. A model can show attractive OOB accuracy while still missing minority classes. Always inspect per-class recall, confusion matrices, and cost-sensitive metrics.
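One way to run that per-class check, sketched below on a deliberately imbalanced synthetic dataset: scikit-learn's `oob_decision_function_` holds the class probabilities built from OOB votes, so taking the argmax yields OOB predictions that can feed a standard classification report. The data and settings here are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Imbalanced toy data: roughly 90% majority class.
X, y = make_classification(n_samples=5_000, weights=[0.9, 0.1], random_state=0)

clf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
clf.fit(X, y)

# OOB class probabilities -> OOB predicted labels.
oob_pred = np.argmax(clf.oob_decision_function_, axis=1)

# Inspect per-class recall, not just the headline accuracy.
print(classification_report(y, oob_pred))
```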
**Should OOB accuracy match holdout accuracy exactly?**
No. Small differences are normal because OOB samples and holdout splits are not identical. Large differences suggest distribution shift, leakage, or unstable tuning.