Calculator Input Form
Choose class and branch counts, then enter each branch distribution in the matrix.
Example Data Table
This classic example describes one candidate split with three branches and two classes.
| Branch | Positive | Negative | Total |
|---|---|---|---|
| Branch A | 2 | 3 | 5 |
| Branch B | 4 | 0 | 4 |
| Branch C | 3 | 2 | 5 |
| Total | 9 | 5 | 14 |
Using base 2, this example gives a parent entropy near 0.9403, a weighted child entropy near 0.6935, and an information gain near 0.2467.
Formula Used
1) Parent Entropy
Entropy(S) = - Σ p(i) × log(p(i))
Each class probability p(i) is its class count divided by the total sample count. The logarithm uses the base you select.
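As a rough illustration of the entropy formula, the sketch below computes entropy from a list of class counts. The function name `entropy` and its `base` parameter are hypothetical and are not taken from the calculator itself.

```python
import math

def entropy(counts, base=2):
    """Shannon entropy of a list of class counts (illustrative helper)."""
    total = sum(counts)
    # Each probability is a class count divided by the total sample count.
    # Counts of zero are skipped, so log(0) is never evaluated.
    return sum(-(c / total) * math.log(c / total, base) for c in counts if c > 0)

# Totals row of the example table: 9 positive, 5 negative.
parent = entropy([9, 5])
print(round(parent, 4))  # close to the 0.9403 quoted for the example
```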
2) Weighted Child Entropy
Weighted Entropy = Σ (|Sv| / |S|) × Entropy(Sv)
Here Sv is the subset of samples that falls into branch v, so each branch entropy is weighted by that branch's share of the total sample count.
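The weighting step can be sketched as follows, assuming the count matrix is stored as one list of class counts per branch. The names `weighted_child_entropy` and `branches` are made up for this example.

```python
import math

def entropy(counts, base=2):
    total = sum(counts)
    return sum(-(c / total) * math.log(c / total, base) for c in counts if c > 0)

def weighted_child_entropy(branches, base=2):
    """Sum of branch entropies, each weighted by the branch's share of samples."""
    total = sum(sum(b) for b in branches)
    return sum((sum(b) / total) * entropy(b, base) for b in branches)

# Rows of the example table: [positive, negative] for Branch A, B, C.
branches = [[2, 3], [4, 0], [3, 2]]
weighted = weighted_child_entropy(branches)
print(round(weighted, 4))  # about 0.6935 for the example table
```

Branch B is pure, so it contributes nothing. The other two branches each hold 5 of the 14 samples and have the same entropy.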
3) Information Gain
Information Gain = Entropy(S) - Weighted Entropy
Higher information gain means the split reduces uncertainty more effectively.
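Putting the two quantities together, a minimal sketch of the information gain calculation might look like this. It assumes the parent class counts are the column sums of the branch matrix, and `information_gain` is a hypothetical helper name.

```python
import math

def entropy(counts, base=2):
    total = sum(counts)
    return sum(-(c / total) * math.log(c / total, base) for c in counts if c > 0)

def information_gain(branches, base=2):
    """Parent entropy minus the sample-weighted entropy of the branches."""
    parent_counts = [sum(col) for col in zip(*branches)]  # column sums
    total = sum(parent_counts)
    weighted = sum((sum(b) / total) * entropy(b, base) for b in branches)
    return entropy(parent_counts, base) - weighted

# Example table rows: Branch A, Branch B, Branch C.
gain = information_gain([[2, 3], [4, 0], [3, 2]])
print(round(gain, 4))  # close to the 0.2467 quoted for the example
```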
4) Gain Ratio
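Gain Ratio = Information Gain / Split Information
Split Information = - Σ (|Sv| / |S|) × log(|Sv| / |S|)
Split information is the entropy of the branch sizes themselves. Gain ratio is optional, but it helps when comparing splits with different numbers of branches, because splits with many branches tend to show inflated information gain.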
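The sketch below illustrates gain ratio under one assumption: that split information is the entropy of the branch sizes, which is the usual C4.5 definition. The calculator's exact handling, including what it does when split information is zero, is not documented here, so the zero guard is a guess.

```python
import math

def entropy(counts, base=2):
    total = sum(counts)
    return sum(-(c / total) * math.log(c / total, base) for c in counts if c > 0)

def gain_ratio(branches, base=2):
    parent_counts = [sum(col) for col in zip(*branches)]
    sizes = [sum(b) for b in branches]           # samples per branch
    total = sum(sizes)
    weighted = sum((n / total) * entropy(b, base) for n, b in zip(sizes, branches))
    gain = entropy(parent_counts, base) - weighted
    split_info = entropy(sizes, base)            # assumed C4.5-style definition
    if split_info == 0:                          # e.g. a single non-empty branch
        return 0.0                               # assumption: report 0, not an error
    return gain / split_info

ratio = gain_ratio([[2, 3], [4, 0], [3, 2]])
print(round(ratio, 4))  # roughly 0.156 for the example table
```

For the example table the branch sizes are 5, 4, and 5, giving a split information of about 1.577 in base 2.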
How to Use This Calculator
- Choose how many classes exist in your target variable.
- Choose how many branches the candidate split creates.
- Enter class labels and branch labels, separated by commas.
- Fill the count matrix with branch-by-class sample counts.
- Pick the entropy log base and desired decimal precision.
- Press Calculate Information Gain to show the result above the form.
- Use Download CSV or Download PDF to export the output.
- Review the chart and branch table to compare split quality visually.
FAQs
1) What does information gain measure?
Information gain measures how much uncertainty about the class label decreases after a split. A larger value usually means the split separates the classes more effectively, which is why decision trees tend to favor it.
2) Why is entropy important in decision trees?
Entropy measures class impurity. Low entropy means a branch is purer, that is, dominated by a single class. Decision trees often prefer the split that reduces entropy the most.
3) Can I use more than two classes?
Yes. This calculator supports multiclass targets. Increase the number of classes, update the labels, and enter the correct branch-by-class counts.
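As a sketch of the multiclass case, the same formulas apply unchanged when each branch row has three or more counts. The count matrix below is invented purely for illustration.

```python
import math

def entropy(counts, base=2):
    total = sum(counts)
    return sum(-(c / total) * math.log(c / total, base) for c in counts if c > 0)

# Hypothetical three-class split: rows are branches, columns are classes.
branches = [[6, 1, 0], [1, 5, 1], [0, 1, 6]]
parent_counts = [sum(col) for col in zip(*branches)]  # [7, 7, 7]
total = sum(parent_counts)

parent = entropy(parent_counts)
weighted = sum((sum(b) / total) * entropy(b) for b in branches)
gain = parent - weighted

print(round(parent, 4), round(gain, 4))
```

With three equally frequent classes, the base 2 parent entropy equals log2(3), about 1.585, so entropy can exceed 1 once more than two classes are present.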
4) What is weighted child entropy?
Weighted child entropy is the sample-weighted average entropy of the branches after the split. Each branch entropy is multiplied by its proportion of total samples before summing.
5) Why does the calculator also show gain ratio?
Gain ratio adjusts information gain by split information. It can reduce bias toward attributes that create many branches with small sample groups.
6) Which log base should I choose?
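Base 2 is the standard choice for information gain in many textbooks and reports entropy in bits. Natural log (nats) and base 10 (hartleys) are also valid. Changing the base rescales every entropy and gain value by the same constant, so it does not change which split ranks best.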
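To illustrate how the log base affects the numbers, the sketch below computes the parent entropy of the example table in three bases. The variable names are illustrative only.

```python
import math

def entropy(counts, base=2):
    total = sum(counts)
    return sum(-(c / total) * math.log(c / total, base) for c in counts if c > 0)

counts = [9, 5]                      # totals row of the example table
bits = entropy(counts, 2)            # base 2
nats = entropy(counts, math.e)       # natural log
hartleys = entropy(counts, 10)       # base 10

# Changing base multiplies the result by a constant factor.
print(round(bits, 4), round(nats, 4), round(hartleys, 4))
print(math.isclose(nats, bits * math.log(2)))
```

Because gain ratio divides one entropy-like quantity by another, the conversion factor cancels and the ratio is the same in every base.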
7) What happens if a class count is zero?
Zero counts are allowed. They do not add entropy because the calculation skips probabilities of zero, following the convention that 0 × log(0) counts as 0.
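A short sketch of why zero counts need to be skipped: in Python, taking the log of zero raises an error, so an implementation has to skip those terms explicitly. Treating a completely empty branch as zero entropy is an assumption made here, not something the calculator documents.

```python
import math

def entropy(counts, base=2):
    total = sum(counts)
    if total == 0:
        return 0.0          # assumption: an empty branch has zero entropy
    h = 0.0
    for c in counts:
        if c == 0:
            continue        # skip zero probabilities: 0 * log(0) is taken as 0
        p = c / total
        h -= p * math.log(p, base)
    return h

pure = entropy([4, 0])      # Branch B from the example table
print(pure)

# Without the skip, the zero count would hit log(0), which is undefined.
try:
    math.log(0)
    naive_failed = False
except ValueError:
    naive_failed = True
print(naive_failed)
```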
8) Can this calculator replace full model training?
No. It evaluates one candidate split only. Full model training also considers recursion, stopping rules, pruning, validation, and feature selection.