Decision Tree Information Gain Calculator

Calculator Input Form

Choose class and branch counts, then enter each branch distribution in the matrix.

Number of Classes

Use between 2 and 6 classes.

Number of Branches

Use between 2 and 6 split branches.

Entropy Log Base

Base 2 is common for information gain.

Class Labels

Separate labels with commas.

Branch Labels

Separate labels with commas.

Decimal Places

Controls displayed precision.

Branch-Class Count Matrix

Rows are branches. Columns are classes. Each cell holds the number of samples of that class that fall into that branch.

Example Data Table

This classic example evaluates a single split that creates three branches over two classes.

Branch	Positive	Negative	Total
Branch A	2	3	5
Branch B	4	0	4
Branch C	3	2	5
Total	9	5	14

With base 2, this example gives a parent entropy of about 0.9403, a weighted child entropy of about 0.6935, and an information gain of about 0.2467.
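As a minimal sketch, the example table can be encoded as a branch-by-class matrix and its Total row and column recomputed from the cells. The variable names here are illustrative and are not taken from the calculator itself.

```python
# Branch-by-class counts from the example table.
# Rows are branches, columns are classes (Positive, Negative).
class_labels = ["Positive", "Negative"]
branch_labels = ["Branch A", "Branch B", "Branch C"]
counts = [
    [2, 3],  # Branch A
    [4, 0],  # Branch B
    [3, 2],  # Branch C
]

# Row sums give each branch size; column sums give the parent class counts.
branch_totals = [sum(row) for row in counts]
class_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(branch_totals)

print(dict(zip(branch_labels, branch_totals)))  # {'Branch A': 5, 'Branch B': 4, 'Branch C': 5}
print(dict(zip(class_labels, class_totals)))    # {'Positive': 9, 'Negative': 5}
print(grand_total)                              # 14
```

The column sums are the parent class counts used for parent entropy, and the row sums are the branch sizes used as weights.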

Formula Used

1) Parent Entropy

Entropy(S) = - Σ p(i) × log(p(i))

Here p(i) is the count of class i divided by the total sample count, and the sum runs over all classes. The logarithm uses the selected entropy log base.
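The entropy formula can be sketched in a few lines of Python. The function name `entropy` and its `base` parameter are assumptions made for illustration, not the calculator's actual code.

```python
import math

def entropy(counts, base=2):
    """Entropy of a list of class counts, using the given log base."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count == 0:
            continue  # a zero count contributes nothing
        p = count / total  # class probability
        result -= p * math.log(p, base)
    return result

# Parent node of the example table: 9 positive and 5 negative samples.
print(round(entropy([9, 5]), 4))  # 0.9403
```

An evenly mixed two-class node such as `[7, 7]` gives the base-2 maximum of 1, while a pure node gives 0.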

2) Weighted Child Entropy

Weighted Entropy = Σ (|Sv| / |S|) × Entropy(Sv)

Sv is the subset of samples that falls into branch v. Each branch entropy is weighted by that branch's share of the total dataset.
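A minimal sketch of the weighted sum, applied to the three branches of the example table. The helper names are hypothetical, and the sketch assumes the matrix holds at least one sample.

```python
import math

def entropy(counts, base=2):
    """Entropy of a list of class counts; zero counts are skipped."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            result -= p * math.log(p, base)
    return result

def weighted_child_entropy(branches, base=2):
    """Sum of branch entropies, each weighted by |Sv| / |S|."""
    total = sum(sum(branch) for branch in branches)
    return sum(sum(branch) / total * entropy(branch, base) for branch in branches)

branches = [[2, 3], [4, 0], [3, 2]]  # Branch A, B, C from the example table
for branch in branches:
    print(branch, round(entropy(branch), 4))  # 0.971, 0.0, 0.971

print(round(weighted_child_entropy(branches), 4))  # 0.6935
```

Branch B is pure, so it adds nothing to the sum. The two mixed branches each carry a weight of 5/14.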

3) Information Gain

Information Gain = Entropy(S) - Weighted Entropy

Higher information gain means the split reduces uncertainty more effectively.
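To illustrate how the gain ranks candidate splits, the sketch below compares the example split with a second, hypothetical two-branch split of the same 14 samples. The second matrix is invented for illustration, and the function names are assumptions.

```python
import math

def entropy(counts, base=2):
    """Entropy of a list of class counts; zero counts are skipped."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            result -= p * math.log(p, base)
    return result

def information_gain(branches, base=2):
    """Parent entropy minus the weighted child entropy of a split."""
    parent_counts = [sum(col) for col in zip(*branches)]
    total = sum(parent_counts)
    weighted = sum(sum(b) / total * entropy(b, base) for b in branches)
    return entropy(parent_counts, base) - weighted

example_split = [[2, 3], [4, 0], [3, 2]]  # from the example table
alternative_split = [[6, 2], [3, 3]]      # hypothetical split, same parent (9, 5)

print(round(information_gain(example_split), 4))      # 0.2467
print(round(information_gain(alternative_split), 4))  # 0.0481
```

Under these assumptions the three-branch split would be preferred, because it removes more of the parent's 0.9403 entropy.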

4) Gain Ratio

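Gain Ratio = Information Gain / Split Information

Split Information = - Σ (|Sv| / |S|) × log(|Sv| / |S|)

Split information is the entropy of the branch sizes themselves. Gain ratio is optional, but it penalizes splits that scatter samples across many small branches, which makes splits with different branch counts easier to compare.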
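The sketch below computes the gain ratio for the example table, taking split information to be the entropy of the branch-size proportions (the usual C4.5 definition). Returning 0 when split information is 0 is an assumption of this sketch. How the calculator handles that case is not stated.

```python
import math

def entropy(counts, base=2):
    """Entropy of a list of counts; zero counts are skipped."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            result -= p * math.log(p, base)
    return result

def information_gain(branches, base=2):
    parent_counts = [sum(col) for col in zip(*branches)]
    total = sum(parent_counts)
    weighted = sum(sum(b) / total * entropy(b, base) for b in branches)
    return entropy(parent_counts, base) - weighted

def split_information(branches, base=2):
    """Entropy of the branch sizes, ignoring class labels."""
    return entropy([sum(b) for b in branches], base)

def gain_ratio(branches, base=2):
    split_info = split_information(branches, base)
    if split_info == 0:
        return 0.0  # assumption: all samples in one branch, report 0
    return information_gain(branches, base) / split_info

branches = [[2, 3], [4, 0], [3, 2]]
print(round(split_information(branches), 4))  # 1.5774
print(round(gain_ratio(branches), 4))         # 0.1564
```

Because the numerator and the denominator scale by the same factor when the log base changes, the ratio itself does not depend on the base.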

How to Use This Calculator

1) Choose how many classes exist in your target variable.
2) Choose how many branches the candidate split creates.
3) Enter class labels and branch labels, separated by commas.
4) Fill the count matrix with branch-by-class sample counts.
5) Pick the entropy log base and desired decimal precision.
6) Press Calculate Information Gain to show the result above the form.
7) Use Download CSV or Download PDF to export the output.
8) Review the chart and branch table to compare split quality visually.

FAQs

1) What does information gain measure?

Information gain measures how much entropy decreases after a split. A larger value means the child branches are purer than the parent, so the split separates the classes more effectively.

2) Why is entropy important in decision trees?

Entropy measures class impurity. It is zero when a branch contains a single class and highest when the classes are evenly mixed. Entropy-based decision trees prefer the split that reduces it the most.

3) Can I use more than two classes?

Yes. This calculator supports multiclass targets. Increase the number of classes, update the labels, and enter the correct branch-by-class counts.

4) What is weighted child entropy?

Weighted child entropy is the sample-weighted average entropy of the branches after the split. Each branch entropy is multiplied by its proportion of total samples before summing.

5) Why does the calculator also show gain ratio?

Gain ratio adjusts information gain by split information. It can reduce bias toward attributes that create many branches with small sample groups.

6) Which log base should I choose?

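Base 2 is the standard choice for information gain in many textbooks and reports entropy in bits. Natural log (nats) and base 10 (hartleys) are also valid. Changing the base rescales every entropy and gain value by the same constant factor, so the ranking of candidate splits does not change.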
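A short sketch of the effect of the log base, using the parent counts from the example table. The `entropy` helper is illustrative only.

```python
import math

def entropy(counts, base=2):
    """Entropy of a list of class counts; zero counts are skipped."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            result -= p * math.log(p, base)
    return result

parent = [9, 5]
base_2 = entropy(parent, 2)
base_e = entropy(parent, math.e)
base_10 = entropy(parent, 10)

print(round(base_2, 4), round(base_e, 4), round(base_10, 4))  # 0.9403 0.6518 0.2831

# The values differ only by a constant factor: H_e = H_2 * ln(2).
print(math.isclose(base_e, base_2 * math.log(2)))  # True
```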

7) What happens if a class count is zero?

Zero counts are allowed. They do not add entropy because the term 0 × log(0) is treated as 0 by convention, so classes with zero probability are skipped in the calculation.
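A minimal sketch of why the skip is needed in code: in Python, `math.log(0)` raises `ValueError`, so zero counts are filtered out before the logarithm is taken. The helper below is an illustration, not the calculator's implementation.

```python
import math

def entropy(counts, base=2):
    """Entropy of a list of class counts, skipping zero counts."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count == 0:
            continue  # treat 0 * log(0) as 0 instead of calling math.log(0)
        p = count / total
        result -= p * math.log(p, base)
    return result

print(entropy([4, 0]))                        # 0.0, a pure branch
print(entropy([2, 3, 0]) == entropy([2, 3]))  # True, the empty class changes nothing
```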

8) Can this calculator replace full model training?

No. It evaluates one candidate split only. Full model training also considers recursion, stopping rules, pruning, validation, and feature selection.