Measure cluster disorder and category concentration accurately. Compare partitions, estimate information gain, and assess balance. Turn category counts into clearer clustering decisions for teams.
Enter one cluster per row. Each value in a row represents the count of items from each category inside that cluster.
This example shows four clusters and four ground-truth categories. Each cell is the count of items from a category assigned to a cluster.
| Cluster | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|
| Cluster A | 40 | 5 | 5 | 0 |
| Cluster B | 6 | 30 | 4 | 0 |
| Cluster C | 3 | 6 | 28 | 3 |
| Cluster D | 2 | 2 | 5 | 31 |
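Cluster entropy: For cluster k, entropy is H(k) = -Σ p(j|k) log_b p(j|k), summed over categories j, where p(j|k) is the share of category j inside the cluster and b is the logarithm base. Categories with a zero count contribute nothing to the sum.
Weighted entropy: H_weighted = Σ (n_k / N) × H(k), where n_k is the number of items in cluster k and N is the total number of items. This summarizes disorder across all clusters while respecting cluster sizes.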
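The two formulas above can be sketched in Python using the counts from the example table. This is a minimal illustration, not the calculator's own code: the function names, the choice of base 2, and the handling of empty clusters (treated as zero entropy) are assumptions.

```python
import math

# Rows are Clusters A-D, columns are Class 1-4 (counts from the example table).
counts = [
    [40, 5, 5, 0],
    [6, 30, 4, 0],
    [3, 6, 28, 3],
    [2, 2, 5, 31],
]

def cluster_entropy(row, base=2):
    """H(k) = -sum p(j|k) * log_b p(j|k); zero counts are skipped."""
    n_k = sum(row)
    if n_k == 0:
        return 0.0  # assumption: an empty cluster is reported as zero entropy
    return sum(-(c / n_k) * math.log(c / n_k, base) for c in row if c > 0)

def weighted_entropy(matrix, base=2):
    """H_weighted = sum (n_k / N) * H(k)."""
    total = sum(sum(row) for row in matrix)
    return sum(sum(row) / total * cluster_entropy(row, base) for row in matrix)

for name, row in zip("ABCD", counts):
    print(f"Cluster {name}: H = {cluster_entropy(row):.4f}")
print(f"Weighted entropy: {weighted_entropy(counts):.4f}")
```

With base 2, the four clusters come out at roughly 0.92, 1.05, 1.33, and 1.09 bits, and the weighted entropy at roughly 1.09 bits.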
Normalized entropy: H_norm(k) = H(k) / log_b(m), where m is the number of categories. Values near zero indicate purer clusters.
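As a hedged sketch of the normalization, the function below divides a cluster's entropy by log_b(m), taking m to be the number of category columns in the row. The function name and the zero return for empty or single-category inputs are assumptions made for illustration.

```python
import math

def normalized_entropy(row, base=2):
    """H_norm(k) = H(k) / log_b(m), with m = number of category columns."""
    n_k = sum(row)
    m = len(row)
    if n_k == 0 or m < 2:
        return 0.0  # assumption: nothing to normalize, report zero
    h = sum(-(c / n_k) * math.log(c / n_k, base) for c in row if c > 0)
    return h / math.log(m, base)

print(round(normalized_entropy([40, 5, 5, 0]), 3))   # Cluster A from the example
print(round(normalized_entropy([3, 6, 28, 3]), 3))   # Cluster C from the example
```

Because the base appears in both numerator and denominator, the normalized value does not depend on the base chosen. For the example table, Cluster A lands near 0.46 and Cluster C near 0.67.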
Purity: Purity(k) = max(category count in cluster k) / n_k. Higher purity means one category dominates the cluster.
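Purity is the simplest of these measures to compute. The sketch below applies it to the example table; the dictionary layout and the zero return for an empty cluster are illustrative assumptions.

```python
# Counts from the example table: one list of category counts per cluster.
counts = {
    "Cluster A": [40, 5, 5, 0],
    "Cluster B": [6, 30, 4, 0],
    "Cluster C": [3, 6, 28, 3],
    "Cluster D": [2, 2, 5, 31],
}

def purity(row):
    """Purity(k) = largest category count in the cluster / cluster size."""
    n_k = sum(row)
    if n_k == 0:
        return 0.0  # assumption: an empty cluster has no dominant category
    return max(row) / n_k

for name, row in counts.items():
    print(f"{name}: purity = {purity(row):.3f}")
```

For the example, this gives 0.800, 0.750, 0.700, and 0.775 for Clusters A through D.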
Base entropy: H_base = -Σ P(j) log_b P(j), where P(j) is the share of category j in the full dataset, before any clustering.
Information gain: IG = H_base - H_weighted. Larger gain means the clustering reduces uncertainty more effectively.
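Base entropy and information gain can be worked through on the example table as follows. This is a sketch under assumptions: base 2 is chosen arbitrarily, and the helper name `entropy` is hypothetical.

```python
import math

# Rows are Clusters A-D, columns are Class 1-4 (counts from the example table).
counts = [
    [40, 5, 5, 0],
    [6, 30, 4, 0],
    [3, 6, 28, 3],
    [2, 2, 5, 31],
]

def entropy(values, base=2):
    """Shannon entropy of a list of counts; zero counts are skipped."""
    total = sum(values)
    return sum(-(v / total) * math.log(v / total, base) for v in values if v > 0)

N = sum(sum(row) for row in counts)

# Column sums give the category totals for the full dataset: [51, 43, 42, 34].
category_totals = [sum(col) for col in zip(*counts)]

h_base = entropy(category_totals)
h_weighted = sum(sum(row) / N * entropy(row) for row in counts)
info_gain = h_base - h_weighted

print(f"H_base = {h_base:.4f}")
print(f"H_weighted = {h_weighted:.4f}")
print(f"IG = {info_gain:.4f}")
```

For these counts, base entropy is roughly 1.99 bits and weighted entropy roughly 1.09 bits, so the clustering removes about 0.90 bits of uncertainty about an item's category.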
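Cluster balance entropy: H_balance = -Σ (n_k / N) log_b (n_k / N), the entropy of the cluster size proportions. Dividing by log_b(K), where K is the number of clusters, gives normalized balance. Higher normalized balance means clusters are more evenly sized.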
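A short sketch of balance entropy follows, assuming it is the Shannon entropy of the size proportions n_k / N and that normalization divides by log_b(K) for K clusters. The function name and the zero return for fewer than two clusters are assumptions.

```python
import math

def normalized_balance(cluster_sizes, base=2):
    """Entropy of cluster size proportions, divided by log_b(K)."""
    total = sum(cluster_sizes)
    k = len(cluster_sizes)
    if total == 0 or k < 2:
        return 0.0  # assumption: balance is undefined here, report zero
    h = sum(-(n / total) * math.log(n / total, base)
            for n in cluster_sizes if n > 0)
    return h / math.log(k, base)

# Cluster sizes from the example table: row sums for Clusters A-D.
sizes = [50, 40, 40, 40]
print(round(normalized_balance(sizes), 4))

# A lopsided split for comparison (hypothetical sizes).
print(round(normalized_balance([160, 5, 3, 2]), 4))
```

The example table's clusters are almost evenly sized, so its normalized balance is close to one (about 0.996). The lopsided split scores much lower.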
Entropy measures how mixed categories are inside a cluster. Lower entropy suggests cleaner separation, while higher entropy shows stronger overlap among category memberships.
Weighted entropy accounts for cluster size. Large clusters influence the total more than tiny clusters, giving a more realistic summary of overall clustering quality.
Information gain compares dataset entropy before clustering with weighted entropy after clustering. A larger value means the clustering explains category structure more effectively.
High purity alone is not enough. A clustering can show high purity simply by splitting the data into many tiny clusters. Reviewing entropy, purity, and cluster balance together gives a fuller picture of performance.
Normalized entropy rescales entropy between zero and one. That makes comparisons easier when datasets use different numbers of categories.
This calculator works best when each cluster can be compared against known categories, classes, or segments. The resulting cluster-by-category count matrix provides the category distribution needed for entropy calculations.
Cluster balance entropy measures how evenly observations are distributed across clusters. Very low balance may reveal dominant clusters, fragmentation, or possible tuning problems.
Consider revising the clustering when weighted entropy is high, purity is weak, or one cluster dominates the dataset. Feature engineering and better hyperparameters often help.