Analyze label spread across complex datasets with confidence. Track richness, evenness, and effective classes instantly. Turn raw counts into smarter training insights starting today.
Enter class labels and observed counts.
This sample reflects an imbalanced multiclass training dataset. You can load it into the form with one click.
| Class Label | Observed Count | Use Case Note |
|---|---|---|
| Normal | 420 | Majority class in event monitoring. |
| Warning | 210 | Moderate signal category. |
| Fraud | 85 | Rare but important minority class. |
| Abuse | 60 | Small class with business risk. |
| Error | 35 | System fault examples. |
| Unknown | 20 | Low frequency fallback label. |
Shannon-Wiener Diversity Index:
H = - Σ (pᵢ × log(pᵢ))
Where pᵢ = nᵢ / N
nᵢ is the count of each class, and N is total samples.
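As a minimal sketch of the formula above, the following Python snippet computes H for the counts in the sample table. The function name `shannon_index` and its `base` parameter are illustrative assumptions, not part of the calculator itself. Natural log is the default here.

```python
import math

def shannon_index(counts, base=math.e):
    """Shannon index H = -sum(p_i * log(p_i)), with p_i = n_i / N."""
    if any(n < 0 for n in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)  # N, the total number of samples
    if total <= 0:
        raise ValueError("at least one count must be positive")
    h = 0.0
    for n in counts:
        if n > 0:  # a zero count contributes nothing to the sum
            p = n / total
            h -= p * math.log(p) / math.log(base)
    return h

# Observed counts from the sample table:
# Normal, Warning, Fraud, Abuse, Error, Unknown
sample_counts = [420, 210, 85, 60, 35, 20]
print(f"H = {shannon_index(sample_counts):.3f} nats")
```

Passing `base=2` reports the same quantity in bits instead of nats.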
This index measures class uncertainty and spread. Higher values suggest broader distribution. Lower values reveal concentration, imbalance, or label dominance.
Use it during dataset review, drift tracking, stratified sampling checks, and active learning prioritization.
Tip: Zero-count rows are ignored in the entropy calculation, but they remain visible during editing.
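One way to picture this behavior: filter out zero-count rows before computing probabilities, since `math.log(0)` raises a `ValueError` in Python. The sketch below assumes rows arrive as a label-to-count mapping. The helper name `entropy_from_rows` and the `"Planned"` row are hypothetical.

```python
import math

def entropy_from_rows(rows):
    """rows: mapping of class label -> observed count (zeros allowed)."""
    # Keep only rows with a positive count; zero-count rows are skipped.
    positive = [n for n in rows.values() if n > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("no positive counts to analyze")
    return -sum((n / total) * math.log(n / total) for n in positive)

rows = {"Normal": 420, "Warning": 210, "Fraud": 85}
with_placeholder = {**rows, "Planned": 0}  # hypothetical zero-count row

# The placeholder row leaves the entropy unchanged.
assert math.isclose(entropy_from_rows(rows), entropy_from_rows(with_placeholder))
```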
The Shannon-Wiener index measures how evenly samples are distributed across classes. It rises when labels are more balanced and falls when one class dominates the dataset.
Models trained on balanced datasets are often more stable. This index helps detect skewed label distributions before training, validation, or drift monitoring begins.
There is no universal cutoff for a good index value. Compare the result against maximum entropy, evenness, and past datasets. Context matters more than a single number.
Zero-count classes can be included. Their rows stay in the form for planning, but they do not affect the entropy result because their probability is zero.
Entropy measures diversity magnitude. Evenness rescales that value against the theoretical maximum, showing how close the distribution is to perfect balance.
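The relationship between entropy, maximum entropy, and evenness can be sketched as follows. This assumes richness k is the number of classes with a nonzero count, the theoretical maximum is log(k), and the effective number of classes is exp(H) under natural log. These are the standard definitions, but whether the calculator counts zero-count rows toward k is not stated, so treat that choice as an assumption. The function name `diversity_summary` is hypothetical.

```python
import math

def diversity_summary(counts):
    """Return richness, entropy, maximum entropy, evenness, effective classes."""
    positive = [n for n in counts if n > 0]
    if not positive:
        raise ValueError("at least one count must be positive")
    total = sum(positive)
    richness = len(positive)                      # k, classes actually observed
    h = -sum((n / total) * math.log(n / total) for n in positive)
    h_max = math.log(richness)                    # entropy of a perfectly balanced split
    # With a single class h_max is 0; report evenness as 0.0 by convention (assumption).
    evenness = h / h_max if richness > 1 else 0.0
    return {
        "richness": richness,
        "entropy": h,
        "max_entropy": h_max,
        "evenness": evenness,
        "effective_classes": math.exp(h),
    }

summary = diversity_summary([420, 210, 85, 60, 35, 20])
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A perfectly balanced dataset gives an evenness of 1 and an effective class count equal to its richness. The further the sample falls below those values, the more concentrated its labels are.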
Gini impurity is reported alongside entropy because it is familiar in decision-tree workflows. Showing both metrics gives a broader picture of class concentration and label spread.
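For reference, Gini impurity is 1 - Σ pᵢ², using the same pᵢ = nᵢ / N as the entropy formula. A minimal sketch, with `gini_impurity` as an illustrative name:

```python
def gini_impurity(counts):
    """Gini impurity 1 - sum(p_i ** 2), with p_i = n_i / N."""
    positive = [n for n in counts if n > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("at least one count must be positive")
    return 1.0 - sum((n / total) ** 2 for n in positive)

# Sample counts: Normal, Warning, Fraud, Abuse, Error, Unknown
print(f"Gini impurity = {gini_impurity([420, 210, 85, 60, 35, 20]):.3f}")
```

Like entropy, the value is 0 when a single class holds every sample and grows as the labels spread out. Its upper bound for k balanced classes is 1 - 1/k.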
Natural log is common for ecological and statistical work. Base 2 is intuitive for information theory. Choose one and stay consistent.
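Switching the base only rescales the result, which is why consistency matters more than the choice itself. The sketch below, with a hypothetical `entropy` helper that accepts the log function as a parameter, checks that the base-2 value equals the natural-log value divided by ln(2).

```python
import math

def entropy(counts, log=math.log):
    """Entropy of the counts using the supplied logarithm function."""
    positive = [n for n in counts if n > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("at least one count must be positive")
    return -sum((n / total) * log(n / total) for n in positive)

counts = [420, 210, 85, 60, 35, 20]
h_nats = entropy(counts, math.log)   # natural log, result in nats
h_bits = entropy(counts, math.log2)  # base 2, result in bits

# Changing the base rescales the value: H_bits = H_nats / ln(2)
assert math.isclose(h_bits, h_nats / math.log(2))
print(f"{h_nats:.3f} nats = {h_bits:.3f} bits")
```

Evenness is a ratio of two entropies in the same base, so it comes out the same whichever base is used.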
Results can be exported. Use the CSV button for spreadsheets and the PDF button for reports, reviews, documentation, or stakeholder sharing.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.