| X: Weather | Y: Decision | Count | Meaning |
|---|---|---|---|
| Sunny | Buy | 32 | Thirty-two sunny cases ended with a buy action. |
| Sunny | Skip | 8 | Eight sunny cases ended with a skip action. |
| Rainy | Buy | 7 | Seven rainy cases ended with a buy action. |
| Rainy | Skip | 23 | Twenty-three rainy cases ended with a skip action. |
| Cloudy | Buy | 18 | Eighteen cloudy cases ended with a buy action. |
| Cloudy | Skip | 12 | Twelve cloudy cases ended with a skip action. |
Conditional entropy of Y given X:
H(Y|X) = - Σ_x Σ_y P(x,y) log_b P(y|x)
P(y|x) = P(x,y) / P(x)
Conditional entropy of X given Y:
H(X|Y) = - Σ_y Σ_x P(x,y) log_b P(x|y)
P(x|y) = P(x,y) / P(y)
Related information measures:
I(X;Y) = H(X) + H(Y) - H(X,Y)
H(Y|X) = H(X,Y) - H(X)
H(X|Y) = H(X,Y) - H(Y)
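As a rough illustration of these identities, the sketch below evaluates them in Python on the sample weather table. The helper name `entropy` and the dictionary layout are assumptions made for this example, not the calculator's own code.

```python
from collections import defaultdict
from math import log

# Joint counts from the sample table: (X label, Y label) -> count.
joint_counts = {
    ("Sunny", "Buy"): 32, ("Sunny", "Skip"): 8,
    ("Rainy", "Buy"): 7, ("Rainy", "Skip"): 23,
    ("Cloudy", "Buy"): 18, ("Cloudy", "Skip"): 12,
}

def entropy(probs, base=2):
    """Shannon entropy of a set of probabilities; zero terms are skipped."""
    return -sum(p * log(p, base) for p in probs if p > 0)

# Normalize counts into joint probabilities P(x,y).
total = sum(joint_counts.values())
p_xy = {pair: count / total for pair, count in joint_counts.items()}

# Marginals P(x) and P(y) are sums over the joint distribution.
p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p

h_xy = entropy(p_xy.values())
h_x = entropy(p_x.values())
h_y = entropy(p_y.values())

h_y_given_x = h_xy - h_x        # H(Y|X) = H(X,Y) - H(X)
h_x_given_y = h_xy - h_y        # H(X|Y) = H(X,Y) - H(Y)
mutual_info = h_x + h_y - h_xy  # I(X;Y) = H(X) + H(Y) - H(X,Y)

print(f"H(Y|X) = {h_y_given_x:.3f} bits")
print(f"H(X|Y) = {h_x_given_y:.3f} bits")
print(f"I(X;Y) = {mutual_info:.3f} bits")
```

For the sample table this works out to roughly 0.815 bits for H(Y|X), 1.400 bits for H(X|Y), and 0.171 bits of mutual information. The two conditional entropies differ because weather has three labels and the decision has only two.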
- Enter one joint outcome per row.
- Place the X label first, the Y label second, and the value third.
- Select whether the main result should be H(Y|X) or H(X|Y).
- Choose the logarithm base for bits, nats, or dits.
- Add smoothing only when sparse zero cells need adjustment.
- Press the calculate button.
- Review the result panel above the form.
- Download the CSV or PDF report when needed.
What Conditional Entropy Means
Conditional entropy shows how much uncertainty remains about one variable after another variable is known. It is useful when observations arrive in pairs. Examples include class and feature, input and output, customer segment and action, or source and destination. A low value means the known variable explains much of the other variable. A high value means the known variable gives little help. This calculator accepts joint counts or joint probabilities, so it fits survey data, logs, experiments, and probability models.
How The Calculation Works
The method starts by grouping each row by the conditioning variable. For H(Y|X), each X group becomes a small distribution over Y. The tool divides each joint value by the group total to get the conditional probability P(y|x). Within each group, it multiplies each conditional probability by its logarithm, sums the products, and negates the sum to get that group's entropy. Zero values are skipped because they add no entropy. Each group entropy is then weighted by the probability of its X group, and the weighted values are summed. The result is measured in bits, nats, or dits, depending on the selected base.
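The grouping procedure described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `conditional_entropy` and the `(x, y, value)` row layout are hypothetical, and the calculator's real implementation may differ.

```python
from collections import defaultdict
from math import log

def conditional_entropy(rows, base=2):
    """H(Y|X) from (x, y, value) rows, computed group by group."""
    # Group the joint values by the conditioning variable X.
    groups = defaultdict(lambda: defaultdict(float))
    for x, y, value in rows:
        groups[x][y] += value
    grand_total = sum(sum(cells.values()) for cells in groups.values())

    result = 0.0
    for cells in groups.values():
        group_total = sum(cells.values())
        if group_total == 0:
            continue  # an empty group has zero probability
        # Entropy of the conditional distribution P(y|x) inside this group.
        group_entropy = 0.0
        for value in cells.values():
            if value > 0:  # zero cells add no entropy
                p = value / group_total
                group_entropy -= p * log(p, base)
        # Weight the group entropy by P(x) = group total / grand total.
        result += (group_total / grand_total) * group_entropy
    return result

rows = [
    ("Sunny", "Buy", 32), ("Sunny", "Skip", 8),
    ("Rainy", "Buy", 7), ("Rainy", "Skip", 23),
    ("Cloudy", "Buy", 18), ("Cloudy", "Skip", 12),
]
print(f"H(Y|X) = {conditional_entropy(rows):.3f} bits")
```

On the sample weather table this gives roughly 0.815 bits. The Cloudy group contributes the most uncertainty per case because its 60/40 split is closest to even.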
Why The Metric Matters
Conditional entropy is important in data science and decision work. It helps compare features before building a model. It also supports mutual information analysis, channel noise checks, text classification, clustering review, and reliability studies. When H(Y|X) is near zero, X almost determines Y. When it is close to H(Y), X adds little information. The difference between H(Y) and H(Y|X) is mutual information. That value estimates how much uncertainty was removed.
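The two extremes mentioned here can be checked on toy tables. The sketch below is illustrative only: the tables are made up for the example, the helper names are hypothetical, and each (x, y) pair is assumed to appear in one row.

```python
from collections import defaultdict
from math import log

def entropy(weights, base=2):
    """Entropy of a list of nonnegative weights after normalizing them."""
    total = sum(weights)
    return -sum((w / total) * log(w / total, base) for w in weights if w > 0)

def h_y_and_h_y_given_x(rows, base=2):
    """Return (H(Y), H(Y|X)) for (x, y, count) rows, one row per pair."""
    by_x, by_y = defaultdict(list), defaultdict(float)
    for x, y, count in rows:
        by_x[x].append(count)
        by_y[y] += count
    total = sum(by_y.values())
    h_y = entropy(list(by_y.values()), base)
    # Weighted sum of per-group entropies, with weights P(x).
    h_cond = sum(sum(cells) / total * entropy(cells, base) for cells in by_x.values())
    return h_y, h_cond

# X fully determines Y: each X label pairs with a single Y label.
determined = [("A", "Yes", 50), ("B", "No", 50)]
# X and Y are independent: every group has the same 50/50 split.
independent = [("A", "Yes", 25), ("A", "No", 25), ("B", "Yes", 25), ("B", "No", 25)]

for name, table in (("determined", determined), ("independent", independent)):
    h_y, h_cond = h_y_and_h_y_given_x(table)
    print(f"{name}: H(Y) = {h_y:.3f}, H(Y|X) = {h_cond:.3f}, I(X;Y) = {h_y - h_cond:.3f}")
```

In the first table H(Y|X) is 0 and the mutual information equals H(Y), which is 1 bit. In the second table H(Y|X) equals H(Y) and the mutual information is 0.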
Using Results Wisely
Use clean data for the best result. Keep one pair per row. Add a count, weight, or probability for that pair. Labels may be words or numbers. Counts do not need to sum to one, because the page normalizes them. Probabilities should be nonnegative. Optional smoothing can reduce sharp zero effects in sparse tables. After calculating, review the summary metrics, conditional rows, and chart. Download the CSV for spreadsheets. Save the PDF for reports or classroom notes. This page supports practical interpretation. The formula section explains each symbol. The usage steps guide users through entry, options, calculation, and export. The result panel appears above the form, so answers are visible immediately.
1. What is conditional entropy?
Conditional entropy measures the remaining uncertainty in one variable after another variable is known. It answers questions like, “How uncertain is Y when X has already been observed?”
2. Can I use raw counts?
Yes. Raw counts, weights, probabilities, and percentages are accepted. The calculator normalizes values before computing probabilities, so totals do not need to equal one.
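To see why the totals do not matter, the sketch below feeds the same sample table to a hypothetical `conditional_entropy` helper three ways: as raw counts, as probabilities, and as arbitrarily scaled weights. It assumes one row per (x, y) pair and is not the calculator's own code.

```python
from collections import defaultdict
from math import log

def conditional_entropy(rows, base=2):
    """H(Y|X) from (x, y, value) rows, assuming one row per (x, y) pair."""
    total = sum(value for _, _, value in rows)
    # Dividing by the total turns counts, weights, or percentages into P(x,y).
    p_xy = [(x, value / total) for x, _, value in rows]
    p_x = defaultdict(float)
    for x, p in p_xy:
        p_x[x] += p
    # H(Y|X) = -sum of P(x,y) * log_b(P(x,y) / P(x)), skipping zero cells.
    return -sum(p * log(p / p_x[x], base) for x, p in p_xy if p > 0)

counts = [
    ("Sunny", "Buy", 32), ("Sunny", "Skip", 8),
    ("Rainy", "Buy", 7), ("Rainy", "Skip", 23),
    ("Cloudy", "Buy", 18), ("Cloudy", "Skip", 12),
]
total = sum(c for _, _, c in counts)
probabilities = [(x, y, c / total) for x, y, c in counts]
scaled_weights = [(x, y, c * 2.5) for x, y, c in counts]

results = [conditional_entropy(t) for t in (counts, probabilities, scaled_weights)]
print([round(r, 4) for r in results])
```

All three inputs yield the same H(Y|X), up to floating-point rounding, because only the ratios between cells affect the result. The sample counts happen to total 100, so they can also be read directly as percentages.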
3. What does H(Y|X) mean?
H(Y|X) means the uncertainty left in Y after X is known. A smaller value means X gives stronger information about Y.
4. What does H(X|Y) mean?
H(X|Y) means the uncertainty left in X after Y is known. It reverses the conditioning direction and may differ from H(Y|X).
5. Which log base should I choose?
Use base 2 for bits, base e (the natural logarithm) for nats, and base 10 for dits. Base 2 is common in information theory and data science.
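The choice of base only rescales the result, so a value in one unit can be converted to another with a constant factor. A small sketch follows; the function name `convert_entropy` is hypothetical.

```python
from math import e, log

def convert_entropy(value, from_base, to_base):
    """Convert an entropy value between logarithm bases."""
    # Change of base: log_to(p) = log_from(p) * log_to(from_base).
    return value * log(from_base, to_base)

one_bit = 1.0
nats = convert_entropy(one_bit, 2, e)   # equals ln 2, about 0.693
dits = convert_entropy(one_bit, 2, 10)  # equals log10 2, about 0.301

print(f"1 bit = {nats:.3f} nats = {dits:.3f} dits")
```

Because the factor is constant, comparisons between features come out the same in every base.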
6. What is smoothing?
Smoothing adds a small value to every cell in the joint table. It can reduce extreme effects from missing or sparse category combinations.
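One common way to do this is additive smoothing over the full grid of X and Y labels. The sketch below is an assumption about how such smoothing could work, not a description of this calculator's internals. The function name `smooth`, the parameter `alpha`, and the sparse table are hypothetical.

```python
from itertools import product

def smooth(joint_counts, alpha=0.5):
    """Add alpha to every cell of the full X-by-Y grid, including missing cells."""
    xs = sorted({x for x, _ in joint_counts})
    ys = sorted({y for _, y in joint_counts})
    return {
        (x, y): joint_counts.get((x, y), 0) + alpha
        for x, y in product(xs, ys)
    }

# A sparse table with no ("Rainy", "Buy") row.
sparse = {("Sunny", "Buy"): 9, ("Sunny", "Skip"): 1, ("Rainy", "Skip"): 5}
smoothed = smooth(sparse, alpha=0.5)

for pair, value in smoothed.items():
    print(pair, value)
```

After smoothing, the missing ("Rainy", "Buy") cell holds 0.5 instead of 0, so the Rainy group no longer looks perfectly predictable. Larger `alpha` values pull every group toward a uniform split, which is why smoothing is best kept small.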
7. Is lower conditional entropy always better?
Lower values mean less remaining uncertainty, but “better” depends on your goal. For prediction, lower values often show stronger explanatory power.
8. What does mutual information show?
Mutual information shows how much uncertainty one variable removes about another. It is zero when the variables are independent.