Result Summary
Calculator Input
This page keeps a single vertical flow, while the calculator fields switch between three-, two-, and one-column layouts depending on screen width.
Example Data Table
This sample contingency table can be used to test whether a categorical feature's distribution differs across predicted classes.
| Feature Group | Class A | Class B | Class C | Row Total |
|---|---|---|---|---|
| Feature Low | 25 | 30 | 20 | 75 |
| Feature Medium | 15 | 18 | 27 | 60 |
| Feature High | 10 | 22 | 33 | 65 |
| Column Total | 50 | 70 | 80 | 200 |
Formula Used
χ² = Σ ((O - E)² / E)
Eᵢⱼ = (Row Totalᵢ × Column Totalⱼ) / Grand Total
Independence / Homogeneity: df = (r - 1)(c - 1)
Goodness-of-Fit: df = k - 1 - m
Cramér's V = √(χ² / (n × min(r - 1, c - 1)))
Phi (φ) for 2×2 tables = √(χ² / n)
Cohen's w for goodness-of-fit = √(χ² / n)
Where O is observed frequency, E is expected frequency, n is total sample size, r is row count, c is column count, k is category count, and m is the number of fitted parameters.
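As a minimal sketch in plain Python (no external libraries assumed), the formulas above can be applied to the sample table from this page:

```python
# Observed counts from the sample contingency table above.
observed = [
    [25, 30, 20],  # Feature Low
    [15, 18, 27],  # Feature Medium
    [10, 22, 33],  # Feature High
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # grand total = 200

# Expected counts: E_ij = row_total_i * col_total_j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

r, c = len(observed), len(observed[0])
df = (r - 1) * (c - 1)                            # (3 - 1)(3 - 1) = 4
cramers_v = (chi2 / (n * min(r - 1, c - 1))) ** 0.5

print(f"chi2 = {chi2:.2f}, df = {df}, V = {cramers_v:.3f}")
# → chi2 = 11.07, df = 4, V = 0.166
```

A library such as SciPy would report the same statistic along with a p-value; the point here is only to show how the formulas combine.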
How to Use This Calculator
- Select the appropriate test type.
- Enter alpha, dimensions, and labels.
- Paste observed counts into the correct input field.
- For goodness-of-fit, enter expected probabilities, counts, or weights.
- Submit the form to see the result above the calculator.
- Review chi-square, p-value, critical value, residuals, and effect size.
- Inspect the charts to identify influential categories or cells.
- Use the CSV or PDF buttons to save your report.
FAQs
1. What does this calculator test?
It tests whether observed category counts differ from expected counts, or whether two categorical variables are statistically associated in a contingency table. In machine learning, that helps evaluate feature-label relationships, class imbalance, segmentation differences, and possible distribution drift.
2. When should I use goodness-of-fit?
Use goodness-of-fit when you have one categorical variable and want to compare its observed frequencies against a target distribution. Examples include expected class proportions, sampling fairness checks, or baseline category frequencies in monitoring pipelines.
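A short sketch of the goodness-of-fit setup, using hypothetical predicted-class counts and target proportions:

```python
# Hypothetical example: 100 predictions, compared against target class
# proportions (e.g., the class balance expected from the training data).
observed = [42, 33, 25]
target_probs = [0.40, 0.35, 0.25]

n = sum(observed)
expected = [p * n for p in target_probs]  # [40.0, 35.0, 25.0]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # no parameters were fitted from the data

print(f"chi2 = {chi2:.3f}, df = {df}")
# → chi2 = 0.214, df = 2
```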
3. When should I use independence or homogeneity?
Use it when you have two categorical variables organized in a table. Examples include feature bucket versus predicted class, region versus error type, or campaign source versus conversion class. It helps reveal whether the variables appear related.
4. Why are expected counts important?
Expected counts show what frequencies would look like under the null hypothesis. Comparing observed and expected values reveals which cells drive the statistic. Very small expected counts can weaken the approximation and reduce confidence in the reported p-value.
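As a sketch, the expected counts for the sample table can be derived directly from the marginal totals, along with a common rule-of-thumb check on small cells:

```python
# Expected counts for the sample table, plus a common rule of thumb:
# the chi-square approximation weakens when expected counts fall below 5.
observed = [[25, 30, 20], [15, 18, 27], [10, 22, 33]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]
small_cells = [(i, j) for i, row in enumerate(expected)
               for j, e in enumerate(row) if e < 5]

print(expected[0])   # [18.75, 26.25, 30.0]
print(small_cells)   # [] — every expected count here is comfortably above 5
```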
5. What do standardized residuals tell me?
Residuals help locate the cells or categories contributing most strongly to the chi-square statistic. Large positive or negative residuals suggest stronger local deviations between observed and expected frequencies, which is especially useful when debugging model segments or drift patterns.
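One simple form of residual is the Pearson residual, (O - E) / √E; the calculator's reported residuals may use a further-adjusted variant, but the sketch below shows the idea on the sample table:

```python
import math

observed = [[25, 30, 20], [15, 18, 27], [10, 22, 33]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson residuals: (O - E) / sqrt(E). Each value shows how far a cell
# deviates from its expected count; magnitudes near 2 or beyond stand out.
residuals = [[(o - e) / math.sqrt(e) for o, e in zip(o_row, e_row)]
             for o_row, e_row in zip(observed, expected)]

for label, row in zip(["Low", "Medium", "High"], residuals):
    print(label, [round(x, 2) for x in row])
```

Here the Feature Low / Class C cell (residual about -1.83) and the Feature High / Class C cell (about +1.37) contribute most, matching the visible shift toward Class C in the Feature High row.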
6. What are Cramér's V and Cohen's w?
These are effect size measures. They describe practical magnitude rather than only statistical significance. A small p-value can occur with large samples even when the relationship is weak, so effect size helps judge whether the pattern is meaningfully important.
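This separation between significance and magnitude can be sketched with hypothetical counts: scaling the sample 100× inflates the chi-square statistic (and shrinks the p-value) while Cohen's w stays the same:

```python
# Same observed-vs-expected proportions at two sample sizes. The chi-square
# statistic grows with n, but Cohen's w = sqrt(chi2 / n) does not.
def chi2_and_w(observed, target_probs):
    n = sum(observed)
    expected = [p * n for p in target_probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, (chi2 / n) ** 0.5

target = [0.40, 0.35, 0.25]
small = chi2_and_w([42, 33, 25], target)         # n = 100
large = chi2_and_w([4200, 3300, 2500], target)   # n = 10,000

print(small)  # chi2 ≈ 0.21,  w ≈ 0.046
print(large)  # chi2 ≈ 21.43, w ≈ 0.046 — same effect size, 100x the statistic
```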
7. Why would I estimate parameters in goodness-of-fit?
If expected probabilities are learned from data rather than fully fixed in advance, each fitted parameter reduces the effective degrees of freedom. This adjustment makes the test more honest and prevents overstating evidence against the null hypothesis.
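A sketch with hypothetical monitoring data: a Poisson rate is estimated from the same binned counts being tested, so one degree of freedom is subtracted (the treatment of the open-ended last bin is a simplification):

```python
import math

# Hypothetical monitoring data: 100 intervals, binned by events observed
# per interval (0, 1, 2, and 3-or-more events).
observed = [30, 40, 20, 10]
n = sum(observed)

# Estimate the Poisson rate from the data itself (simplification: the
# open-ended last bin is treated as exactly 3 events).
lam = sum(k * o for k, o in enumerate(observed)) / n  # 1.1

# Expected probabilities under Poisson(lam); the last bin absorbs the tail.
probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(3)]
probs.append(1 - sum(probs))
expected = [p * n for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

k = len(observed)
m = 1               # one parameter (the rate) was estimated from the data
df = k - 1 - m      # 4 - 1 - 1 = 2, not 3

print(f"chi2 = {chi2:.2f}, df = {df}")
```

Testing against df = 3 here would use a looser critical value than the data earned, since the fitted rate already pulled the expected counts toward the observed ones.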
8. Can I use this for machine learning feature screening?
Yes. It is useful for categorical feature relevance checks, label association analysis, fairness slices, monitoring category drift, and comparing grouped outcomes. It works best as an exploratory statistical signal, not as the only model selection criterion.