Result Summary
Calculator Input
This page keeps a single vertical flow, while the calculator fields switch between three-, two-, and one-column layouts depending on screen width.
Example Data Table
This sample contingency table can be used to test whether a categorical feature's distribution differs across predicted classes.
| Feature Group | Class A | Class B | Class C | Row Total |
|---|---|---|---|---|
| Feature Low | 25 | 30 | 20 | 75 |
| Feature Medium | 15 | 18 | 27 | 60 |
| Feature High | 10 | 22 | 33 | 65 |
| Column Total | 50 | 70 | 80 | 200 |
Formula Used
χ² = Σ ((O - E)² / E)
Eᵢⱼ = (Row Totalᵢ × Column Totalⱼ) / Grand Total
Independence / Homogeneity: df = (r - 1)(c - 1)
Goodness-of-Fit: df = k - 1 - m
Cramér's V = √(χ² / (n × min(r - 1, c - 1)))
Phi (φ) for 2×2 tables = √(χ² / n)
Cohen's w for goodness-of-fit = √(χ² / n)
Where O is observed frequency, E is expected frequency, n is total sample size, r is row count, c is column count, k is category count, and m is the number of fitted parameters.
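As a minimal sketch in plain Python (no external libraries assumed), the formulas above can be applied to the sample table from this page:

```python
# Observed counts from the sample contingency table above.
observed = [
    [25, 30, 20],  # Feature Low
    [15, 18, 27],  # Feature Medium
    [10, 22, 33],  # Feature High
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # grand total = 200

# Expected counts: E_ij = row_total_i * col_total_j / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

r, c = len(observed), len(observed[0])
df = (r - 1) * (c - 1)                            # (3 - 1)(3 - 1) = 4
cramers_v = (chi2 / (n * min(r - 1, c - 1))) ** 0.5

print(f"chi2 = {chi2:.2f}, df = {df}, V = {cramers_v:.3f}")
# → chi2 = 11.07, df = 4, V = 0.166
```

A library such as SciPy would report the same statistic along with a p-value; the point here is only to show how the formulas combine.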
How to Use This Calculator
- Select the appropriate test type.
- Enter alpha, dimensions, and labels.
- Paste observed counts into the correct input field.
- For goodness-of-fit, enter expected probabilities, counts, or weights.
- Submit the form to see the result above the calculator.
- Review chi-square, p-value, critical value, residuals, and effect size.
- Inspect the charts to identify influential categories or cells.
- Use the CSV or PDF buttons to save your report.
FAQs
1. What does this calculator test?
It tests whether observed category counts differ from expected counts, or whether two categorical variables are statistically associated in a contingency table. In machine learning, that helps evaluate feature-label relationships, class imbalance, segmentation differences, and possible distribution drift.
2. When should I use goodness-of-fit?
Use goodness-of-fit when you have one categorical variable and want to compare its observed frequencies against a target distribution. Examples include expected class proportions, sampling fairness checks, or baseline category frequencies in monitoring pipelines.
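A short sketch of the goodness-of-fit setup, using hypothetical predicted-class counts and target proportions:

```python
# Hypothetical example: 100 predictions, compared against target class
# proportions (e.g., the class balance expected from the training data).
observed = [42, 33, 25]
target_probs = [0.40, 0.35, 0.25]

n = sum(observed)
expected = [p * n for p in target_probs]  # [40.0, 35.0, 25.0]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # no parameters were fitted from the data

print(f"chi2 = {chi2:.3f}, df = {df}")
# → chi2 = 0.214, df = 2
```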
3. When should I use independence or homogeneity?
Use it when you have two categorical variables organized in a table. Examples include feature bucket versus predicted class, region versus error type, or campaign source versus conversion class. It helps reveal whether the variables appear related.
4. Why are expected counts important?
Expected counts show what frequencies would look like under the null hypothesis. Comparing observed and expected values reveals which cells drive the statistic. Very small expected counts can weaken the approximation and reduce confidence in the reported p-value.
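As a sketch, the expected counts for the sample table can be derived directly from the marginal totals, along with a common rule-of-thumb check on small cells:

```python
# Expected counts for the sample table, plus a common rule of thumb:
# the chi-square approximation weakens when expected counts fall below 5.
observed = [[25, 30, 20], [15, 18, 27], [10, 22, 33]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]
small_cells = [(i, j) for i, row in enumerate(expected)
               for j, e in enumerate(row) if e < 5]

print(expected[0])   # [18.75, 26.25, 30.0]
print(small_cells)   # [] — every expected count here is comfortably above 5
```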
5. What do standardized residuals tell me?
Residuals help locate the cells or categories contributing most strongly to the chi-square statistic. Large positive or negative residuals suggest stronger local deviations between observed and expected frequencies, which is especially useful when debugging model segments or drift patterns.
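One simple form of residual is the Pearson residual, (O - E) / √E; the calculator's reported residuals may use a further-adjusted variant, but the sketch below shows the idea on the sample table:

```python
import math

observed = [[25, 30, 20], [15, 18, 27], [10, 22, 33]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson residuals: (O - E) / sqrt(E). Each value shows how far a cell
# deviates from its expected count; magnitudes near 2 or beyond stand out.
residuals = [[(o - e) / math.sqrt(e) for o, e in zip(o_row, e_row)]
             for o_row, e_row in zip(observed, expected)]

for label, row in zip(["Low", "Medium", "High"], residuals):
    print(label, [round(x, 2) for x in row])
```

Here the Feature Low / Class C cell (residual about -1.83) and the Feature High / Class C cell (about +1.37) contribute most, matching the visible shift toward Class C in the Feature High row.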
6. What are Cramér's V and Cohen's w?
These are effect size measures. They describe practical magnitude rather than only statistical significance. A small p-value can occur with large samples even when the relationship is weak, so effect size helps judge whether the pattern is meaningfully important.
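This separation between significance and magnitude can be sketched with hypothetical counts: scaling the sample 100× inflates the chi-square statistic (and shrinks the p-value) while Cohen's w stays the same:

```python
# Same observed-vs-expected proportions at two sample sizes. The chi-square
# statistic grows with n, but Cohen's w = sqrt(chi2 / n) does not.
def chi2_and_w(observed, target_probs):
    n = sum(observed)
    expected = [p * n for p in target_probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, (chi2 / n) ** 0.5

target = [0.40, 0.35, 0.25]
small = chi2_and_w([42, 33, 25], target)         # n = 100
large = chi2_and_w([4200, 3300, 2500], target)   # n = 10,000

print(small)  # chi2 ≈ 0.21,  w ≈ 0.046
print(large)  # chi2 ≈ 21.43, w ≈ 0.046 — same effect size, 100x the statistic
```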
7. Why would I estimate parameters in goodness-of-fit?
If expected probabilities are learned from data rather than fully fixed in advance, each fitted parameter reduces the effective degrees of freedom. This adjustment makes the test more honest and prevents overstating evidence against the null hypothesis.
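A sketch with hypothetical monitoring data: a Poisson rate is estimated from the same binned counts being tested, so one degree of freedom is subtracted (the treatment of the open-ended last bin is a simplification):

```python
import math

# Hypothetical monitoring data: 100 intervals, binned by events observed
# per interval (0, 1, 2, and 3-or-more events).
observed = [30, 40, 20, 10]
n = sum(observed)

# Estimate the Poisson rate from the data itself (simplification: the
# open-ended last bin is treated as exactly 3 events).
lam = sum(k * o for k, o in enumerate(observed)) / n  # 1.1

# Expected probabilities under Poisson(lam); the last bin absorbs the tail.
probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(3)]
probs.append(1 - sum(probs))
expected = [p * n for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

k = len(observed)
m = 1               # one parameter (the rate) was estimated from the data
df = k - 1 - m      # 4 - 1 - 1 = 2, not 3

print(f"chi2 = {chi2:.2f}, df = {df}")
```

Testing against df = 3 here would use a looser critical value than the data earned, since the fitted rate already pulled the expected counts toward the observed ones.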
8. Can I use this for machine learning feature screening?
Yes. It is useful for categorical feature relevance checks, label association analysis, fairness slices, monitoring category drift, and comparing grouped outcomes. It works best as an exploratory statistical signal, not as the only model selection criterion.