Chi Square Test for Goodness of Fit Calculator

Calculator Inputs

Category names

Observed counts

Expected values

Expected value type

Significance level alpha

Estimated parameters

Expected count handling

Decimal places

Example Data Table

This example tests whether five colors appear equally often in a sample of 100 items.

Category	Observed count	Expected proportion	Expected count
Red	18	0.20	20
Blue	22	0.20	20
Green	20	0.20	20
Yellow	17	0.20	20
Purple	23	0.20	20

Formula Used

The goodness of fit statistic is:

χ² = Σ ((Oᵢ - Eᵢ)² / Eᵢ)

Oᵢ is the observed count. Eᵢ is the expected count.
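As a minimal sketch of the formula above, the statistic can be computed for the example table in a few lines of Python. The function name `chi_square_statistic` is illustrative only and is not taken from the calculator.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Example data from the table: five colors, 100 items, equal expectations.
observed = [18, 22, 20, 17, 23]
expected = [20, 20, 20, 20, 20]

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 4))  # 1.3
```

The per-category terms are 0.2, 0.2, 0, 0.45, and 0.45, which sum to 1.3.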

df = k - 1 - m

k is the number of categories. m is the number of estimated parameters.
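The degrees of freedom rule can be sketched as a small helper. The name `degrees_of_freedom` and the guard against a result below 1 are assumptions made for illustration; the calculator may handle that case differently.

```python
def degrees_of_freedom(num_categories, estimated_params=0):
    """df = k - 1 - m, where m is the number of parameters estimated from the sample."""
    df = num_categories - 1 - estimated_params
    if df < 1:
        raise ValueError("the test needs at least one degree of freedom")
    return df


print(degrees_of_freedom(5))     # 4  (five colors, nothing estimated)
print(degrees_of_freedom(5, 1))  # 3  (one parameter estimated from the data)
```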

p value = P(Χ²(df) ≥ observed χ²)

Χ²(df) is a chi-square random variable with df degrees of freedom.
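The page does not say how the calculator evaluates this tail probability. One possible approach, sketched below with only the Python standard library, uses the recurrence for the regularized upper incomplete gamma function, which is exact for integer degrees of freedom. The name `chi2_sf` is hypothetical. If SciPy is available, `scipy.stats.chi2.sf(x, df)` computes the same quantity.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    # evaluated in log space so large df does not overflow.
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


# Example table: chi-square = 1.3 with df = 5 - 1 - 0 = 4.
p_value = chi2_sf(1.3, 4)
print(round(p_value, 4))  # 0.8614
```

With a p value near 0.86, the example data give no evidence against equal color frequencies at any common alpha.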

Standardized residual = (Oᵢ - Eᵢ) / √Eᵢ
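A short sketch of the residual formula, applied to the example table. The function name is illustrative, not the calculator's own.

```python
import math


def standardized_residuals(observed, expected):
    """(O - E) / sqrt(E) for each category, as defined above."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


categories = ["Red", "Blue", "Green", "Yellow", "Purple"]
observed = [18, 22, 20, 17, 23]
expected = [20, 20, 20, 20, 20]

residuals = standardized_residuals(observed, expected)
for name, r in zip(categories, residuals):
    print(f"{name:<7}{r:+.3f}")
# Red    -0.447
# Blue   +0.447
# Green  +0.000
# Yellow -0.671
# Purple +0.671
```

Squaring these residuals and adding them up gives the χ² statistic, so each residual is the signed square root of its category's term.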

Cohen's w = √(χ² / N)

N is the total of the observed counts.

How to Use This Calculator

Enter each category name on a separate line.
Enter observed counts in the same category order.
Choose the expected value type.
Enter expected counts, proportions, percentages, or choose equal categories.
Set alpha, estimated parameters, and rounding.
Press Calculate to see the result above the form.
Use the CSV or PDF button to save the results.

Understanding Goodness of Fit

A chi square goodness of fit test checks one categorical variable. It compares observed counts with expected counts. The goal is simple. It asks whether the sample pattern looks close to a stated model.

This calculator helps when categories are fixed before data collection. Common cases include dice faces, survey choices, color ratios, quality grades, or arrival counts. You enter each observed count. You also enter expected counts, proportions, percentages, or equal expectations.

What the Result Means

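The test statistic grows as the gaps between observed and expected counts grow. Because each squared gap is divided by its expected count, the same gap adds more to the statistic when the expected count is small. A small statistic means the observed pattern is near the expected pattern. A large statistic suggests the model may not fit.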

The p value is the probability, assuming the null hypothesis is true, of getting a statistic at least as large as the one observed. A small p value means the observed gaps would be unusual if the expected model were true. The alpha level sets the decision rule. If the p value is less than alpha, reject the null model. Otherwise, do not reject it.

Checking Assumptions

Goodness of fit needs count data. Categories should not overlap. Each observation should belong to one category only. The observations should be independent. Expected counts should usually be at least five. When expected counts are too small, combine sensible categories or collect more data.
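The expected count guideline can be checked before running the test. The sketch below flags categories whose expected count falls below a threshold. The function name, the default of 5, and the sample data are assumptions for illustration and do not describe the calculator's "Expected count handling" option.

```python
def low_expected_categories(categories, expected, minimum=5.0):
    """Return the names of categories whose expected count is below `minimum`."""
    return [name for name, e in zip(categories, expected) if e < minimum]


# Hypothetical data: 40 observations with uneven expected proportions.
categories = ["A", "B", "C", "D"]
proportions = [0.50, 0.30, 0.15, 0.05]
total = 40
expected = [p * total for p in proportions]  # roughly 20, 12, 6, 2

print(low_expected_categories(categories, expected))  # ['D']
```

Here category D has an expected count of about 2, so it would be a candidate for merging with a related category or for collecting more data.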

Use estimated parameters carefully. If expected values were built from sample estimates, reduce degrees of freedom. This calculator includes a field for estimated parameters. It subtracts that value from the usual k - 1 degrees of freedom.

Why Residuals Matter

The overall statistic indicates whether the pattern fits as a whole. Residuals show where the mismatch occurs. A positive residual means the observed count is above expectation. A negative residual means it is below expectation. Large absolute residuals deserve attention. A common rule of thumb is to look closely at values beyond about 2 in either direction.

Contribution percentages show which categories drive the test. Each one is a category's share of the total χ² statistic. They are useful for reports. They also prevent vague conclusions. Instead of saying the model failed, you can name the categories that contribute most to the mismatch.
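The page does not give a formula for contribution percentages. A natural reading, assumed here, is each category's term (Oᵢ - Eᵢ)² / Eᵢ divided by the total χ² and multiplied by 100. A sketch for the example table, with an illustrative function name:

```python
def contribution_percentages(observed, expected):
    """Share of the chi-square statistic contributed by each category, in percent."""
    parts = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    total = sum(parts)
    if total == 0:
        return [0.0 for _ in parts]  # perfect fit: nothing to attribute
    return [100.0 * part / total for part in parts]


observed = [18, 22, 20, 17, 23]   # Red, Blue, Green, Yellow, Purple
expected = [20, 20, 20, 20, 20]

shares = contribution_percentages(observed, expected)
print([round(s, 1) for s in shares])  # [15.4, 15.4, 0.0, 34.6, 34.6]
```

Yellow and Purple together account for about 69% of the statistic, even though the overall test is far from significant.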

Practical Use

Do not treat significance as practical importance. With a huge sample, tiny differences can become significant. With a small sample, real differences can be missed. Review effect size alongside the p value. Cohen's w gives a simple scale that does not grow with sample size. Cohen's conventional benchmarks are roughly 0.1 for a small effect, 0.3 for medium, and 0.5 for large.
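To illustrate the point about sample size, the sketch below compares the example data with a hypothetical sample 100 times larger that has exactly the same proportions. The helper names are illustrative, and equal expected counts are assumed.

```python
import math


def chi_square_equal(observed):
    """Chi-square statistic against equal expected counts."""
    n = sum(observed)
    e = n / len(observed)
    return sum((o - e) ** 2 / e for o in observed)


def cohens_w(chi2, n):
    """Cohen's w = sqrt(chi-square / N)."""
    return math.sqrt(chi2 / n)


small = [18, 22, 20, 17, 23]          # N = 100, from the example table
large = [o * 100 for o in small]      # hypothetical: same proportions, N = 10,000

chi2_small, chi2_large = chi_square_equal(small), chi_square_equal(large)
w_small = cohens_w(chi2_small, sum(small))
w_large = cohens_w(chi2_large, sum(large))

print(round(chi2_small, 2), round(w_small, 3))  # 1.3 0.114
print(round(chi2_large, 2), round(w_large, 3))  # 130.0 0.114
```

The statistic rises from 1.3 to 130, far beyond the 0.05 critical value of about 9.49 for 4 degrees of freedom, while Cohen's w stays at about 0.11 in both cases.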

Document both statistical evidence and practical context before final reporting.

FAQs

What is a chi square goodness of fit test?

It is a test for one categorical variable. It compares observed counts with expected counts from a model, theory, or claimed distribution.

When should I use this calculator?

Use it when your data are counts in categories and you have expected counts, proportions, percentages, or equal category assumptions.

What does a small p value mean?

A small p value means a gap this large between observed and expected counts would be unlikely if the expected model were true. It supports rejecting the null hypothesis.

What are degrees of freedom here?

Degrees of freedom equal categories minus one, minus estimated parameters. Parameters are subtracted when expected values use sample estimates.

Why are expected counts important?

The p value comes from a chi-square approximation that works best when expected counts are large enough, usually at least five. Small expected counts can make the approximation weak, so combining categories may be needed.

Can expected percentages be used?

Yes. Choose percentages as the expected type. The calculator converts them into expected counts using the observed total.
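A rough sketch of that conversion for each expected value type is shown below. The function name, the `kind` labels, and the check that shares sum to 1 are assumptions for illustration; the calculator's own validation rules are not documented here.

```python
def expected_counts(observed, values=None, kind="equal"):
    """Turn expected counts, proportions, or percentages into expected counts."""
    total = sum(observed)
    k = len(observed)
    if kind == "equal":
        return [total / k] * k
    if values is None or len(values) != k:
        raise ValueError("need one expected value per category")
    if kind == "counts":
        return [float(v) for v in values]
    if kind == "proportions":
        scale = 1.0
    elif kind == "percentages":
        scale = 100.0
    else:
        raise ValueError(f"unknown expected value type: {kind}")
    shares = [v / scale for v in values]
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("expected shares must sum to 1 (or 100 percent)")
    # Scale each share by the observed total to get an expected count.
    return [s * total for s in shares]


observed = [18, 22, 20, 17, 23]
from_percent = expected_counts(observed, [20, 20, 20, 20, 20], kind="percentages")
print([round(e, 2) for e in from_percent])  # [20.0, 20.0, 20.0, 20.0, 20.0]
```

For the example table, percentages, proportions of 0.20 each, and the equal categories option all lead to the same expected count of 20 per color.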

What do standardized residuals show?

They show which categories are above or below expectation after scaling by expected count. Larger absolute values show stronger mismatch.

Does this prove the model is true?

No. A non-significant result only means the data do not show strong evidence against the expected model at the selected alpha.