Example Data Table
This example tests whether five colors appear equally often in a sample of 100 items.
| Category | Observed count | Expected proportion | Expected count |
|---|---|---|---|
| Red | 18 | 0.20 | 20 |
| Blue | 22 | 0.20 | 20 |
| Green | 20 | 0.20 | 20 |
| Yellow | 17 | 0.20 | 20 |
| Purple | 23 | 0.20 | 20 |
Formula Used
The goodness of fit statistic is:
χ² = Σ ((Oᵢ - Eᵢ)² / Eᵢ)
Oᵢ is the observed count and Eᵢ is the expected count for category i. The sum runs over all categories.
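For the example table, every expected count is 20, so the statistic works out as follows:

```latex
\chi^2 = \frac{(18-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(20-20)^2}{20}
       + \frac{(17-20)^2}{20} + \frac{(23-20)^2}{20}
       = \frac{4 + 4 + 0 + 9 + 9}{20} = \frac{26}{20} = 1.3
```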
df = k - 1 - m
k is the number of categories. m is the number of parameters estimated from the sample data.
p value = P(X ≥ χ²)
X follows a chi square distribution with df degrees of freedom, and χ² is the observed statistic.
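The p value has no simple closed form for every df, so software evaluates the chi square upper tail numerically. The sketch below is one plain-Python way to do it, assuming the standard route through the regularized upper incomplete gamma function, Q(df/2, χ²/2), with a power series for small values and a continued fraction for large ones. The name chi2_sf is illustrative, not the calculator's own code. It then applies df = k - 1 - m and the alpha rule to the example data.

```python
import math


def chi2_sf(x: float, df: float) -> float:
    """Upper-tail probability P(X >= x) for X ~ chi-square(df).

    Equals the regularized upper incomplete gamma function Q(df/2, x/2),
    evaluated with a power series when x is small and a continued
    fraction (modified Lentz method) otherwise.
    """
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    # log of z**s * exp(-z) / Gamma(s), shared by both branches
    log_prefactor = s * math.log(z) - z - math.lgamma(s)
    if z < s + 1.0:
        # Series for the lower regularized gamma P(s, z); Q = 1 - P.
        term = total = 1.0 / s
        a = s
        for _ in range(1000):
            a += 1.0
            term *= z / a
            total += term
            if term < total * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(s, z).
    tiny = 1e-300
    b = z + 1.0 - s
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


observed = [18, 22, 20, 17, 23]
expected = [20.0] * len(observed)
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
k, m = len(observed), 0  # five categories, no estimated parameters
df = k - 1 - m
p_value = chi2_sf(stat, df)
alpha = 0.05
print(f"chi2 = {stat:.4f}, df = {df}, p = {p_value:.4f}")  # chi2 = 1.3000, df = 4, p = 0.8614
print("reject null" if p_value < alpha else "do not reject null")  # do not reject null
```

If SciPy is available, scipy.stats.chi2.sf(x, df) returns the same tail probability.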
Standardized residual = (Oᵢ - Eᵢ) / √Eᵢ
Cohen's w = √(χ² / N)
N is the total of all observed counts.
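A short sketch of the residual and effect-size formulas on the example data. The function name is illustrative. It also uses the fact that the squared standardized residuals add up to χ², since each one equals (Oᵢ - Eᵢ)² / Eᵢ.

```python
import math


def residuals_and_effect_size(observed, expected):
    """Return standardized residuals (O - E) / sqrt(E) and Cohen's w."""
    n = sum(observed)  # N, the total observed count
    residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
    chi2 = sum(r * r for r in residuals)  # squared residuals sum to chi-square
    w = math.sqrt(chi2 / n)
    return residuals, w


observed = [18, 22, 20, 17, 23]
expected = [20.0] * 5
residuals, w = residuals_and_effect_size(observed, expected)
print([round(r, 3) for r in residuals])  # [-0.447, 0.447, 0.0, -0.671, 0.671]
print(round(w, 3))  # 0.114
```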
How to Use This Calculator
- Enter each category name on a separate line.
- Enter observed counts in the same category order.
- Choose the expected value type.
- Enter expected counts, proportions, or percentages, or choose equal expected counts for every category.
- Set the alpha level, the number of estimated parameters, and the rounding precision.
- Press Calculate to see the result above the form.
- Use the CSV or PDF button to save the results.
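The input steps above turn each expected value type into expected counts before the statistic is computed. Here is a minimal sketch. It assumes the type labels shown and requires the expected counts to add up to the observed total. The calculator's own validation rules are not documented here, and expected_counts is a hypothetical name.

```python
import math


def expected_counts(observed, kind, values=None):
    """Convert an expected value type into expected counts.

    kind is one of "counts", "proportions", "percentages", or "equal"
    (hypothetical labels for the calculator's expected value types).
    """
    n = sum(observed)
    k = len(observed)
    if kind == "equal":
        return [n / k] * k
    if values is None or len(values) != k:
        raise ValueError("need one expected value per category")
    if kind == "counts":
        expected = [float(v) for v in values]
    elif kind == "proportions":
        expected = [p * n for p in values]
    elif kind == "percentages":
        expected = [pct / 100 * n for pct in values]
    else:
        raise ValueError(f"unknown expected type: {kind}")
    # Assumed rule: expected counts must match the observed total.
    if not math.isclose(sum(expected), n, rel_tol=1e-6):
        raise ValueError("expected counts must add up to the observed total")
    return expected


observed = [18, 22, 20, 17, 23]
print(expected_counts(observed, "proportions", [0.2] * 5))  # [20.0, 20.0, 20.0, 20.0, 20.0]
```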
Understanding Goodness of Fit
A chi square goodness of fit test checks one categorical variable. It compares observed counts with the counts expected under a stated model. The question is simple: is the sample pattern close enough to the model that the gaps could be due to chance?
This calculator helps when categories are fixed before data collection. Common cases include dice faces, survey choices, color ratios, quality grades, or arrival counts. You enter each observed count, then supply expected counts, proportions, or percentages, or assume equal expected counts.
What the Result Means
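The test statistic grows as the gaps between observed and expected counts grow. The same gap also adds more when its expected count is small, because each squared difference is divided by Eᵢ. A small statistic means the observed pattern is near the expected pattern. A statistic that is large relative to its degrees of freedom suggests the model may not fit.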
The p value is the probability, assuming the expected model is true, of getting a statistic at least as large as the one observed. A small p value means the observed gaps would be unusual if the model were true. The alpha level sets the decision rule: if the p value is less than alpha, reject the null hypothesis that the data follow the expected model.
Checking Assumptions
Goodness of fit needs raw counts, not percentages or averages. Categories should not overlap, and each observation should belong to exactly one category. The observations should be independent. Expected counts should usually be at least five in every category. When expected counts are too small, combine logically related categories or collect more data.
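Here is a minimal sketch of the small-count check and one way to combine categories. It assumes a threshold of five and uses hypothetical data for four ordered grades. Both function names are illustrative. Merging removes one category, so the degrees of freedom drop by one as well.

```python
def small_expected(categories, expected, minimum=5.0):
    """Return the names of categories whose expected count is below minimum."""
    return [c for c, e in zip(categories, expected) if e < minimum]


def merge_categories(categories, observed, expected, a, b, new_name):
    """Combine categories a and b into one category named new_name.

    Observed and expected counts of the two categories are summed; the
    merged category is appended at the end of the lists.
    """
    ia, ib = categories.index(a), categories.index(b)
    keep = [i for i in range(len(categories)) if i not in (ia, ib)]
    cats = [categories[i] for i in keep] + [new_name]
    obs = [observed[i] for i in keep] + [observed[ia] + observed[ib]]
    exp = [expected[i] for i in keep] + [expected[ia] + expected[ib]]
    return cats, obs, exp


# Hypothetical small sample: four ordered grades, N = 25.
categories = ["A", "B", "C", "D"]
observed = [12, 9, 3, 1]
expected = [11.0, 9.0, 3.5, 1.5]
print(small_expected(categories, expected))  # ['C', 'D']
cats, obs, exp = merge_categories(categories, observed, expected, "C", "D", "C or D")
print(cats, obs, exp)  # ['A', 'B', 'C or D'] [12, 9, 4] [11.0, 9.0, 5.0]
print(small_expected(cats, exp))  # []
```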
Use estimated parameters carefully. If the expected values were built from estimates taken from the same sample, reduce the degrees of freedom by one for each estimated parameter. This calculator includes a field for the number of estimated parameters, m, and subtracts it from the usual k - 1 degrees of freedom.
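As a hypothetical illustration, suppose counts of events per interval are grouped into k = 6 categories (0, 1, 2, 3, 4, and 5 or more). The expected counts come from a Poisson model whose mean λ is estimated by the sample mean. One parameter was estimated, so:

```latex
\hat{\lambda} = \bar{x} \;\Rightarrow\; m = 1, \qquad
df = k - 1 - m = 6 - 1 - 1 = 4
```

If λ had been fixed in advance rather than estimated, m would be 0 and df would be 5.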
Why Residuals Matter
The overall statistic tells whether the pattern fits. Residuals show where the mismatch occurs. A positive residual means the observed count is above expectation. A negative residual means it is below expectation. Large absolute residuals deserve attention.
Contribution percentages show each category's share of the total statistic, so they reveal which categories drive the test. They are useful for reports and keep conclusions specific. Instead of saying only that the model failed, you can name the categories that contributed most.
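This sketch assumes a category's contribution percentage is its term (Oᵢ - Eᵢ)² / Eᵢ divided by the total χ², times 100. The calculator's exact rounding may differ, and the function name is illustrative. Applied to the example table:

```python
def contribution_percentages(observed, expected):
    """Each category's share of the chi-square statistic, in percent."""
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    chi2 = sum(contributions)
    if chi2 == 0:
        # Perfect fit: no category contributes anything.
        return [0.0] * len(contributions)
    return [100 * c / chi2 for c in contributions]


colors = ["Red", "Blue", "Green", "Yellow", "Purple"]
shares = contribution_percentages([18, 22, 20, 17, 23], [20] * 5)
for color, share in zip(colors, shares):
    print(f"{color}: {share:.2f}%")
# Red: 15.38%, Blue: 15.38%, Green: 0.00%, Yellow: 34.62%, Purple: 34.62%
```

Yellow and Purple each account for about 34.6% of the statistic, so they drive most of the (small) departure.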
Practical Use
Do not treat significance as practical importance. With a huge sample, tiny differences can become significant. With a small sample, real differences can be missed. Review the effect size alongside the p value. Cohen's w gives a simple scale: by Cohen's conventional benchmarks, about 0.1 is small, 0.3 medium, and 0.5 large. Larger values show a stronger overall departure.
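This sketch shows why significance and effect size can disagree. It reruns the example with every count multiplied by ten. To stay short, it uses the closed-form chi square tail that holds only for even degrees of freedom (df = 4 here). The helper names are illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form, even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j  # (x/2)**j / j!
        total += term
    return math.exp(-half) * total


def summarize(observed, proportions):
    """Return (chi-square, p value, Cohen's w) for given expected proportions."""
    n = sum(observed)
    expected = [p * n for p in proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, chi2_sf_even_df(chi2, df), math.sqrt(chi2 / n)


props = [0.2] * 5
small = [18, 22, 20, 17, 23]
large = [10 * o for o in small]  # same proportions, ten times the sample size
for obs in (small, large):
    chi2, p, w = summarize(obs, props)
    print(f"N = {sum(obs)}: chi2 = {chi2:.2f}, p = {p:.4f}, w = {w:.3f}")
# N = 100: chi2 = 1.30, p = 0.8614, w = 0.114
# N = 1000: chi2 = 13.00, p = 0.0113, w = 0.114
```

The observed proportions are identical, so w stays at about 0.114. Yet χ² grows tenfold and the p value drops below 0.05.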
Document both statistical evidence and practical context before final reporting.
FAQs
What is a chi square goodness of fit test?
It is a test for one categorical variable. It compares observed counts with expected counts from a model, theory, or claimed distribution.
When should I use this calculator?
Use it when your data are counts in categories and you have expected counts, proportions, percentages, or equal category assumptions.
What does a small p value mean?
A small p value suggests the observed category pattern is unlikely under the expected model. If it falls below alpha, reject the null hypothesis.
What are degrees of freedom here?
Degrees of freedom equal categories minus one, minus estimated parameters. Parameters are subtracted when expected values use sample estimates.
Why are expected counts important?
The chi square distribution only approximates the behavior of the test statistic, and that approximation relies on adequate expected counts. Small expected counts can make it weak, so combining categories may be needed.
Can expected percentages be used?
Yes. Choose percentages as the expected type. The calculator converts them into expected counts using the observed total.
What do standardized residuals show?
They show which categories are above or below expectation after dividing each difference by the square root of its expected count. Larger absolute values show stronger mismatch.
Does this prove the model is true?
No. A non-significant result only means the data do not show strong evidence against the expected model at the selected alpha.