Kruskal-Wallis calculator
Example data table
| Group | Sample values | n |
|---|---|---|
| Group A | 10, 12, 13, 9, 11 | 5 |
| Group B | 7, 8, 6, 9, 5 | 5 |
| Group C | 14, 15, 16, 13, 12 | 5 |
Formula used
The Kruskal-Wallis test compares k independent groups using ranks. Pool all observations, sort them, and assign ranks (average ranks for ties).
- Let N be the total number of observations and ni the size of group i. Let Ri be the sum of ranks for group i.
- H statistic: H = (12 / (N(N + 1))) · Σ(Ri² / ni) − 3(N + 1)
- Tie correction factor (when repeated values exist): T = 1 − Σ(t³ − t) / (N³ − N), where t is the size of each tie block.
- Corrected statistic: H′ = H / T, with df = k − 1. The p-value uses the chi-square approximation: p = P(χ²(df) ≥ H′).
- Effect sizes (rank-based): ε² = H′ · (N + 1) / (N² − 1), η² = (H′ − k + 1) / (N − k).
Notes: the chi-square approximation works best with moderate sample sizes. If groups are very small, consider exact or permutation approaches.
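As an illustration, the formulas above can be sketched in Python. This is a minimal reimplementation for checking results by hand, not the calculator's actual code; it assumes SciPy is available for `rankdata` and the chi-square tail probability.

```python
from collections import Counter
from itertools import chain

from scipy.stats import chi2, rankdata

def kruskal_wallis(*groups):
    """Compute the tie-corrected H', df, and p for k independent groups."""
    k = len(groups)
    pooled = list(chain.from_iterable(groups))
    N = len(pooled)
    ranks = rankdata(pooled)  # pooled ranks, average ranks for ties

    # H = (12 / (N(N+1))) * sum(Ri^2 / ni) - 3(N+1)
    H, start = 0.0, 0
    for g in groups:
        n_i = len(g)
        R_i = sum(ranks[start:start + n_i])  # rank sum for this group
        H += R_i ** 2 / n_i
        start += n_i
    H = 12.0 / (N * (N + 1)) * H - 3 * (N + 1)

    # Tie correction: T = 1 - sum(t^3 - t) / (N^3 - N), one term per tie block
    T = 1.0 - sum(t**3 - t for t in Counter(pooled).values()) / (N**3 - N)
    H_corr = H / T
    df = k - 1
    p = chi2.sf(H_corr, df)  # right tail: P(chi2(df) >= H')
    return H_corr, df, p

H, df, p = kruskal_wallis([10, 12, 13, 9, 11], [7, 8, 6, 9, 5], [14, 15, 16, 13, 12])
```

On the example table above this gives H′ ≈ 11.376 with df = 2, matching the tie-corrected value reported by `scipy.stats.kruskal`.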
How to use this calculator
- Create at least two groups, then name them.
- Paste numeric values into each group box.
- Keep tie correction enabled for repeated measurements.
- Enable posthoc pairs if you want group comparisons.
- Press Calculate to view results above the form.
- Use CSV or PDF buttons to export your report.
When to choose rank-based testing
Kruskal-Wallis helps when your outcome is skewed, ordinal, or contains outliers. It compares independent groups without assuming normality or equal variances. In practice, researchers use it for survey ratings, reaction times, income, or error counts. It remains valid with different group sizes, yet each group should represent a random, independent sample.
How ranks transform your data
The method replaces raw values with pooled ranks from 1 to N. Lower observations receive smaller ranks, and ties share the average rank. This converts units into relative order, which is why a centimeter scale and a millimeter scale give identical results. The calculator shows mean rank per group so you can see direction.
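The ranking step can be seen with SciPy's `rankdata`, a small sketch with made-up values; the duplicated 3.5s share the average of ranks 3 and 4, and rescaling the units leaves every rank unchanged.

```python
from scipy.stats import rankdata

values_cm = [2.0, 3.5, 3.5, 1.0, 5.0]
values_mm = [v * 10 for v in values_cm]  # same measurements on a millimeter scale

ranks_cm = rankdata(values_cm)  # average ranks for ties: [2.0, 3.5, 3.5, 1.0, 5.0]
ranks_mm = rankdata(values_mm)  # identical ranks: only the ordering matters
```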
Reading H, df, and the p value
The test statistic H summarizes how far group rank sums deviate from what random mixing would produce. Degrees of freedom equal k − 1. The p value uses a chi-square approximation, so larger samples improve accuracy. As a rule, aim for at least five observations per group, and interpret results with alpha values like 0.05 or 0.01.
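Running the example table through `scipy.stats.kruskal` (which applies the tie correction) shows these three quantities together; a quick sketch, assuming SciPy is available:

```python
from scipy.stats import kruskal

group_a = [10, 12, 13, 9, 11]
group_b = [7, 8, 6, 9, 5]
group_c = [14, 15, 16, 13, 12]

H, p = kruskal(group_a, group_b, group_c)  # tie-corrected statistic and p value
df = 3 - 1  # k - 1 groups
print(f"H = {H:.3f}, df = {df}, p = {p:.4f}")
```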
What tie correction changes
Repeated measurements, rounded scores, or discrete scales create ties. Ties reduce rank variability, so uncorrected H comes out slightly too small. Tie correction divides H by a factor T (at most 1) computed from each tie block size. When many ties exist, the corrected H′ is larger and the p value slightly smaller; leaving the correction off makes the test mildly conservative.
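The factor T itself is easy to compute. A sketch on a hypothetical pooled sample from a discrete 1-5 rating scale with heavy ties:

```python
from collections import Counter

# Hypothetical pooled ratings on a discrete 1-5 scale (many tied values)
pooled = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5]
N = len(pooled)

# One term per tie block; block sizes here are 1, 2, 3, 2, 4
tie_sum = sum(t**3 - t for t in Counter(pooled).values())
T = 1 - tie_sum / (N**3 - N)  # T is below 1, so H' = H / T exceeds H
```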
Effect size for practical impact
Statistical significance does not quantify importance. Epsilon squared and eta squared summarize how much of the rank variation is associated with group membership. Values near 0.01 suggest a small shift, around 0.06 a moderate shift, and near 0.14 a large shift, although context matters. Reporting an effect size helps readers compare studies across different sample sizes.
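Given H′, N, and k, both effect sizes follow directly. A sketch using common rank-based definitions (conventions vary slightly across sources), evaluated on the example table's values:

```python
def rank_effect_sizes(H, N, k):
    """Rank-based effect sizes for a Kruskal-Wallis result.
    One common set of definitions; sources differ in the exact formulas."""
    epsilon_sq = H * (N + 1) / (N**2 - 1)  # equivalently H / (N - 1)
    eta_sq = (H - k + 1) / (N - k)         # eta-squared based on H
    return epsilon_sq, eta_sq

eps, eta = rank_effect_sizes(11.376, N=15, k=3)  # example table values
```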
Posthoc comparisons and reporting
If the overall test is significant, you may test pairs using Dunn z statistics based on mean ranks and a pooled variance term. Because multiple comparisons inflate false positives, p value adjustments like Holm or Bonferroni are common. A clean report includes group medians, mean ranks, H, df, p, the adjustment method, and a short conclusion about which groups differ.
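The Dunn step can be sketched as follows. This is a simplified illustration using a Bonferroni adjustment; a full implementation would also subtract a tie term from the variance, and the function name is my own.

```python
from itertools import combinations
from math import sqrt

from scipy.stats import norm, rankdata

def dunn_pairs(groups, labels):
    """Pairwise Dunn z tests with Bonferroni-adjusted p values (no tie term)."""
    pooled = [v for g in groups for v in g]
    N = len(pooled)
    ranks = rankdata(pooled)

    # Mean rank per group, taken from the pooled ranking
    mean_rank, start = {}, 0
    for label, g in zip(labels, groups):
        mean_rank[label] = sum(ranks[start:start + len(g)]) / len(g)
        start += len(g)

    sizes = dict(zip(labels, (len(g) for g in groups)))
    m = len(labels) * (len(labels) - 1) // 2  # number of comparisons
    results = {}
    for a, b in combinations(labels, 2):
        se = sqrt(N * (N + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_raw = 2 * norm.sf(abs(z))                 # two-sided p from z
        results[(a, b)] = (z, min(1.0, p_raw * m))  # Bonferroni adjustment
    return results
```

On the example table, the B-C pair comes out significant after adjustment while A-B does not.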
FAQs
1) What does the Kruskal-Wallis test compare?
It tests whether at least one group tends to have higher or lower values than others by comparing pooled ranks across independent groups.
2) Can I use unequal sample sizes?
Yes. Unequal group sizes are allowed. Each group should be independent, and extremely tiny groups can reduce the reliability of the chi-square approximation.
3) Should I enable tie correction?
Usually yes. If your data contain repeated values, the uncorrected statistic is slightly too small, so enabling tie correction gives a more accurate H and p value.
4) When should I run posthoc comparisons?
Run posthoc Dunn pairs after a significant overall result, or when you have a strong planned comparison. Always use an adjustment to control multiple testing.
5) What effect size should I report?
Report epsilon squared or eta squared alongside H and p. These quantify practical magnitude and help compare results across studies and sample sizes.
6) Does a non-significant p mean groups are identical?
No. It means the data did not provide strong evidence of differences at your alpha level. Small samples and high variability can hide real effects.