Calculator Inputs
Enter the cross-classified ratings from two raters. The tool supports unweighted, linear weighted, and quadratic weighted kappa.
Example Data Table
This sample matrix illustrates how two raters classified the same set of cases into three ordered groups.
| Rater A \ Rater B | Low | Medium | High | Row Total |
|---|---|---|---|---|
| Low | 18 | 2 | 1 | 21 |
| Medium | 3 | 15 | 2 | 20 |
| High | 1 | 2 | 16 | 19 |
| Column Total | 22 | 19 | 19 | 60 |
Formulas Used
Unweighted kappa
Observed agreement: Po = Σᵢ pᵢᵢ (the sum of the diagonal cell proportions)
Expected agreement: Pe = Σᵢ pᵢ₊ p₊ᵢ (the sum of the products of matching row and column marginal proportions)
Kappa: κ = (Po − Pe) / (1 − Pe)
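Applied to the example matrix above (N = 60), the unweighted formulas give:
Po = (18 + 15 + 16) / 60 = 49/60 ≈ 0.817
Pe = (21 × 22 + 20 × 19 + 19 × 19) / 60² = 1203/3600 ≈ 0.334
κ = (0.817 − 0.334) / (1 − 0.334) ≈ 0.72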
Weighted kappa
Weighted observed agreement: Po,w = ΣᵢΣⱼ wᵢⱼ pᵢⱼ, where pᵢⱼ is the proportion of cases in cell (i, j) and wᵢⱼ is its agreement weight
Weighted expected agreement: Pe,w = ΣᵢΣⱼ wᵢⱼ pᵢ₊ p₊ⱼ
Weighted kappa: κw = (Po,w − Pe,w) / (1 − Pe,w)
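Assuming the common linear weights (1 on the diagonal, 0.5 for adjacent categories, 0 for the two Low/High corner cells), the same matrix gives Po,w = (49 + 0.5 × 9) / 60 ≈ 0.892 and Pe,w ≈ 0.554, so κw ≈ 0.76.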
Approximate uncertainty
SE ≈ √[ Po(1 − Po) / (N(1 − Pe)²) ]
z = κ / SE
Confidence interval = κ ± zcritical × SE, where zcritical is the normal critical value for the chosen confidence level (1.96 for 95%)
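Continuing the unweighted example: SE ≈ √[ 0.817 × 0.183 / (60 × 0.666²) ] ≈ 0.075, so a 95% interval is roughly 0.72 ± 1.96 × 0.075, or about 0.58 to 0.87. Because this is a large-sample approximation, treat the interval as rough when N is small.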
Both weighted variants give partial credit for disagreements that land in nearby categories. With the common weight definitions wᵢⱼ = 1 − |i − j| / (k − 1) for linear weighting and wᵢⱼ = 1 − (i − j)² / (k − 1)² for quadratic weighting (where k is the number of categories), linear weighting reduces the penalty for near misses in proportion to their distance, and quadratic weighting reduces it even further for adjacent categories while still penalizing the most distant disagreements fully.
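The sketch below is one way to reproduce these numbers outside the calculator. It is not the tool's own code: the cohen_kappa helper is a name chosen here for illustration, and the weights follow the common linear and quadratic definitions given above.

```python
import numpy as np

def cohen_kappa(counts, weighting=None):
    """Two-rater kappa from a square contingency matrix of counts.

    weighting: None (unweighted), "linear", or "quadratic", using the
    common weights w_ij = 1 - |i-j|/(k-1) and w_ij = 1 - (i-j)^2/(k-1)^2.
    """
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()            # cell proportions p_ij
    k = counts.shape[0]

    # Agreement weights: 1 on the diagonal, smaller off the diagonal.
    i, j = np.indices((k, k))
    if weighting is None:
        w = (i == j).astype(float)
    elif weighting == "linear":
        w = 1.0 - np.abs(i - j) / (k - 1)
    elif weighting == "quadratic":
        w = 1.0 - (i - j) ** 2 / (k - 1) ** 2
    else:
        raise ValueError("weighting must be None, 'linear', or 'quadratic'")

    row = p.sum(axis=1)                  # p_i+ (row marginals)
    col = p.sum(axis=0)                  # p_+j (column marginals)
    po = (w * p).sum()                   # (weighted) observed agreement
    pe = (w * np.outer(row, col)).sum()  # (weighted) expected agreement
    return (po - pe) / (1 - pe)

# Example matrix from the table above (rows = Rater A, columns = Rater B).
table = [[18, 2, 1],
         [3, 15, 2],
         [1, 2, 16]]

for scheme in (None, "linear", "quadratic"):
    print(scheme or "unweighted", round(cohen_kappa(table, scheme), 3))
# Roughly: unweighted 0.725, linear 0.757, quadratic 0.790
```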
How to Use This Calculator
- Choose how many rating categories your study contains.
- Rename the category labels to match your coding scheme.
- Enter the count of cases for each combination of Rater A and Rater B categories.
- Select unweighted, linear weighted, or quadratic weighted analysis.
- Pick a confidence level and your preferred decimal precision.
- Press the calculate button to show results above the form.
- Review kappa, exact agreement, expected agreement, interval estimates, and matrix summaries.
- Use the export buttons to save a CSV or PDF version.
FAQs
1) What does kappa agreement measure?
Kappa measures how much two raters agree after removing agreement expected by chance. It is useful for categorical ratings such as diagnoses, labels, inspection outcomes, or survey coding decisions.
2) When should I use weighted kappa?
Use weighted kappa when your categories are ordered, such as low, medium, and high. Weighted methods give partial credit to near agreements and are better for ordinal scales.
3) What is the difference between exact and weighted agreement?
Exact agreement counts only perfect diagonal matches. Weighted agreement also considers how far off disagreements are. Adjacent categories receive less penalty than distant categories under weighted schemes.
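As an illustration, for three ordered categories under the usual weight definitions, each pair of ratings earns the following credit toward agreement:

| Category distance | Exact (unweighted) | Linear weight | Quadratic weight |
|---|---|---|---|
| 0 (same category) | 1 | 1 | 1 |
| 1 (adjacent) | 0 | 0.5 | 0.75 |
| 2 (opposite ends) | 0 | 0 | 0 |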
4) How should I interpret the kappa value?
A common guide is: below 0 poor, 0 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and above 0.80 almost perfect agreement.
5) Can kappa be negative?
Yes. A negative kappa suggests the raters agree less than expected by chance. This may indicate reversed coding, inconsistent category definitions, or serious data-entry problems.
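For example, if two raters each split 100 cases 50/50 between two categories but agree on only 10 of them, Po = 0.10 while Pe = 0.50, giving κ = (0.10 − 0.50) / (1 − 0.50) = −0.8.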
6) Why can high raw agreement still produce modest kappa?
Kappa adjusts for chance agreement. If one category dominates the data, expected agreement becomes large, which can reduce kappa even when the diagonal percentage looks high.
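As a hypothetical illustration, suppose two raters each review 100 cases: 90 are rated Low by both, 2 High by both, and the remaining 8 are split (5 Low/High, 3 High/Low). Raw agreement is 92%, but the Low category dominates the marginals, so Pe = 0.95 × 0.93 + 0.05 × 0.07 = 0.887 and κ = (0.92 − 0.887) / (1 − 0.887) ≈ 0.29.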
7) Does this calculator work for more than two raters?
No. This page is for two-rater Cohen-style kappa using a contingency matrix. For three or more raters, consider Fleiss’ kappa or another multi-rater reliability method.
8) What should I export for reporting?
Report the weighting type, sample size, category labels, observed matrix, kappa, confidence interval, and interpretation. The CSV and PDF exports help preserve those values for audits, papers, and team reviews.