Calculator Input
Example Data Table
| Rater A \ Rater B | Defect Free | Minor Issue | Major Issue | Critical Issue |
|---|---|---|---|---|
| Defect Free | 12 | 2 | 1 | 0 |
| Minor Issue | 2 | 11 | 2 | 1 |
| Major Issue | 1 | 2 | 10 | 2 |
| Critical Issue | 0 | 1 | 2 | 13 |
This sample table shows paired counts for four ordinal inspection ratings assigned by two reviewers during a quality control audit.
Formula Used
Weighted kappa formula:
κw = (Po - Pe) / (1 - Pe)
Observed weighted agreement: Po = ΣΣ wij × pij
Expected weighted agreement: Pe = ΣΣ wij × pi+ × p+j
Linear weights: wij = 1 - |i - j| / (k - 1)
Quadratic weights: wij = 1 - (|i - j| / (k - 1))²
Here, wij is the weight for the category pair (i, j), pij is the observed proportion in cell (i, j), k is the number of categories, and pi+ and p+j are the marginal proportions computed from the row and column totals.
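The formulas above can be sketched in Python. This minimal example applies them to the sample table from this page, with both weighting schemes:

```python
# Weighted kappa for two raters on k ordered categories.
# counts[i][j] = number of items Rater A placed in category i
# and Rater B placed in category j (the example table above).

def weighted_kappa(counts, weighting="linear"):
    k = len(counts)
    n = sum(sum(row) for row in counts)
    # Marginal proportions from row and column totals.
    p_row = [sum(counts[i]) / n for i in range(k)]
    p_col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]
    po = pe = 0.0
    for i in range(k):
        for j in range(k):
            d = abs(i - j) / (k - 1)
            w = 1 - d if weighting == "linear" else 1 - d ** 2
            po += w * counts[i][j] / n     # observed weighted agreement
            pe += w * p_row[i] * p_col[j]  # expected weighted agreement
    return (po - pe) / (1 - pe)

table = [
    [12, 2, 1, 0],
    [2, 11, 2, 1],
    [1, 2, 10, 2],
    [0, 1, 2, 13],
]
print(round(weighted_kappa(table, "linear"), 4))     # → 0.7419
print(round(weighted_kappa(table, "quadratic"), 4))  # → 0.8193
```

Note that the quadratic value is higher here because most disagreements in the table are only one category apart, and quadratic weights penalize those near-misses less.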
How to Use This Calculator
- Select the number of rating categories used by both reviewers.
- Choose linear weights for penalties that grow evenly with category distance, or quadratic weights for stronger penalties on larger disagreements.
- Rename category labels to match your inspection, audit, or scoring process.
- Enter the paired count for every Rater A and Rater B category combination.
- Click the calculate button to display weighted agreement, expected agreement, kappa, and interpretation.
- Use the CSV button to save inputs and results, or print to PDF for reporting.
Frequently Asked Questions
1. What does weighted kappa measure?
Weighted kappa measures agreement between two raters when categories have order. It adjusts for chance agreement and gives partial credit when disagreements are close.
2. When should I use weighted kappa instead of simple kappa?
Use weighted kappa when categories are ordinal, such as severity levels, grades, or inspection ratings. Simple kappa treats every disagreement as equally serious.
3. What is the difference between linear and quadratic weights?
Linear weights reduce agreement evenly as categories get farther apart. Quadratic weights penalize large disagreements more strongly and are often preferred for clinical or quality scoring.
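To make the difference concrete, this short sketch prints both weight matrices for k = 4 categories using the formulas from the section above:

```python
# Linear and quadratic agreement weights for k = 4 ordered categories.
# Full credit (1.0) on the diagonal, zero credit at maximum distance.
k = 4
linear = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
quadratic = [[1 - (abs(i - j) / (k - 1)) ** 2 for j in range(k)] for i in range(k)]

for name, w in [("linear", linear), ("quadratic", quadratic)]:
    print(name)
    for row in w:
        print("  " + "  ".join(f"{v:.2f}" for v in row))
```

An adjacent-category disagreement keeps 0.67 of full credit under linear weights but 0.89 under quadratic weights, while a two-step disagreement drops to 0.33 versus 0.56.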
4. Can weighted kappa be negative?
Yes. A negative value means the raters agree less than expected by chance. This can indicate inconsistent criteria, unclear standards, or coding errors.
5. Does high exact agreement always mean high kappa?
No. Kappa also depends on expected agreement from marginal totals. Highly unbalanced category use can lower kappa even when exact agreement looks strong.
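A small hypothetical 2×2 example illustrates this (with two categories, weighted and unweighted kappa coincide): both raters use one category almost exclusively, so chance agreement is very high and kappa falls below zero despite 90% exact agreement.

```python
# Hypothetical audit: both raters mark nearly every item "Pass".
# counts[i][j]: Rater A category i, Rater B category j.
counts = [[90, 5],
          [5, 0]]
n = 100
po = (counts[0][0] + counts[1][1]) / n           # exact agreement = 0.90
p_row = [sum(row) / n for row in counts]         # [0.95, 0.05]
p_col = [(counts[0][j] + counts[1][j]) / n for j in range(2)]
pe = sum(p_row[i] * p_col[i] for i in range(2))  # chance agreement = 0.905
kappa = (po - pe) / (1 - pe)
print(round(kappa, 3))  # → -0.053: below chance despite 90% agreement
```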
6. Can I use this calculator for more than four categories?
Yes. The calculator supports two to seven categories. This covers many audit scales, defect classes, performance levels, and risk severity frameworks.
7. What type of data should go into the table?
Enter paired counts, not percentages. Each cell should represent how many items both raters assigned to that category combination during the same review set.
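If your data are two columns of raw ratings rather than a finished table, the paired counts can be tallied first. A minimal sketch, assuming ratings are coded 0 through k-1 and the two lists are aligned item by item:

```python
from collections import Counter

def paired_counts(ratings_a, ratings_b, k):
    """Tally how often each (Rater A, Rater B) category pair occurs."""
    pairs = Counter(zip(ratings_a, ratings_b))
    return [[pairs[(i, j)] for j in range(k)] for i in range(k)]

# Hypothetical review set: each position is one inspected item.
a = [0, 0, 1, 2, 3, 3, 1]
b = [0, 1, 1, 2, 3, 2, 1]
table = paired_counts(a, b, k=4)
print(table)  # → [[1, 1, 0, 0], [0, 2, 0, 0], [0, 0, 1, 0], [0, 0, 1, 1]]
```

The resulting row-by-row counts can be typed directly into the calculator's table.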
8. Is weighted kappa suitable for quality control?
Yes. It is useful for audits, defect classification, inspection scoring, and compliance reviews where disagreement severity matters rather than only exact matching.