Compute Cohen's kappa from customizable rating tables. Compare weighted methods, confidence bounds, and agreement metrics. Turn categorical judgments into clear reliability evidence in seconds.
Enter labels and counts for a square agreement table. Rows represent Rater A. Columns represent Rater B.
This sample shows three response categories from two raters evaluating the same 100 items.
| Rater A \ Rater B | Negative | Neutral | Positive | Row total |
|---|---|---|---|---|
| Negative | 35 | 3 | 2 | 40 |
| Neutral | 4 | 28 | 3 | 35 |
| Positive | 1 | 5 | 19 | 25 |
| Column total | 40 | 36 | 24 | 100 |
For this example, exact agreement is 82%, expected agreement is 34.6%, and unweighted kappa is approximately 0.7248, indicating substantial agreement.
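The statistics quoted above can be reproduced directly from the table. The following is a minimal sketch (plain Python, no libraries) that sums the diagonal for observed agreement and the products of matching marginals for expected agreement:

```python
# Reproduce the sample table's statistics (counts from the example above).
table = [
    [35, 3, 2],   # Rater A: Negative
    [4, 28, 3],   # Rater A: Neutral
    [1, 5, 19],   # Rater A: Positive
]
n = sum(sum(row) for row in table)                          # 100 items
p_o = sum(table[i][i] for i in range(3)) / n                # observed agreement
row_tot = [sum(row) for row in table]                       # [40, 35, 25]
col_tot = [sum(table[i][j] for i in range(3)) for j in range(3)]  # [40, 36, 24]
p_e = sum(row_tot[i] * col_tot[i] for i in range(3)) / n**2  # chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(p_o, p_e, round(kappa, 4))   # 0.82 0.346 0.7248
```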
Kappa compares observed agreement with the agreement expected by chance from the marginal totals.
Observed agreement: P_o = (sum of diagonal counts) / N
Expected agreement: P_e = sum[(row total_i / N) × (column total_i / N)]
Cohen's kappa: κ = (P_o - P_e) / (1 - P_e)
Approximate standard error: SE ≈ sqrt( P_o(1-P_o) / [N(1-P_e)^2] )
95% confidence interval: κ ± 1.96 × SE
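Applying the standard-error and confidence-interval formulas to the sample table's values (P_o = 0.82, P_e = 0.346, N = 100) gives an interval of roughly 0.61 to 0.84. A short sketch of that calculation:

```python
import math

# Large-sample SE and 95% CI for the sample table's kappa.
p_o, p_e, n = 0.82, 0.346, 100
kappa = (p_o - p_e) / (1 - p_e)
se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
lo, hi = kappa - 1.96 * se, kappa + 1.96 * se
print(f"kappa = {kappa:.4f}, SE = {se:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

Because the interval excludes zero, the observed agreement is clearly better than chance for this example.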
For weighted kappa, the calculator replaces exact agreement with a weighted agreement score:
Weighted observed agreement: P_o(w) = sum[w_ij × p_ij], where p_ij = n_ij / N is the proportion of items in cell (i, j)
Weighted expected agreement: P_e(w) = sum[w_ij × p_i+ × p_+j], where p_i+ and p_+j are the row and column marginal proportions
Weighted kappa: κ_w = (P_o(w) - P_e(w)) / (1 - P_e(w))
Linear and quadratic weights give partial credit to near agreements. Use weighted methods when categories follow a logical order.
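The weighted formulas above can be sketched as a small function. It uses the common agreement-weight schemes w_ij = 1 − |i − j|/(k − 1) (linear) and w_ij = 1 − [(i − j)/(k − 1)]² (quadratic), applied here to the sample table; treating the three categories as ordered is an assumption made for illustration:

```python
def weighted_kappa(table, scheme="linear"):
    """Weighted Cohen's kappa with linear or quadratic agreement weights."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        d = abs(i - j) / (k - 1)          # normalized category distance
        return 1 - d if scheme == "linear" else 1 - d ** 2

    p_ow = sum(w(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    p_ew = sum(w(i, j) * row[i] * col[j] / n ** 2
               for i in range(k) for j in range(k))
    return (p_ow - p_ew) / (1 - p_ew)

table = [[35, 3, 2], [4, 28, 3], [1, 5, 19]]
print(round(weighted_kappa(table, "linear"), 4))     # 0.7529
print(round(weighted_kappa(table, "quadratic"), 4))  # 0.7826
```

Both weighted values exceed the unweighted 0.7248 because most disagreements in this table fall in adjacent categories, which the weights partially credit.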
Kappa measures how much two raters agree after removing the agreement expected by chance. It works best for categorical ratings and helps assess reliability beyond simple percent agreement.
Use weighted kappa when categories are ordered, such as severity levels or satisfaction scores. It gives partial credit when raters are close but not identical, making it more informative than unweighted kappa for ordinal data.
Percent agreement ignores chance agreement. When one category is very common, raters may agree often simply because both choose it frequently. Kappa adjusts for this imbalance and can therefore be lower.
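A quick numeric illustration of this effect, using hypothetical counts with one dominant category: raw agreement is 91%, yet kappa is low because both raters would agree almost as often by chance alone.

```python
# Hypothetical 2x2 table: both raters overwhelmingly choose category 1.
table = [[90, 5],
         [4, 1]]
n = 100
p_o = (90 + 1) / n                         # raw agreement: 0.91
p_e = (95 * 94 + 5 * 6) / n ** 2           # chance agreement: 0.896
kappa = (p_o - p_e) / (1 - p_e)
print(round(p_o, 2), round(kappa, 3))      # 0.91 0.135
```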
Kappa can range from -1 to 1. A value near 1 suggests very strong agreement, 0 suggests chance-level agreement, and negative values suggest systematic disagreement.
Yes. You can choose between two and six categories and enter a full square confusion matrix. That makes it suitable for many practical coding, labeling, diagnostic, or review workflows.
The interval shows a likely range for the true kappa under repeated sampling. Narrow intervals suggest more precision. This calculator reports an approximate large-sample confidence interval.
Yes. It is useful for comparing human annotators, model reviewers, or two coding systems. The matrix format matches many classification and labeling review tasks.
Enter counts of items assigned to each category combination. Each cell should hold the number of items that Rater A placed in the row category and Rater B placed in the column category.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.