Intraclass Correlation Calculator

Fast ICC reliability checks for audits and studies. Paste scores, validate structure, and compute instantly. Download CSV and PDF reports to share with your team.

Enter ratings data

Provide a balanced matrix: each row is a target (subject/item) and each column is a rater/measurement. Use CSV, tab, semicolon, pipe, or whitespace separators.

Input options:
  ICC model: pick a model based on your study design.
  Separator: auto-detected from the first line.
  Missing values: complete rows are required for ANOVA.
  Alpha: 0.05 gives a 95% confidence interval.
  Bootstrap replicates (B): higher B is more stable but slower.
  Confidence interval: a percentile CI for the selected ICC type.
  First row is a header: use this if your first row contains labels.
  First column is an ID: use this if your first column is target names.
  File upload: if a file is uploaded, it overrides the text box.

Example data table
Target   Rater 1   Rater 2   Rater 3
A        4.2       4.4       4.1
B        3.8       3.9       3.7
C        5.1       5.0       5.2
D        4.6       4.7       4.5
E        3.2       3.4       3.3

Use “Load example” to populate the input box with the same table.

Formula used

ICC is computed from analysis-of-variance mean squares. Let n be the number of targets and k the number of raters.

Type       Definition
ICC(1,1)   (MSB − MSW) / (MSB + (k − 1)MSW), using one-way ANOVA.
ICC(1,k)   (MSB − MSW) / MSB, the average-measure form.
ICC(2,1)   (MSR − MSE) / (MSR + (k − 1)MSE + k(MSC − MSE)/n).
ICC(2,k)   (MSR − MSE) / (MSR + (MSC − MSE)/n).
ICC(3,1)   (MSR − MSE) / (MSR + (k − 1)MSE), consistency with fixed raters.
ICC(3,k)   (MSR − MSE) / MSR, average-measure consistency.
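
As a minimal sketch (not this calculator's own source), the six definitions map one-to-one onto small Python functions; the mean squares are assumed to come from a separate ANOVA step, sketched under "ANOVA mean squares behind the estimate" below.

```python
# Sketch: the six ICC definitions as functions of ANOVA mean squares.
# n = number of targets, k = number of raters.

def icc_1_1(msb, msw, k):
    return (msb - msw) / (msb + (k - 1) * msw)

def icc_1_k(msb, msw):
    return (msb - msw) / msb

def icc_2_1(msr, msc, mse, n, k):
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def icc_2_k(msr, msc, mse, n):
    return (msr - mse) / (msr + (msc - mse) / n)

def icc_3_1(msr, mse, k):
    return (msr - mse) / (msr + (k - 1) * mse)

def icc_3_k(msr, mse):
    return (msr - mse) / msr
```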

The confidence interval is a bootstrap percentile interval by resampling targets (rows) with replacement.

How to use this calculator
  1. Choose an ICC type that matches your design (one-way vs two-way; agreement vs consistency).
  2. Paste a complete targets-by-raters matrix (or upload a CSV file).
  3. If you included labels, enable “First row is a header” and/or “First column is an ID”.
  4. Press Submit to compute ICC, an ANOVA table, a plot, and confidence intervals.
  5. Use Download CSV/PDF to share results and the cleaned matrix.
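
For readers replicating steps 2 and 3 offline, here is a rough Python equivalent, assuming pandas is available; parse_ratings is a hypothetical helper, not part of this site's code.

```python
import io
import pandas as pd

def parse_ratings(text, header=False, id_column=False):
    """Parse a pasted targets-by-raters matrix (hypothetical helper).

    Mirrors the calculator's options: `header` drops a label row,
    `id_column` moves the first column into the index.
    """
    df = pd.read_csv(
        io.StringIO(text),
        sep=None,                 # sniff the separator (comma, tab, ...)
        engine="python",
        header=0 if header else None,
        index_col=0 if id_column else None,
    )
    return df.astype(float)       # raises if any cell is non-numeric

example = "A,4.2,4.4,4.1\nB,3.8,3.9,3.7\nC,5.1,5.0,5.2"
print(parse_ratings(example, id_column=True))
```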

Why intraclass correlation is used

Intraclass correlation (ICC) quantifies how much of total score variability comes from true differences between targets rather than rater noise. When multiple clinicians, instruments, or algorithms score the same subjects, ICC summarizes reliability on a 0–1 scale. Values near 1 indicate most variance is between targets.

Selecting the correct ICC model

Model choice depends on whether raters are sampled randomly and whether you care about absolute agreement or consistency. One-way random suits designs where each target is scored by a different set of raters. Two-way random suits a random sample of raters who each score all targets. Two-way mixed fits a fixed set of raters, which is common in audits.

Single versus average measures

Single-measure ICC answers, “How reliable is one rater’s score?” Average-measure ICC answers, “How reliable is the mean of k raters?” Averaging reduces error variance, so ICC(·,k) is typically higher than ICC(·,1). Report the version matching your operational decision.
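
The two forms are linked by the Spearman-Brown relation, ICC(·,k) = k · ICC(·,1) / (1 + (k − 1) · ICC(·,1)); a quick illustration:

```python
def spearman_brown(icc_single, k):
    """Step a single-measure ICC up to the k-rater average-measure ICC."""
    return k * icc_single / (1 + (k - 1) * icc_single)

# A single rater at ICC 0.60 steps up to about 0.82 when three raters are averaged.
print(spearman_brown(0.60, 3))  # 0.818...
```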

ANOVA mean squares behind the estimate

This calculator derives ICC from ANOVA mean squares: target mean square (MSR), rater mean square (MSC), and residual error mean square (MSE). These components decompose variability into target effects, systematic rater shifts, and unexplained noise, enabling agreement and consistency formulations.
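
A minimal NumPy sketch of that decomposition for a complete n × k matrix (illustrative names, not the tool's internals):

```python
import numpy as np

def mean_squares(x):
    """Two-way ANOVA mean squares for a complete n-targets by k-raters matrix."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)                      # per-target means
    col_means = x.mean(axis=0)                      # per-rater means
    ss_rows = k * ((row_means - grand) ** 2).sum()  # target effect
    ss_cols = n * ((col_means - grand) ** 2).sum()  # rater effect
    ss_error = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                         # MSR: targets
    msc = ss_cols / (k - 1)                         # MSC: raters
    mse = ss_error / ((n - 1) * (k - 1))            # MSE: residual
    return msr, msc, mse

# The example table from above, reusing icc_2_1 from the formula sketch:
data = [[4.2, 4.4, 4.1], [3.8, 3.9, 3.7], [5.1, 5.0, 5.2],
        [4.6, 4.7, 4.5], [3.2, 3.4, 3.3]]
msr, msc, mse = mean_squares(data)
print(icc_2_1(msr, msc, mse, n=5, k=3))
```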

Confidence intervals and uncertainty

Reliability estimates should include uncertainty. The tool provides a percentile bootstrap interval by resampling targets with replacement. Wider intervals occur with small n, low between-target variance, or inconsistent raters. If the lower bound is near zero, conclusions about reliability are fragile.
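
A sketch of the same procedure, reusing the mean-squares and formula helpers above; b, alpha, and icc_fn are illustrative parameter names, not the site's API.

```python
import numpy as np

def bootstrap_icc_ci(x, icc_fn, b=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample targets (rows) with replacement."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    stats = [icc_fn(x[rng.integers(0, n, size=n)]) for _ in range(b)]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Example, reusing the sketches above:
# lo, hi = bootstrap_icc_ci(data, lambda m: icc_2_1(*mean_squares(m), *m.shape))
```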

Practical reporting guidance

State the ICC type, design assumptions, and the numbers of targets and raters. Report the ICC with its CI, plus descriptive ranges to show measurement spread. For operational thresholds, many teams treat <0.50 as poor, 0.50–0.75 as moderate, 0.75–0.90 as good, and >0.90 as excellent, but context matters.
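
If you automate report generation, those bands reduce to a small lookup; the text leaves the exact boundary handling open, so this sketch picks one reading:

```python
def icc_band(icc):
    """Map an ICC estimate to the qualitative bands quoted above."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"
```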

Before calculation, verify that each target has k numeric ratings and that the same raters appear in every row. Rescaling all scores by a common factor does not change ICC, but outliers can inflate within-target error. Consider plotting scores by target and rater to spot drift, ceiling effects, or rater bias. If agreement is required, choose the absolute agreement form; if constant offsets between raters are acceptable, choose consistency. Document any rows removed for missing values and rerun after cleaning.
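
A small pre-flight check along those lines, assuming NumPy; this is illustrative, not the validator the site runs.

```python
import numpy as np

def validate_matrix(x):
    """Pre-flight checks before computing ICC (illustrative only)."""
    x = np.asarray(x, dtype=float)   # raises if rows are ragged or non-numeric
    if x.ndim != 2 or x.shape[0] < 2 or x.shape[1] < 2:
        raise ValueError("need at least 2 targets and 2 raters")
    if np.isnan(x).any():
        bad = sorted(set(np.argwhere(np.isnan(x))[:, 0].tolist()))
        raise ValueError(f"incomplete rows (targets): {bad}")
    return x
```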

FAQs

1) What format should my ratings data follow?
Use a targets-by-raters matrix. Each row is one subject or item, each column is one rater. Values must be numeric and complete for every row.

2) Which ICC type should I report?
Match your study design. Use one-way random when raters differ by target. Use two-way random when raters are a random sample applied to all targets. Use mixed when raters are fixed.

3) Why can ICC be negative?
A negative value occurs when within-target variation exceeds between-target variation. It suggests poor reliability, restricted range, or inconsistent raters. Review the plot and consider improving measurement protocols.

4) What is the difference between single and average measures?
Single-measure ICC reflects reliability of one rater’s score. Average-measure ICC reflects reliability of the mean of k raters, which is usually higher because averaging reduces random error.

5) How is the confidence interval computed here?
The interval uses a percentile bootstrap: targets are resampled with replacement, ICC is recomputed B times, and the alpha/2 and 1−alpha/2 quantiles form the bounds.

6) Can I use ICC for categorical or ordinal ratings?
ICC assumes interval-scaled numeric scores. For categorical ratings, consider kappa-type measures. For ordinal ratings, consider weighted kappa or an ordinal mixed model, then report an appropriate reliability metric.
