Advanced Test-Retest Reliability Calculator

Enter Paired Test and Retest Scores

Paste values separated by commas, spaces, or line breaks. Each test value is paired with the retest value at the same position.
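As an illustration of the input rule above, the following Python sketch splits pasted text on commas and whitespace and pairs the two lists by position. The function names `parse_scores` and `pair_scores` are hypothetical and are not taken from the calculator itself.

```python
import re

def parse_scores(text):
    """Split pasted text on commas and/or whitespace and convert to floats."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]

def pair_scores(test_text, retest_text):
    """Return (test, retest) tuples matched by position (hypothetical helper)."""
    test = parse_scores(test_text)
    retest = parse_scores(retest_text)
    if len(test) != len(retest):
        # Unequal lengths mean at least one score has no partner.
        raise ValueError(
            f"{len(test)} test values but {len(retest)} retest values"
        )
    return list(zip(test, retest))

# Mixed separators are accepted in either field.
pairs = pair_scores("62, 70, 58", "64 72\n57")
```

Rejecting unequal lengths up front, rather than truncating the longer list, avoids silently pairing the wrong scores.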

Scale or measure name

Optional label for summaries and exports.

Decimal places

Choose how many decimals appear in results.

Confidence level

The displayed confidence interval is computed from standard Fisher z limits.

Test session scores

First measurement values in original order.

Retest session scores

Second measurement values in matching order.

Actions

Results appear above this form after calculation.

Example Data Table

This sample shows how the two sessions should align by row.

Participant	Test Score	Retest Score	Difference (Retest − Test)
1	62	64	2
2	70	72	2
3	58	57	-1
4	91	90	-1
5	76	78	2
6	84	85	1

Formula Used

Pearson test-retest reliability

r = Σ[(Xi − X̄)(Yi − Ȳ)] / √(Σ(Xi − X̄)² × Σ(Yi − Ȳ)²)

Here, Xi is the test score, Yi is the retest score, X̄ is the test mean, and Ȳ is the retest mean. Higher positive values indicate stronger score stability across sessions.
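The Pearson formula above can be sketched in Python as follows. This is a minimal illustration using the example table's scores; the function name `pearson_r` is an assumption, not the calculator's own code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed directly from the deviation-score formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of cross-products of deviations from each mean.
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squares.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    den = math.sqrt(ss_x * ss_y)
    if den == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return num / den

# Scores from the example data table.
test = [62, 70, 58, 91, 76, 84]
retest = [64, 72, 57, 90, 78, 85]
r = pearson_r(test, retest)
```

For the example table this gives a coefficient of roughly 0.993, reflecting near-identical ordering of participants across the two sessions.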

Standard error of measurement

SEM = SDtest × √(1 − r)

SEM estimates expected measurement noise using the first session standard deviation and the calculated reliability coefficient.
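A minimal Python sketch of the SEM formula follows. It assumes the sample standard deviation (n − 1 denominator) for SDtest, which the page does not specify, and the input values in the usage lines are hypothetical.

```python
import math
import statistics

def sem(sd_test, r):
    """Standard error of measurement: SD of the first session times sqrt(1 - r)."""
    if r > 1:
        raise ValueError("r cannot exceed 1")
    return sd_test * math.sqrt(1 - r)

def sem_from_scores(test_scores, r):
    """Convenience wrapper; ASSUMES the sample SD (n - 1 denominator)."""
    return sem(statistics.stdev(test_scores), r)

# Hypothetical inputs: SD of 10 score units and a reliability of 0.84.
example_sem = sem(10.0, 0.84)
```

With these hypothetical inputs the SEM is about 4 score units, since √(1 − 0.84) = 0.4.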

Typical error of measurement

TEM = SDdifference / √2

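Here, SDdifference is the standard deviation of the paired differences (retest minus test). TEM summarizes the usual random spread between repeated scores, in the same units as the scores. Lower values indicate tighter repeat consistency.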
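The TEM formula can be sketched in Python as below, using the example table's scores. The sample standard deviation (n − 1 denominator) is an assumption, and `tem` is a hypothetical name.

```python
import math
import statistics

def tem(test, retest):
    """Typical error: SD of the paired differences divided by sqrt(2)."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    diffs = [b - a for a, b in zip(test, retest)]  # retest minus test
    # ASSUMPTION: sample SD (n - 1 denominator) of the differences.
    return statistics.stdev(diffs) / math.sqrt(2)

# Scores from the example data table.
test = [62, 70, 58, 91, 76, 84]
retest = [64, 72, 57, 90, 78, 85]
typical_error = tem(test, retest)
```

For the example table the differences are 2, 2, −1, −1, 2, 1, which gives a TEM of roughly 1.04 score units.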

Confidence interval for correlation

z = 0.5 × ln((1 + r) / (1 − r))

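CIz = z ± 1.96 / √(n − 3)

rCI = (e^(2 × CIz) − 1) / (e^(2 × CIz) + 1)

The calculator applies the Fisher z transformation to estimate the confidence interval around the Pearson correlation. Here, n is the number of score pairs, and 1.96 is the standard normal critical value for a 95% confidence level. The last formula is applied to the lower and upper z limits separately to convert each back to the correlation scale.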
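The three steps above can be sketched in Python as follows. The `level` parameter and the use of `statistics.NormalDist` to look up the critical value are assumptions made for illustration; the formulas on this page show only the 95% multiplier.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Confidence interval for a Pearson r via the Fisher z transformation."""
    if n <= 3:
        raise ValueError("need more than 3 pairs, since the SE uses n - 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)  # same as 0.5 * ln((1 + r) / (1 - r))
    # Two-sided critical value, about 1.96 when level = 0.95.
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = crit / math.sqrt(n - 3)
    # tanh(z) equals (e^(2z) - 1) / (e^(2z) + 1), the back-transformation.
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Hypothetical inputs: r = 0.90 from 30 score pairs.
lower, upper = fisher_ci(0.90, 30)
```

The interval is asymmetric around r on the correlation scale, because the back-transformation compresses values near ±1.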

How to Use This Calculator

Enter the name of your scale if you want labeled exports. Paste the first session scores into the test field. Paste the second session scores into the retest field in the same row order. Choose decimal places, then click Calculate Reliability.

Review the summary shown above the form. Use Pearson correlation for the main stability estimate. Check the confidence interval to judge precision. Compare mean difference, SEM, and TEM to understand bias and random error. Export results with CSV or PDF when needed.

Frequently Asked Questions

1. What does test-retest reliability measure?

It measures how stable scores remain when the same people complete the same measure on two different occasions. High stability suggests consistent measurement over time.

2. Which coefficient matters most here?

Pearson correlation is the main coefficient for continuous paired scores. Spearman correlation is useful when rank order matters more than exact score values or when the data depart from a normal distribution.
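One common way to obtain Spearman correlation is to rank each list, giving tied values their average rank, and then compute Pearson correlation on the ranks. The Python sketch below illustrates that approach; whether the calculator handles ties this way is an assumption, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation from deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def spearman_rho(x, y):
    """Spearman correlation as Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Scores from the example data table; both sessions rank participants identically.
rho = spearman_rho([62, 70, 58, 91, 76, 84], [64, 72, 57, 90, 78, 85])
```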

3. What is considered a good reliability value?

Interpretation depends on context, but values below 0.50 are often considered weak, 0.50 to just under 0.75 moderate, 0.75 to just under 0.90 good, and 0.90 or higher excellent.
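The rule-of-thumb bands above can be expressed as a small Python lookup. This sketch uses the cutoffs stated in this answer and treats the bands as contiguous; it is not a universal standard, and `interpret_reliability` is a hypothetical name.

```python
def interpret_reliability(r):
    """Map a reliability coefficient to the rule-of-thumb labels from the FAQ."""
    if r < 0.50:
        return "weak"
    if r < 0.75:
        return "moderate"
    if r < 0.90:
        return "good"
    return "excellent"

# Hypothetical coefficient for illustration.
label = interpret_reliability(0.82)
```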

4. Why can the means look similar but reliability stay low?

Similar averages do not guarantee stable individual rankings. Reliability depends on how consistently each person’s score tracks across both sessions.

5. What does SEM tell me?

SEM estimates expected score uncertainty caused by measurement error. Lower SEM values usually indicate more dependable observed scores.

6. When should I review the mean difference?

Review it when you want to detect systematic bias. A large positive or negative mean difference can suggest learning effects, fatigue, or scoring drift.
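As a rough illustration, the mean difference is the average of the per-person differences, taken here as retest minus test to match the example data table. The function name below is hypothetical.

```python
import statistics

def mean_difference(test, retest):
    """Average of (retest - test); positive values mean retest scores ran higher."""
    if len(test) != len(retest) or not test:
        raise ValueError("need two non-empty lists of equal length")
    return statistics.mean(b - a for a, b in zip(test, retest))

# Scores from the example data table.
test = [62, 70, 58, 91, 76, 84]
retest = [64, 72, 57, 90, 78, 85]
bias = mean_difference(test, retest)
```

For the example table the differences sum to 5 across 6 participants, so retest scores ran about 0.83 points higher on average.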

7. Do both score lists need equal lengths?

Yes. Every test score must pair with exactly one retest score. Unequal list lengths break the paired comparison and invalidate the calculation.

8. Can I use this with questionnaire totals or lab values?

Yes. It works well for any paired continuous scores, including test totals, scale scores, repeated measurements, and many laboratory or performance outcomes.