Check consistency between first and second measurements quickly. See reliability, bias, and variability in seconds. Use formulas, exports, and graphs for confident review today.
Paste values separated by commas, spaces, or line breaks. Each test value is paired with the retest value in the same position, so both lists must follow the same participant order.
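As a rough illustration of how pasted input in any of these formats could be turned into numbers, here is a minimal Python sketch. The helper name `parse_scores` is hypothetical and is not taken from the calculator itself.

```python
import re


def parse_scores(text: str) -> list[float]:
    """Split pasted text on commas and/or whitespace (including newlines).

    Hypothetical helper for illustration; the calculator's own parser
    may handle edge cases differently.
    """
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]


# Mixed separators are accepted in a single paste.
print(parse_scores("62, 70 58\n91"))
```

Non-numeric tokens make `float` raise a `ValueError`, which is one simple way to surface typing mistakes early.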
This sample shows how the two sessions should align by row.
| Participant | Test Score | Retest Score | Difference (Retest - Test) |
|---|---|---|---|
| 1 | 62 | 64 | 2 |
| 2 | 70 | 72 | 2 |
| 3 | 58 | 57 | -1 |
| 4 | 91 | 90 | -1 |
| 5 | 76 | 78 | 2 |
| 6 | 84 | 85 | 1 |
In the Pearson correlation formula, Xi is the test score, Yi is the retest score, X̄ is the test mean, and Ȳ is the retest mean. Values closer to +1 indicate stronger score stability across sessions.
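The Pearson correlation built from these symbols can be sketched in Python as follows. This is a minimal illustration using the sample table's scores; the function name `pearson_r` is an assumption, not the calculator's code.

```python
import math


def pearson_r(test, retest):
    """Pearson correlation between paired test (X) and retest (Y) scores."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    n = len(test)
    mean_x = sum(test) / n      # X-bar
    mean_y = sum(retest) / n    # Y-bar
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(test, retest))
    sxx = sum((x - mean_x) ** 2 for x in test)
    syy = sum((y - mean_y) ** 2 for y in retest)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a session has no variance")
    return sxy / math.sqrt(sxx * syy)


test_scores = [62, 70, 58, 91, 76, 84]
retest_scores = [64, 72, 57, 90, 78, 85]
r = pearson_r(test_scores, retest_scores)
print(round(r, 3))
```

For the six sample participants the coefficient comes out at roughly 0.993, which reflects how closely the retest scores track the test scores.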
SEM estimates expected measurement noise using the first session standard deviation and the calculated reliability coefficient.
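A common way to write this is SEM = SD × √(1 − r). The sketch below assumes that form and assumes the sample standard deviation (n − 1 denominator); the calculator may use a different SD convention. The function name is hypothetical.

```python
import math
import statistics


def standard_error_of_measurement(test, reliability):
    """SEM = SD_test * sqrt(1 - r), assuming the sample SD of session one."""
    if not -1 <= reliability <= 1:
        raise ValueError("reliability must lie between -1 and 1")
    sd_test = statistics.stdev(test)  # assumption: n - 1 denominator
    return sd_test * math.sqrt(1 - reliability)


test_scores = [62, 70, 58, 91, 76, 84]
sem = standard_error_of_measurement(test_scores, reliability=0.9933)
print(round(sem, 2))
```

With the sample scores and a reliability of about 0.993, the SEM is close to one score point.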
TEM summarizes the usual random spread between repeated scores. Lower values indicate tighter repeat consistency.
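One widely used definition of the technical error of measurement for two sessions is TEM = √(Σd² / 2n), where d is each participant's retest minus test difference. The sketch below assumes that definition; the source does not state which formula the calculator uses.

```python
import math


def technical_error_of_measurement(test, retest):
    """TEM = sqrt(sum(d^2) / (2 * n)) for paired scores (assumed definition)."""
    if len(test) != len(retest) or not test:
        raise ValueError("need two equally long, non-empty lists")
    squared_diffs = sum((y - x) ** 2 for x, y in zip(test, retest))
    return math.sqrt(squared_diffs / (2 * len(test)))


test_scores = [62, 70, 58, 91, 76, 84]
retest_scores = [64, 72, 57, 90, 78, 85]
print(round(technical_error_of_measurement(test_scores, retest_scores), 3))
```

For the sample table the squared differences sum to 15, so the TEM is √(15 / 12), about 1.118.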
The calculator applies the Fisher z transformation to estimate the confidence interval around the Pearson correlation.
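The Fisher z approach can be sketched as: transform r with atanh, build a normal interval with standard error 1/√(n − 3), then transform the limits back with tanh. The 95% default and the function name `fisher_ci` are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Confidence interval for a Pearson r via the Fisher z transformation."""
    if n <= 3:
        raise ValueError("Fisher z interval needs more than 3 pairs")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    z = math.atanh(r)                      # Fisher z transform
    se = 1 / math.sqrt(n - 3)              # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


low, high = fisher_ci(0.9933, n=6)
print(round(low, 3), round(high, 4))
```

With only six pairs the interval is wide on the z scale, so the lower limit sits noticeably below the point estimate even when r is very high.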
Enter the name of your scale if you want labeled exports. Paste the first session scores into the test field. Paste the second session scores into the retest field in the same row order. Choose decimal places, then click Calculate Reliability.
Review the summary shown above the form. Use Pearson correlation for the main stability estimate. Check the confidence interval to judge precision. Compare mean difference, SEM, and TEM to understand bias and random error. Export results with CSV or PDF when needed.
Test-retest reliability measures how stable scores remain when the same people complete the same measure on two different occasions. High stability suggests consistent measurement over time.
Pearson correlation is the main coefficient for continuous paired scores. Spearman correlation helps when rank order matters more or when data are less normally distributed.
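To show how the rank-based alternative differs, here is a minimal sketch that computes Spearman correlation as the Pearson correlation of average ranks, which is the standard definition when ties are present. The helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(test, retest):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    rx, ry = average_ranks(test), average_ranks(retest)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


test_scores = [62, 70, 58, 91, 76, 84]
retest_scores = [64, 72, 57, 90, 78, 85]
print(spearman_rho(test_scores, retest_scores))
```

In the sample table every participant keeps the same rank in both sessions, so the Spearman coefficient is 1 even though the Pearson coefficient is slightly below 1.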
Interpretation depends on context, but values below 0.50 are often considered weak, 0.50 to 0.74 moderate, 0.75 to 0.89 good, and 0.90 or higher excellent.
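These bands can be expressed as a small lookup. The sketch assumes the cut points sit exactly at 0.50, 0.75, and 0.90, so a value such as 0.745 falls in the moderate band; the labels are rules of thumb, not fixed standards.

```python
def interpret_reliability(r):
    """Map a reliability coefficient to a rough descriptive label."""
    if r >= 0.90:
        return "excellent"
    if r >= 0.75:
        return "good"
    if r >= 0.50:
        return "moderate"
    return "weak"  # includes zero and negative coefficients


for value in (0.42, 0.60, 0.80, 0.95):
    print(value, interpret_reliability(value))
```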
Similar averages do not guarantee stable individual rankings. Reliability depends on how consistently each person’s score tracks across both sessions.
SEM estimates expected score uncertainty caused by measurement error. Lower SEM values usually indicate more dependable observed scores.
Review the mean difference when you want to detect systematic bias. A large positive or negative mean difference can suggest learning effects, fatigue, or scoring drift.
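As a quick illustration, the mean difference is the average of each participant's retest minus test score, matching the Difference column of the sample table. The function name is an assumption.

```python
def mean_difference(test, retest):
    """Average of (retest - test) across paired scores."""
    if len(test) != len(retest) or not test:
        raise ValueError("need two equally long, non-empty lists")
    diffs = [y - x for x, y in zip(test, retest)]
    return sum(diffs) / len(diffs)


test_scores = [62, 70, 58, 91, 76, 84]
retest_scores = [64, 72, 57, 90, 78, 85]
print(round(mean_difference(test_scores, retest_scores), 3))
```

The sample differences (2, 2, -1, -1, 2, 1) sum to 5, giving a mean of about 0.833, a small upward shift at retest.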
Both lists must contain the same number of values. Every test score must pair with exactly one retest score. Unequal list lengths break the paired comparison and invalidate the calculation.
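A pairing check along these lines can run before any statistic is computed. This is only a sketch; the name `pair_scores` and the error messages are hypothetical.

```python
def pair_scores(test, retest):
    """Return (test, retest) pairs, refusing lists of unequal length."""
    if len(test) != len(retest):
        raise ValueError(
            f"{len(test)} test values but {len(retest)} retest values; "
            "each test score needs exactly one retest score"
        )
    if len(test) < 2:
        raise ValueError("at least two pairs are needed for a correlation")
    return list(zip(test, retest))


print(pair_scores([62, 70, 58], [64, 72, 57]))
```

Raising an error is safer than silently truncating the longer list, because truncation would pair scores from different participants.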
The calculator works well for any paired continuous scores, including test totals, scale scores, repeated measurements, and many laboratory or performance outcomes.