| # | X | Y | Notes |
|---|---|---|---|
| 1 | 1 | 2 | Small increase |
| 2 | 2 | 3 | Continues upward |
| 3 | 3 | 5 | Jump in Y |
| 4 | 4 | 4 | Temporary dip |
| 5 | 5 | 7 | Higher growth |
Pearson correlation measures linear association between X and Y: r = cov(X,Y) / (sX · sY).
- cov(X,Y) = Σ(xi−x̄)(yi−ȳ) / (n−1)
- sX = √(Σ(xi−x̄)² / (n−1)), and similarly for sY.
- R² = r² is the share of the variance in Y that a straight-line fit on X accounts for.
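As a minimal sketch, the formulas above can be applied to the five example pairs from the table. The function name `pearson` is illustrative and is not taken from the calculator's own code.

```python
import math

def pearson(xs, ys):
    """Sample Pearson r from covariance and standard deviations (n-1 denominators)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)

# X and Y columns from the example table
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 7]
r = pearson(x, y)
print(round(r, 4), round(r * r, 4))  # 0.9042 0.8176
```

Here cov(X,Y) = 11/4, sX = √2.5 and sY = √3.7, so r = 11/√148 ≈ 0.904 and r² ≈ 0.818.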
Spearman replaces values with ranks, then applies the same Pearson formula. Kendall tau-b compares concordant and discordant pairs, adjusting for ties.
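The two rank-based methods can be sketched as follows. This is an illustration under stated assumptions, not the calculator's implementation: the helper names are hypothetical, ties receive average ranks for Spearman, and tau-b uses the common pair-counting form (C − D) / √((C + D + Tx)(C + D + Ty)), where Tx and Ty count pairs tied only in X or only in Y.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman(xs, ys):
    """Pearson formula applied to the ranks of each variable."""
    return pearson(average_ranks(xs), average_ranks(ys))

def kendall_tau_b(xs, ys):
    """Tau-b from concordant/discordant pair counts, adjusted for ties."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: counted in neither term
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt((untied + ties_x) * (untied + ties_y))

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 7]
print(round(spearman(x, y), 3), round(kendall_tau_b(x, y), 3))  # 0.9 0.8
```

For the example table, only the pair of rows 3 and 4 is discordant, so 9 of the 10 pairs are concordant and tau-b is (9 − 1)/10 = 0.8.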
For Pearson significance, the calculator uses t = r · √((n−2)/(1−r²)) with df = n−2, then reports a two-tailed p-value.
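The test can be sketched with the standard library alone. How the calculator evaluates the t distribution is not stated, so the numerical integration below (Simpson's rule on the t density) is an assumption chosen to avoid external dependencies; the function names are hypothetical.

```python
import math

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|].

    steps must be even. math.gamma overflows for df above roughly 340,
    so this sketch only suits small and moderate samples.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = 2 * total * h / 3  # probability mass inside [-|t|, |t|]
    return max(0.0, 1 - central)

def pearson_t_test(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) with df = n - 2; requires |r| < 1."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, df, two_tailed_p(t, df)

# r for the example table is 11 / sqrt(148), with n = 5 pairs
t, df, p = pearson_t_test(11 / math.sqrt(148), 5)
print(round(t, 3), df, round(p, 3))  # 3.667 3 0.035
```

With only five pairs, even r ≈ 0.90 yields a p-value of about 0.035, which shows how strongly the test depends on n.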
- Choose a correlation method that matches your goal.
- Paste pairs as x,y lines, or use two lists.
- Set rounding and confidence, then click Calculate.
- Review the coefficient, strength, and plot for patterns.
- Use CSV or PDF to share results with others.
When correlation helps decisions
Correlation summarizes how two numeric variables move together. It is useful for screening relationships before deeper modeling, benchmarking sensors against references, or validating that two scoring methods align. In education, it checks whether practice hours relate to test performance. In operations, it reveals whether throughput changes with staffing. For time series, check lagged relationships and seasonality: same-period correlation can miss delayed effects, and a trend shared by both series can inflate it. Correlation is not a forecasting model, but it shows where to investigate first.
Direction, strength, and r²
The coefficient ranges from −1 to +1. Positive values indicate both variables tend to increase together, while negative values indicate one rises as the other falls. Magnitude reflects how consistently the points follow a line, not how large the effect is: a steep slope and a shallow slope can produce the same coefficient. Squaring the coefficient gives r², the share of variability in one variable that a linear fit on the other accounts for. Values near zero suggest little linear association, yet meaningful nonlinear patterns can still exist.
Data quality and outliers
Correlation is sensitive to data quality. A single extreme point can inflate or flip a result, especially with small samples. Before trusting output, review the scatter plot, confirm units, remove duplicates, and justify any exclusions. Check for range restriction, because limited variation depresses correlation. If the relationship is monotonic but curved, rank methods may be more stable. When ties are common, Kendall tau-b offers a conservative alternative.
Significance and confidence
For Pearson correlation, a t-test evaluates whether the observed coefficient differs from zero, given the sample size. The p-value depends heavily on n; tiny effects can become “significant” with large datasets. Confidence intervals provide range estimates, helping you judge practical importance beyond a single number. Wide intervals indicate uncertainty and signal that more data, or better measurement, may be required. Prefer practical thresholds aligned to your domain over generic labels.
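One common way to build a confidence interval for Pearson's r is the Fisher z transformation. Whether the calculator uses this method is an assumption; the sketch below and its function name are illustrative only.

```python
import math
from statistics import NormalDist

def pearson_confidence_interval(r, n, confidence=0.95):
    """Fisher z interval: transform r, add a normal margin, transform back."""
    if n < 4:
        raise ValueError("the Fisher z interval needs at least 4 pairs")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    z = math.atanh(r)              # Fisher transformation of r
    se = 1 / math.sqrt(n - 3)      # standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# r and n from the example table
lo, hi = pearson_confidence_interval(0.9042, 5)
print(round(lo, 2), round(hi, 2))  # 0.11 0.99
```

With five pairs the 95% interval spans roughly 0.11 to 0.99, so the data are compatible with anything from a weak to a near-perfect association.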
Reporting and reproducibility
Professional reporting includes the method, n, the coefficient, and interpretation in context. Add notes about preprocessing, missing values, and any transformations. Exported CSV or PDF summaries support audit trails and team review. When comparing datasets, keep collection windows consistent and document assumptions clearly. If results drive decisions, complement correlation with plots, residual checks, and, when appropriate, controlled experimentation.
What does a negative coefficient mean?
A negative value indicates an inverse association: as X increases, Y tends to decrease. The closer the value is to −1, the more consistently the points follow a downward pattern in the scatter plot.
When should I use Spearman instead of Pearson?
Use Spearman when the relationship is monotonic but not linear, when data contain outliers, or when measurements are ordinal ranks. It reduces sensitivity to extreme values by correlating ranks rather than raw numbers.
Does a high correlation prove causation?
No. Correlation shows association only. Two variables may move together because of a third factor, shared trends, or selection effects. Use domain reasoning, controlled tests, or causal methods to support causation.
Why does my p-value change with sample size?
The test statistic depends on both the coefficient and n. With larger samples, even small coefficients can yield small p-values. Always interpret statistical significance alongside effect size, confidence intervals, and practical impact.
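A quick way to see this is to hold the coefficient fixed and vary n in the Pearson test statistic t = r · √((n−2)/(1−r²)). The values of r and n below are arbitrary illustrations.

```python
import math

def t_statistic(r, n):
    """Pearson test statistic with df = n - 2; requires |r| < 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))

for n in (20, 200, 2000):
    print(n, round(t_statistic(0.1, n), 2))
# 20 0.43
# 200 1.41
# 2000 4.49
```

The two-tailed 5% critical value is close to 2 for these sample sizes, so the same weak coefficient of 0.1 is far from significant at n = 20 and clearly significant at n = 2000.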
How many data points do I need?
There is no single rule. More points usually produce more stable estimates, especially with noisy data. For Pearson intervals, at least four pairs are needed, since the standard Fisher z interval uses a standard error of 1/√(n−3), and results are more reliable once you have dozens of observations.
What if my data include repeated values or ties?
Ties affect rank calculations. Spearman averages ranks for ties, and Kendall tau-b adjusts for tied pairs explicitly. If ties are frequent, Kendall tau-b can provide a more conservative, interpretable measure.