Analyze one-sample, paired, and independent-groups equivalence tests. Set the equivalence margins, confidence level, and assumptions before calculating, then review the results, charts, exports, formulas, examples, and guidance.
This example illustrates paired observations and their within-subject differences. You can adapt the same idea for one-sample or two-group studies.
| Pair | Value A | Value B | Difference (A − B) |
|---|---|---|---|
| Pair 1 | 12.2 | 12.0 | 0.2 |
| Pair 2 | 11.8 | 11.9 | -0.1 |
| Pair 3 | 12.4 | 12.3 | 0.1 |
| Pair 4 | 12.1 | 12.2 | -0.1 |
| Pair 5 | 11.9 | 12.0 | -0.1 |
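The paired analysis works only with the Difference column. The sketch below reduces the table to the summary statistics a paired TOST needs: the number of pairs, the mean difference, the SD of the differences, and the standard error. The variable names are illustrative and are not part of the calculator.

```python
import math
import statistics

# Values copied from the example table (Value A and Value B for each pair).
value_a = [12.2, 11.8, 12.4, 12.1, 11.9]
value_b = [12.0, 11.9, 12.3, 12.2, 12.0]

# Within-pair differences, A minus B. round() strips binary floating-point
# noise such as 0.1999999999999993 so the values match the table.
diffs = [round(a - b, 6) for a, b in zip(value_a, value_b)]

n = len(diffs)
mean_diff = statistics.mean(diffs)   # paired effect estimate
sd_diff = statistics.stdev(diffs)    # sample SD (n - 1 denominator)
se_diff = sd_diff / math.sqrt(n)     # standard error of the mean difference

print(f"n = {n}, mean = {mean_diff:.4f}, sd = {sd_diff:.4f}, se = {se_diff:.4f}")
```

For this table the mean difference is 0, the SD of the differences is √0.02 ≈ 0.1414, and the standard error is about 0.0632 on n − 1 = 4 degrees of freedom.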
Equivalence testing usually applies the two one-sided tests procedure, called TOST. Instead of checking whether the effect equals zero, it checks whether the effect stays between a lower and upper practical bound.
Effect estimate: For one-sample testing, estimate = sample mean − reference mean. For two independent groups, estimate = mean1 − mean2. For paired data, estimate = mean paired difference.
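Lower one-sided test: tL = (estimate − lower bound) / SE, testing H0: effect ≤ lower bound. Its p-value is the upper-tail probability P(T ≥ tL).
Upper one-sided test: tU = (estimate − upper bound) / SE, testing H0: effect ≥ upper bound. Its p-value is the lower-tail probability P(T ≤ tU).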
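As a worked illustration, apply both statistics to the paired example table. The differences have mean 0 and SD √0.02 with n = 5, which gives SE ≈ 0.0632 on 4 degrees of freedom. The bounds of ±0.25 are a hypothetical choice made only for this illustration, because the page does not specify margins for the example.

```latex
\begin{aligned}
\bar d &= 0, \qquad s_d = \sqrt{0.02} \approx 0.1414, \qquad
  \mathrm{SE} = \frac{s_d}{\sqrt{5}} \approx 0.0632 \\
t_L &= \frac{\bar d - (-0.25)}{\mathrm{SE}} = \frac{0.25}{0.0632} \approx 3.95 \\
t_U &= \frac{\bar d - 0.25}{\mathrm{SE}} = \frac{-0.25}{0.0632} \approx -3.95
\end{aligned}
```

With 4 degrees of freedom the one-sided 5% critical value is t₀.₉₅,₄ ≈ 2.132. Because t_L exceeds 2.132 and t_U is below −2.132, both one-sided tests reject under these assumed bounds.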
Decision rule: Conclude equivalence when both one-sided p-values are below alpha. Equivalently, the 100(1 − 2α)% confidence interval (90% when α = 0.05) lies entirely within the equivalence bounds.
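Below is a minimal sketch of the decision rule, assuming the effect estimate, its standard error, and the degrees of freedom are already known. To stay dependency-free it computes the Student t CDF by Simpson's rule rather than calling a statistics library, so `t_pdf`, `t_cdf`, and `tost` are hypothetical helpers and not the calculator's own code. The ±0.25 bounds are the same illustrative assumption used above.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + half_area if x >= 0 else 0.5 - half_area

def tost(estimate, se, df, lower, upper, alpha=0.05):
    """Two one-sided tests of the effect against (lower, upper)."""
    t_l = (estimate - lower) / se
    t_u = (estimate - upper) / se
    p_l = 1.0 - t_cdf(t_l, df)  # H0: effect <= lower; small when t_l is large
    p_u = t_cdf(t_u, df)        # H0: effect >= upper; small when t_u is small
    return {"t_l": t_l, "t_u": t_u, "p_l": p_l, "p_u": p_u,
            "equivalent": max(p_l, p_u) < alpha}

# Paired example table: mean difference 0, SD sqrt(0.02), n = 5, df = 4.
# The +/-0.25 bounds are hypothetical, chosen only for illustration.
result = tost(0.0, math.sqrt(0.02) / math.sqrt(5), 4, -0.25, 0.25)
print(result)
```

Both one-sided p-values come out near 0.008, so the larger of the two is below α = 0.05 and the sketch reports equivalence. Taking the maximum of the two p-values is a compact way to say "both must be below alpha".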
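Standard error examples: One-sample SE = s / √n, where s is the sample standard deviation. Paired SE = sd / √n, where sd is the standard deviation of the paired differences and n is the number of pairs. Independent groups use either a pooled or a Welch standard error, depending on the chosen variance assumption.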
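Here are the four standard errors written out in full, each paired with the degrees of freedom ν of its t reference distribution. s₁, s₂, n₁, n₂ are the group SDs and sizes. The page does not state the degrees-of-freedom expressions. The ones below are the usual textbook choices, including the Welch-Satterthwaite approximation, so treat them as an assumption about how the calculator works.

```latex
\begin{aligned}
\text{One-sample:}\quad & \mathrm{SE} = \frac{s}{\sqrt{n}}, & \nu &= n - 1 \\
\text{Paired:}\quad & \mathrm{SE} = \frac{s_d}{\sqrt{n}}, & \nu &= n - 1 \\
\text{Pooled:}\quad & \mathrm{SE} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\;
  s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, & \nu &= n_1 + n_2 - 2 \\
\text{Welch:}\quad & \mathrm{SE} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, &
  \nu &= \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}
        {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
\end{aligned}
```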
An equivalence test checks whether an effect is small enough to be practically unimportant. It does not test whether the difference is exactly zero. Instead, it evaluates whether the effect lies between predefined lower and upper equivalence bounds.
TOST stands for two one-sided tests. One test checks whether the effect is above the lower bound. The second checks whether it is below the upper bound. Both must pass to conclude equivalence.
Margins should come from subject-matter reasoning, clinical relevance, engineering tolerance, or established domain guidance. They should represent the largest acceptable difference that still counts as practically similar.
Use Welch when group spreads or sample sizes differ noticeably. It is more robust when equal variances are uncertain. Use pooled variance only when the equal-variance assumption is justified by design or diagnostics.
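The sketch below shows why the choice matters when spreads and sample sizes differ together. It compares the pooled and Welch standard errors, along with their degrees of freedom, for made-up summaries in which the larger group also has the smaller SD. The function names are hypothetical.

```python
import math

def pooled_se_df(s1, n1, s2, n2):
    """Pooled-variance SE of mean1 - mean2 and its degrees of freedom."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2

def welch_se_df(s1, n1, s2, n2):
    """Welch SE of mean1 - mean2 with Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return math.sqrt(v1 + v2), df

# Hypothetical summaries: group 1 is large with a small SD,
# group 2 is small with a large SD.
for label, fn in [("pooled", pooled_se_df), ("welch", welch_se_df)]:
    se, df = fn(s1=1.0, n1=40, s2=3.0, n2=10)
    print(f"{label}: SE = {se:.3f}, df = {df:.1f}")
```

Here the pooled SE is about 0.56 on 48 degrees of freedom, and the Welch SE is about 0.96 on roughly 9.5 degrees of freedom. The pooled variance is dominated by the large, low-spread group. Its smaller SE would make both one-sided statistics larger in magnitude, so equivalence would look better supported than the data justify. With equal SDs and equal sample sizes, the two methods give the same SE and the same degrees of freedom.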
No. A non-significant difference test only says evidence for a difference was insufficient. Equivalence requires evidence that the true effect is small enough to stay inside your chosen practical bounds.
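A small hypothetical case makes the distinction concrete. Suppose an estimate of 0.1 with SE 0.2 on 4 degrees of freedom, bounds of ±0.25, and α = 0.05. All of these numbers are invented for illustration.

```latex
\begin{aligned}
\text{Difference test:}\quad & t = \frac{0.1}{0.2} = 0.5 < t_{0.975,4} \approx 2.776
  \;\Rightarrow\; \text{not significant} \\
\text{Lower TOST test:}\quad & t_L = \frac{0.1 - (-0.25)}{0.2} = 1.75 < t_{0.95,4} \approx 2.132
  \;\Rightarrow\; \text{not rejected} \\
\text{Upper TOST test:}\quad & t_U = \frac{0.1 - 0.25}{0.2} = -0.75 > -2.132
  \;\Rightarrow\; \text{not rejected}
\end{aligned}
```

Neither procedure reaches a conclusion. The data are too noisy to tell a zero effect apart from one outside the bounds, so the non-significant difference test cannot be read as evidence of equivalence.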
The confidence interval gives a visual summary of uncertainty. If the entire 100(1 − 2α)% interval (90% for α = 0.05) lies inside the equivalence range, it supports the same conclusion as passing both one-sided tests at level α.
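Rearranging each one-sided test shows why the two views agree. Write θ̂ for the estimate, Δ_L and Δ_U for the bounds, and t₁₋α,ν for the one-sided critical value. A one-sided p-value is below α exactly when its statistic passes that critical value:

```latex
\begin{aligned}
t_L > t_{1-\alpha,\nu}
  &\iff \frac{\hat\theta - \Delta_L}{\mathrm{SE}} > t_{1-\alpha,\nu}
  \iff \hat\theta - t_{1-\alpha,\nu}\,\mathrm{SE} > \Delta_L \\
t_U < -t_{1-\alpha,\nu}
  &\iff \frac{\hat\theta - \Delta_U}{\mathrm{SE}} < -t_{1-\alpha,\nu}
  \iff \hat\theta + t_{1-\alpha,\nu}\,\mathrm{SE} < \Delta_U
\end{aligned}
```

The two right-hand inequalities together say that the interval θ̂ ± t₁₋α,ν·SE, which is the 100(1 − 2α)% confidence interval, lies strictly inside (Δ_L, Δ_U). Take the paired example table, with θ̂ = 0, SE ≈ 0.0632, and ν = 4. Its 90% interval is 0 ± 2.132 × 0.0632 ≈ [−0.135, 0.135], which would sit inside hypothetical bounds of ±0.25.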
Yes. This page is designed for summary inputs such as means, standard deviations, sample sizes, and paired-difference statistics. Raw data are not required for the core TOST calculations shown here.
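As an illustration that summary statistics are enough, here is a minimal one-sample TOST built only from a mean, an SD, a sample size, and a reference value. To keep it short, the caller supplies the critical value t₁₋α,n₋₁ (for example, from a t table) instead of computing a p-value. Comparing against a critical value gives the same decision as comparing p-values with α. The function name and all input numbers are hypothetical.

```python
import math

def one_sample_tost_summary(mean, sd, n, reference, lower, upper, t_crit):
    """One-sample TOST from summary statistics only (no raw data needed).

    t_crit is the one-sided critical value t_{1-alpha, n-1}, supplied by the
    caller (for example from a t table); it is not computed here.
    """
    estimate = mean - reference      # effect estimate
    se = sd / math.sqrt(n)           # one-sample standard error
    t_l = (estimate - lower) / se    # lower one-sided statistic
    t_u = (estimate - upper) / se    # upper one-sided statistic
    equivalent = t_l > t_crit and t_u < -t_crit
    return estimate, se, t_l, t_u, equivalent

# Hypothetical inputs: mean 10.1, SD 0.5, n = 25, reference 10.0,
# bounds of +/-0.3, alpha = 0.05 so t_crit = t_{0.95, 24} ~ 1.711.
print(one_sample_tost_summary(10.1, 0.5, 25, 10.0, -0.3, 0.3, 1.711))
```

With these inputs the estimate is 0.1 and the SE is 0.1, so t_L ≈ 4.0 and t_U ≈ −2.0. Both statistics lie beyond ±1.711, and the sketch reports equivalence.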
If both one-sided tests are significant, the result supports practical equivalence within your selected margins. If not, the evidence is insufficient to conclude equivalence under the current assumptions and inputs.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.