Calculator
Formula used
This calculator uses the Two One-Sided Tests (TOST) approach for equivalence of mean differences, with a z-approximation. Let Δ be the equivalence margin and SE the standard error.
- Two-sample SE: SE = SD * sqrt(1/n1 + 1/n2)
- One-sample / paired SE: SE = SD / sqrt(n)
- Critical value: z = Φ⁻¹(1 − α)
- Acceptance interval for the estimate: (L, U) = (−Δ + z·SE, Δ − z·SE)
- Power: P( L < D̂ < U ) = Φ((U−μ)/SE) − Φ((L−μ)/SE), where D̂ is the estimated mean difference and μ is the assumed true difference
If the acceptance interval is empty (U ≤ L, which happens when Δ ≤ z·SE), equivalence cannot be shown at that precision; a larger sample or a wider margin is needed.
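The formulas above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the calculator's actual source; the function name `tost_power` and its parameter names are assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist


def tost_power(delta, sd, mu_diff, n1, n2=None, alpha=0.05):
    """Approximate TOST power using the z-approximation.

    Pass n2 for a two-sample design; leave n2=None for a
    one-sample or paired design (n1 is then the number of
    observations or pairs). Names here are hypothetical.
    """
    norm = NormalDist()  # standard normal distribution
    if n2 is not None:
        se = sd * sqrt(1 / n1 + 1 / n2)  # two-sample SE
    else:
        se = sd / sqrt(n1)               # one-sample / paired SE
    z = norm.inv_cdf(1 - alpha)          # one-sided critical value
    lower = -delta + z * se              # L
    upper = delta - z * se               # U
    if upper <= lower:
        return 0.0                       # empty acceptance interval
    return norm.cdf((upper - mu_diff) / se) - norm.cdf((lower - mu_diff) / se)


# Scenario-style inputs: margin 5, SD 10, no true difference, 50 per group
print(round(tost_power(delta=5, sd=10, mu_diff=0, n1=50, n2=50), 3))
```

Returning 0.0 for an empty interval is a choice made for this sketch; the calculator may report that case differently.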
How to use this calculator
- Select a design: two-sample, one-sample, or paired.
- Set the equivalence margin (±Δ) in your outcome units.
- Enter SD and the assumed true difference between methods.
- Choose a mode: power from size, or size from target power.
- Press Submit to view results under the header.
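The "size from target power" mode can be pictured as a search for the smallest sample size whose approximate power reaches the target. The sketch below assumes a balanced two-sample design and a simple linear search; the function names and the `n_max` cap are hypothetical, and the calculator may use a different method.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(n, delta, sd, mu_diff, alpha=0.05):
    """z-approximation TOST power for n subjects in each of two groups."""
    norm = NormalDist()
    se = sd * sqrt(2 / n)
    z = norm.inv_cdf(1 - alpha)
    lower, upper = -delta + z * se, delta - z * se
    if upper <= lower:
        return 0.0
    return norm.cdf((upper - mu_diff) / se) - norm.cdf((lower - mu_diff) / se)


def n_for_power(target, delta, sd, mu_diff, alpha=0.05, n_max=100_000):
    """Smallest per-group n with power >= target, or None if not found."""
    for n in range(2, n_max + 1):
        if power_two_sample(n, delta, sd, mu_diff, alpha) >= target:
            return n
    return None  # e.g. when the assumed true difference lies outside the margin


print(n_for_power(target=0.80, delta=5, sd=10, mu_diff=0))
```

A linear search is slow for very large n but easy to verify; a bisection search would be a natural refinement.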
Example data table
Illustrative scenarios to compare planning choices.
| Scenario | Design | α | Δ | SD | μdiff | n1 | n2 | Expected result |
|---|---|---|---|---|---|---|---|---|
| A | Two-sample | 0.05 | 5 | 10 | 0 | 50 | 50 | Moderate power for equivalence |
| B | Two-sample | 0.05 | 5 | 10 | 1 | 80 | 80 | Higher power if true diff is small |
| C | Paired | 0.05 | 3 | 6 | 0 | 60 | n/a | Often strong power due to pairing |
FAQs
1) What is an equivalence test power calculation?
It estimates the probability your study will conclude two methods are practically equivalent, given a margin, variability, and sample sizes under an assumed true difference.
2) Why does the calculator use two one-sided tests?
Equivalence requires showing the effect is above −Δ and below +Δ. TOST checks each bound with a one-sided test at level α, which is equivalent to requiring that the (1 − 2α) confidence interval lie entirely within the margin (a 90% interval when α = 0.05).
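To illustrate the two one-sided tests on an observed result, the sketch below runs both z-tests and also checks whether the (1 − 2α) confidence interval sits inside the margin; the two checks agree. The function `tost_decision` and its arguments are hypothetical names for this example, and the z-approximation is assumed throughout.

```python
from statistics import NormalDist


def tost_decision(d_hat, se, delta, alpha=0.05):
    """Return (both tests reject, CI inside margin, CI) for an estimate d_hat."""
    norm = NormalDist()
    z = norm.inv_cdf(1 - alpha)
    # Test 1: H0 is "true difference <= -delta"; reject for large d_hat
    p_lower = 1 - norm.cdf((d_hat + delta) / se)
    # Test 2: H0 is "true difference >= +delta"; reject for small d_hat
    p_upper = norm.cdf((d_hat - delta) / se)
    both_reject = max(p_lower, p_upper) < alpha
    # Equivalent view: the (1 - 2*alpha) confidence interval within (-delta, delta)
    ci = (d_hat - z * se, d_hat + z * se)
    ci_inside = -delta < ci[0] and ci[1] < delta
    return both_reject, ci_inside, ci


print(tost_decision(d_hat=1.0, se=2.0, delta=5.0))
print(tost_decision(d_hat=2.0, se=2.0, delta=5.0))
```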
3) How should I choose the equivalence margin (Δ)?
Pick the largest difference that is still practically unimportant. Use domain knowledge, stakeholder agreement, historical performance, and measurement error to justify Δ before looking at the data.
4) What does “assumed true difference” mean?
It is the real mean difference you expect between methods. Power is highest when the true difference is near zero and decreases as it approaches the equivalence bounds.
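A quick way to see this effect is to hold the margin and standard error fixed and vary the assumed true difference. The sketch below uses the same z-approximation with illustrative values (Δ = 5, SE = 2); the function name is hypothetical.

```python
from statistics import NormalDist


def tost_power_from_se(mu_diff, delta, se, alpha=0.05):
    """z-approximation TOST power for a given standard error."""
    norm = NormalDist()
    z = norm.inv_cdf(1 - alpha)
    lower, upper = -delta + z * se, delta - z * se
    if upper <= lower:
        return 0.0
    return norm.cdf((upper - mu_diff) / se) - norm.cdf((lower - mu_diff) / se)


# Power falls as the assumed true difference moves from 0 toward the bound at 5
for mu in (0, 1, 2, 3, 4, 5):
    print(mu, round(tost_power_from_se(mu, delta=5, se=2), 3))
```

When the true difference sits exactly on a bound, the power under this approximation is just below α, which is the test's type I error rate at the edge of non-equivalence.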
5) Is this exact for small samples?
This tool uses a z-approximation for speed and clarity. For very small samples or strongly non-normal outcomes, a t-based or simulation approach can be more accurate.
6) How do paired designs affect power?
Paired designs often reduce variability because each subject serves as their own control. Enter the SD of paired differences to reflect that reduction, which can substantially increase power.
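If only the per-measurement SDs and the within-subject correlation are known, the SD of paired differences follows from the standard variance-of-a-difference identity. The sketch below assumes those three inputs are available; the function name and example values are illustrative only.

```python
from math import sqrt


def sd_of_differences(sd_a, sd_b, rho):
    """SD of (A - B) given each measurement's SD and their correlation rho."""
    return sqrt(sd_a**2 + sd_b**2 - 2 * rho * sd_a * sd_b)


# Strong positive correlation shrinks the SD of the differences
print(sd_of_differences(sd_a=10, sd_b=10, rho=0.82))
print(sd_of_differences(sd_a=10, sd_b=10, rho=0.0))
```

With no correlation the SD of differences is larger than either measurement's SD, so pairing helps only when the two measurements on a subject are positively correlated.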
7) What does the acceptance interval in results represent?
It is the range of observed mean differences that would pass both one-sided tests at alpha. If that interval is empty, the margin is too tight for the chosen precision.
8) How should I use the dropout rate?
Dropout inflates planned sample sizes so your final analyzed sample still meets power goals. If you expect 10% dropout, the calculator increases n to target the needed retained sample.
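One common way to apply a dropout adjustment is to divide the required analyzed sample by the expected retention rate and round up. The sketch below assumes that rule; the calculator's exact adjustment is not stated here, and the function name is hypothetical.

```python
from math import ceil


def inflate_for_dropout(n_required, dropout_rate):
    """Enrollment needed so that about n_required subjects remain after dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return ceil(n_required / (1 - dropout_rate))


# Illustrative: 69 analyzed subjects per group with 10% expected dropout
print(inflate_for_dropout(69, 0.10))
```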