Calculator
Choose your metric type, set alpha, power, and the minimum detectable effect, then calculate the required sample sizes for groups A and B.
Example data table
Use this sample input set to validate your setup.
| Metric Type | Baseline / Sigma | MDE / Delta | Alpha | Power | Ratio | Drop-off |
|---|---|---|---|---|---|---|
| Conversion rate | 0.10 | Relative 0.10 | 0.05 | 0.80 | 1.0 | 0.00 |
| Continuous | Sigma 1.00 | Delta 0.10 | 0.05 | 0.90 | 1.0 | 0.05 |
Formula used
Binary conversion (two proportions):
nA = [ zα·√(p̄(1−p̄)(1+1/r)) + zβ·√(p1(1−p1) + p2(1−p2)/r) ]² / (p2−p1)²
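Where p1 is the baseline rate, p2 is the target rate (for a relative MDE m, p2 = p1·(1+m)), p̄ = (p1+p2)/2, and r = nB/nA, so nB = r·nA. zα is the standard normal quantile at 1−alpha (1−alpha/2 for a two-sided test), and zβ is the standard normal quantile at the chosen power.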
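As a rough sketch of the two-proportion formula above, the following Python function evaluates it using only the standard library. The name `sample_size_two_proportions` and its parameters are illustrative assumptions, not part of the calculator, and rounding up to whole users is a choice made here.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80,
                                ratio=1.0, two_sided=True):
    """Return (nA, nB) for a two-proportion test (hypothetical helper)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2  # simple average, as defined in the text
    null_term = z_alpha * sqrt(p_bar * (1 - p_bar) * (1 + 1 / ratio))
    alt_term = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2) / ratio)
    n_a = (null_term + alt_term) ** 2 / (p2 - p1) ** 2
    # Rounding up is an assumption; the calculator may round differently.
    return ceil(n_a), ceil(ratio * n_a)


# First row of the example table: baseline 0.10, relative MDE 0.10.
baseline = 0.10
target = baseline * (1 + 0.10)
n_a, n_b = sample_size_two_proportions(baseline, target)
print(n_a, n_b)
```

With these inputs the result should land at roughly 14,750 users per group, which is a useful sanity check against the calculator's own output.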
Continuous mean difference (equal variance):
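nA = ( (zα+zβ)² · σ² · (1+1/r) ) / δ² , nB = r·nA
Where σ is the metric's standard deviation and δ is the absolute difference in means to detect; zα, zβ, and r are as defined above.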
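The continuous formula can be sketched the same way. The function below is a minimal illustration under the equal-variance assumption stated above; `sample_size_continuous` and its parameter names are hypothetical, and rounding up is an assumption.

```python
from math import ceil
from statistics import NormalDist


def sample_size_continuous(sigma, delta, alpha=0.05, power=0.80,
                           ratio=1.0, two_sided=True):
    """Return (nA, nB) for a difference in means (hypothetical helper)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    n_a = (z_alpha + z_beta) ** 2 * sigma ** 2 * (1 + 1 / ratio) / delta ** 2
    # Rounding up is an assumption; the calculator may round differently.
    return ceil(n_a), ceil(ratio * n_a)


# Second row of the example table, before any drop-off adjustment.
n_a, n_b = sample_size_continuous(sigma=1.00, delta=0.10, alpha=0.05, power=0.90)
print(n_a, n_b)
```

For the second example row this gives about 2,100 users per group before the drop-off adjustment.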
Finally, adjust each group for drop-off: n_adj = n / (1 − dropOff), where dropOff is the expected fraction of participants lost (0.05 means 5%).
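The drop-off adjustment is a one-line inflation. A minimal sketch, assuming drop-off is given as a fraction below 1 and that results are rounded up (the helper name `adjust_for_drop_off` is hypothetical):

```python
from math import ceil


def adjust_for_drop_off(n, drop_off):
    """Inflate a required sample size so that n users remain after drop-off."""
    if not 0 <= drop_off < 1:
        raise ValueError("drop_off must be a fraction in [0, 1)")
    return ceil(n / (1 - drop_off))


# 1000 / 0.95 is about 1052.6, so 1053 users must be enrolled.
print(adjust_for_drop_off(1000, 0.05))
```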
How to use this calculator
- Select conversion or continuous based on your primary metric.
- Set alpha and power to match your risk tolerance.
- Enter the smallest change worth detecting (MDE or delta).
- Choose an allocation ratio if traffic is not split equally.
- Add a drop-off rate to protect against missing data.
- Click Calculate to see sample sizes above the form.
- Use the CSV/PDF buttons to export results for sharing.
- Re-run for alternate scenarios to compare feasibility.
FAQs
1) What does alpha control?
Alpha is the false-positive rate: the probability of declaring a difference when none exists. Lower alpha reduces accidental wins but increases required sample size. Many teams use 0.05 for routine tests.
2) What does power mean?
Power is the chance you detect a real effect of the chosen size. Higher power reduces missed wins, but it requires more participants.
3) Should I use one-sided or two-sided tests?
Two-sided tests are safer when outcomes can move either direction. One-sided tests make sense only if you will never act on a negative change.
4) What is MDE and how do I choose it?
MDE is the smallest improvement you care about. Pick it from business impact, cost, and feasibility. Required sample size grows roughly with 1/MDE², so halving the MDE roughly quadruples the sample you need.
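To see how quickly sample size grows as the detectable effect shrinks, here is a small sketch using the continuous formula with an equal split. The helper `n_per_group` and the delta values are illustrative assumptions.

```python
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Unrounded per-group size, two-sided test, equal allocation (r = 1)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return 2 * (z_sum * sigma / delta) ** 2


# Each halving of delta multiplies the required sample by four.
for delta in (0.20, 0.10, 0.05):
    print(f"delta={delta:.2f}  n per group ~ {n_per_group(1.0, delta):,.0f}")
```

The binary formula behaves similarly, although the scaling is only approximate there because the variance terms also depend on p2.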
5) Why does unequal traffic split increase sample size?
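With unequal allocation, the smaller group's estimate is noisier, and that noise dominates the comparison. The formula accounts for this through the (1+1/r) ratio term: for a fixed alpha, power, and effect size, the total number of users is smallest near an equal split and grows as the split becomes more lopsided.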
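For the continuous equal-variance formula, the cost of an uneven split can be worked out directly: total users nA + nB = nA·(1+r) is proportional to (1+1/r)·(1+r) = (1+r)²/r, which equals 4 at r = 1. The sketch below prints the total relative to an equal split; `total_inflation` is a hypothetical helper, and the binary case will differ slightly.

```python
def total_inflation(ratio):
    """Total sample size relative to an equal split (continuous, equal variance)."""
    return (1 + ratio) ** 2 / (4 * ratio)


for ratio in (1.0, 2.0, 4.0, 9.0):
    print(f"r={ratio:g}: total is {total_inflation(ratio):.3f}x the equal-split total")
```

A 2:1 split costs about 12.5% more total users than 1:1, while a 9:1 split needs nearly 2.8 times as many.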
6) How do I estimate sigma for continuous metrics?
Use historical data for the same population and measurement window. Compute the standard deviation of that metric, then choose a meaningful delta.
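A minimal sketch of that estimate, using Python's standard library. The values in `historical_values` are made-up placeholders, and deriving delta as a 5% relative change from the mean is just one possible choice.

```python
from statistics import mean, stdev

# Hypothetical per-user values (for example, revenue per user); replace with real data
# drawn from the same population and measurement window as the planned test.
historical_values = [12.0, 0.0, 7.5, 30.0, 0.0, 4.5, 18.0, 9.0]

sigma = stdev(historical_values)  # sample standard deviation (n - 1 denominator)
baseline_mean = mean(historical_values)

relative_change = 0.05  # assumed smallest relative lift worth detecting
delta = baseline_mean * relative_change  # absolute delta for the formula

print(f"sigma={sigma:.3f}  baseline mean={baseline_mean:.3f}  delta={delta:.3f}")
```

In practice a much larger history is needed for a stable estimate; heavy-tailed metrics in particular can make sigma sensitive to a few extreme users.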
7) What should I do if my baseline rate is very low?
Low baselines can inflate sample needs, especially for tiny uplifts. Consider longer test windows, broader eligibility, or a different primary metric.
8) Does this handle multiple comparisons?
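Not directly. If you run many variants or metrics, lower your per-test alpha with a multiple-comparison correction (Bonferroni is a common, conservative choice) and fix your primary metric and comparisons in a pre-registered analysis plan.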
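One simple correction is Bonferroni, which divides alpha by the number of comparisons; it is shown here only as an example of a correction method, not as what the calculator does. The helper name `bonferroni_alpha` is hypothetical.

```python
def bonferroni_alpha(alpha, num_comparisons):
    """Per-comparison alpha under a Bonferroni correction."""
    if num_comparisons < 1:
        raise ValueError("num_comparisons must be at least 1")
    return alpha / num_comparisons


# Three variants compared against one control at an overall alpha of 0.05.
per_test_alpha = bonferroni_alpha(0.05, 3)
print(f"Enter alpha={per_test_alpha:.4f} for each comparison")
```

The adjusted value can then be entered as alpha in the calculator, which raises the required sample size for each comparison.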