A/B Test Sample Size Calculator

Calculator

Choose your metric type, set significance (alpha) and power, then calculate the required sample sizes for variants A and B.

Metric type

Binary for conversions; continuous for averages like revenue.

Significance (alpha)

Common values: 0.10, 0.05, 0.01.

Power

Common values: 0.80 or 0.90.

Test type

Two-sided is the default for most experiments.

Allocation ratio (B/A)

Use 1 for equal traffic; 2 means B gets twice as much traffic as A.

Drop-off rate

Fraction of users expected to be lost to tracking gaps (e.g., 0.08 for 8%).

Binary inputs

Conversion rate

Baseline conversion rate (A)

Example: 0.10 for 10%.

MDE type

Relative is a percent change from the baseline; absolute is a change in percentage points.

Minimum detectable effect

Relative: 0.10 means +10% of the baseline (0.10 becomes 0.11). Absolute: 0.01 means +1 percentage point (0.10 becomes 0.11).
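To make the two MDE types concrete, here is a minimal Python sketch that converts a baseline and an MDE into the target rate. The function name target_rate and its parameters are hypothetical, chosen only for illustration.

```python
def target_rate(baseline: float, mde: float, mde_type: str = "relative") -> float:
    """Return the target conversion rate implied by a baseline and an MDE."""
    if mde_type == "relative":
        p2 = baseline * (1.0 + mde)  # percent change from the baseline
    elif mde_type == "absolute":
        p2 = baseline + mde          # change in rate points
    else:
        raise ValueError("mde_type must be 'relative' or 'absolute'")
    if not 0.0 < p2 < 1.0:
        raise ValueError("target rate must fall strictly between 0 and 1")
    return p2


print(target_rate(0.10, 0.10, "relative"))
print(target_rate(0.10, 0.01, "absolute"))
```

Both calls describe the same target of roughly 0.11, which is why it matters to label the MDE type whenever a number is shared.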

Continuous inputs

Means

Standard deviation (sigma)

Use the historical standard deviation of the chosen metric.

Detectable difference (delta)

Smallest meaningful change, in the metric's original units.

Traffic per day (optional)

Total eligible users per day across both variants. Used to estimate test duration.
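One plausible way to turn daily traffic into a duration estimate is to divide the total required sample by the users available per day and round up. The sketch below assumes exactly that; the function name estimated_days and the input numbers are hypothetical, and the calculator may compute duration differently.

```python
from math import ceil


def estimated_days(n_a: int, n_b: int, traffic_per_day: float) -> int:
    """Days needed to collect n_a + n_b users at the given total daily traffic."""
    if traffic_per_day <= 0:
        raise ValueError("traffic_per_day must be positive")
    # Traffic is the total across both variants, so compare it to the total sample.
    return ceil((n_a + n_b) / traffic_per_day)


# Hypothetical inputs: two groups of 14,751 users and 2,000 eligible users per day.
print(estimated_days(14751, 14751, 2000))
```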

Example data table

Use this sample input set to validate your setup.

Metric Type	Baseline / Sigma	MDE / Delta	Alpha	Power	Ratio	Drop-off
Conversion rate	Baseline 0.10	Relative 0.10	0.05	0.80	1.0	0.00
Continuous	Sigma 1.00	Delta 0.10	0.05	0.90	1.0	0.05

Formula used

Binary conversion (two proportions):

nA = [ zα·√(p̄(1−p̄)(1+1/r)) + zβ·√(p1(1−p1) + p2(1−p2)/r) ]² / (p2−p1)²

Where p1 is the baseline rate, p2 is the target rate, p̄ = (p1+p2)/2, and r = nB/nA (so nB = r·nA). zα is the standard normal quantile at 1 − alpha (1 − alpha/2 for a two-sided test), and zβ is the quantile at the chosen power.
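As a rough illustration, the two-proportion formula above can be sketched in Python using only the standard library. The function name binary_sample_size, its parameter names, and the rounding up to whole users are assumptions made for this sketch, not the calculator's actual code.

```python
from math import ceil, sqrt
from statistics import NormalDist


def binary_sample_size(p1, p2, alpha=0.05, power=0.80, ratio=1.0, two_sided=True):
    """Return (n_a, n_b) for comparing two proportions, before any drop-off adjustment."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2  # simple average, as stated in the formula
    root_null = sqrt(p_bar * (1 - p_bar) * (1 + 1 / ratio))
    root_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2) / ratio)
    n_a = (z_alpha * root_null + z_beta * root_alt) ** 2 / (p2 - p1) ** 2
    # Assumption: round each group up to a whole number of users.
    return ceil(n_a), ceil(ratio * n_a)


# Inputs from the first example row: baseline 0.10, relative MDE 0.10, so p2 = 0.11.
n_a, n_b = binary_sample_size(0.10, 0.11, alpha=0.05, power=0.80, ratio=1.0)
print(n_a, n_b)
```

With these inputs the sketch lands a little under 15,000 users per group, which is a useful sanity check when validating a setup against the example table.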

Continuous mean difference (equal variance):

nA = ( (zα+zβ)² · σ² · (1+1/r) ) / δ² , nB = r·nA
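The continuous formula can be sketched the same way. Again, the function name continuous_sample_size, the parameter names, and the rounding up are assumptions for illustration only.

```python
from math import ceil
from statistics import NormalDist


def continuous_sample_size(sigma, delta, alpha=0.05, power=0.80, ratio=1.0, two_sided=True):
    """Return (n_a, n_b) for a difference in means with equal variance."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    n_a = (z_alpha + z_beta) ** 2 * sigma ** 2 * (1 + 1 / ratio) / delta ** 2
    # Assumption: round each group up to a whole number of users.
    return ceil(n_a), ceil(ratio * n_a)


# Inputs from the second example row: sigma 1.00, delta 0.10, power 0.90.
n_a, n_b = continuous_sample_size(1.00, 0.10, alpha=0.05, power=0.90, ratio=1.0)
print(n_a, n_b)
```

This gives roughly 2,100 users per group before the drop-off adjustment.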

Finally, adjust for drop-off: n_adj = n / (1 − dropOff).
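The drop-off adjustment is a single division. A minimal sketch follows, with a hypothetical function name and the added assumption that the result is rounded up to a whole user.

```python
from math import ceil


def adjust_for_dropoff(n: int, drop_off: float) -> int:
    """Inflate a required sample size so that n users remain after drop-off."""
    if not 0.0 <= drop_off < 1.0:
        raise ValueError("drop_off must be in [0, 1)")
    return ceil(n / (1.0 - drop_off))


# Hypothetical input: 1,000 required users with 8% expected drop-off.
print(adjust_for_dropoff(1000, 0.08))
```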

How to use this calculator

Select binary (conversion) or continuous based on your primary metric.
Set alpha and power to match your risk tolerance.
Enter the smallest change worth detecting (MDE or delta).
Choose an allocation ratio if traffic is not split equally.
Add a drop-off rate to protect against missing data.
Click Calculate to see sample sizes above the form.
Use the CSV/PDF buttons to export results for sharing.
Re-run for alternate scenarios to compare feasibility.

FAQs

1) What does alpha control?

Alpha is the false-positive risk: the probability of declaring a difference when none exists. Lower alpha reduces accidental wins but increases required sample size. Many teams use 0.05 for routine tests.

2) What does power mean?

Power is the chance you detect a real effect of the chosen size. Higher power reduces missed wins, but it requires more participants.

3) Should I use one-sided or two-sided tests?

Two-sided tests are safer when outcomes can move either direction. One-sided tests make sense only if you will never act on a negative change.
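The practical difference shows up in the critical value zα: a two-sided test splits alpha across both tails, so its critical value is larger and the required sample grows. A small sketch, with critical_z as a hypothetical helper name:

```python
from statistics import NormalDist


def critical_z(alpha: float, two_sided: bool = True) -> float:
    """Standard normal critical value for a given alpha and test type."""
    tail = alpha / 2 if two_sided else alpha  # two-sided splits alpha across both tails
    return NormalDist().inv_cdf(1 - tail)


print(round(critical_z(0.05, two_sided=True), 3))
print(round(critical_z(0.05, two_sided=False), 3))
```

At alpha = 0.05 the values are about 1.96 (two-sided) and 1.645 (one-sided).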

4) What is MDE and how do I choose it?

MDE is the smallest improvement you care about. Pick it from business impact, cost, and feasibility. Smaller MDE values require much larger samples.
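To see how quickly samples grow, note that in the continuous formula the sample size is proportional to 1/δ², so halving the detectable difference quadruples the requirement. The sketch below shows that scaling; the helper name is hypothetical, and for binary metrics the relationship holds only approximately because the variance terms also change with the target rate.

```python
def sample_size_multiplier(delta_old: float, delta_new: float) -> float:
    """Factor by which the continuous-metric sample size changes when delta changes."""
    if delta_old <= 0 or delta_new <= 0:
        raise ValueError("deltas must be positive")
    return (delta_old / delta_new) ** 2  # n is proportional to 1 / delta^2


# Halving the detectable difference from 0.10 to 0.05.
print(sample_size_multiplier(0.10, 0.05))
```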

5) Why does unequal traffic split increase sample size?

With unequal allocation, the smaller group's estimate is noisier. The formula accounts for this through the ratio term (1+1/r), so the total number of users needed is generally higher than with an equal split.
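For the continuous formula the cost can be worked out directly: the total nA + nB is proportional to (1+1/r)(1+r), which equals 4 at r = 1, so the total relative to an equal split is (1+r)²/(4r). A short sketch of that ratio, using a hypothetical function name:

```python
def allocation_inflation(ratio: float) -> float:
    """Total sample size relative to an equal split, per the continuous formula."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # (1 + 1/r) * (1 + r) divided by its value of 4 at r = 1.
    return (1 + ratio) ** 2 / (4 * ratio)


for r in (1.0, 2.0, 4.0):
    print(r, allocation_inflation(r))
```

Under this formula a 2:1 split needs 12.5% more total users than a 1:1 split. The binary formula behaves similarly but not identically, since its variance terms differ between groups.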

6) How do I estimate sigma for continuous metrics?

Use historical data for the same population and measurement window. Compute the standard deviation of that metric, then choose a meaningful delta.
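A minimal sketch of that step, assuming the historical metric values are available as a list with one value per user. The data below is made up for illustration, and estimate_sigma is a hypothetical helper name.

```python
from statistics import stdev


def estimate_sigma(values):
    """Sample standard deviation of historical per-user metric values."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    return stdev(values)


# Made-up historical values, for example revenue per user.
historical_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(round(estimate_sigma(historical_values), 3))
```

Heavily skewed metrics such as revenue can have a few extreme values that dominate sigma, so it is worth checking the estimate over more than one historical window.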

7) What should I do if my baseline rate is very low?

Low baselines can inflate sample needs, especially for tiny uplifts. Consider longer test windows, broader eligibility, or a different primary metric.

8) Does this handle multiple comparisons?

Not directly. If you run many variants or metrics, adjust your alpha with a correction method such as Bonferroni, and fix the analysis plan before the test starts.
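One simple and conservative correction is Bonferroni, which divides alpha by the number of comparisons; the adjusted value can then be entered as the significance level. It is only one of several possible methods, and the function name below is hypothetical.

```python
def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Per-comparison alpha under a Bonferroni correction."""
    if comparisons < 1:
        raise ValueError("comparisons must be at least 1")
    return alpha / comparisons


# Hypothetical case: five comparisons at an overall alpha of 0.05.
print(bonferroni_alpha(0.05, 5))
```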