AB Test Power Calculator

Calculator inputs

Baseline conversion rate (%)

Variant conversion rate (%)

Significance level alpha (%)

Visitors in group A

Visitors in group B

Target power (%)

Hypothesis type

Traffic ratio (B/A)

Example data table

Scenario	Baseline	Variant	Visitors A	Visitors B	Alpha	Target Power
Homepage signup	12.00%	13.80%	25,000	25,000	5%	80%
Checkout completion	42.00%	44.10%	12,000	12,000	5%	90%
Email clickthrough	3.20%	3.90%	60,000	60,000	1%	80%
Pricing page trial start	7.50%	8.20%	40,000	20,000	5%	85%

Formula used

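Observed lift (absolute difference): Δ = p₂ − p₁, where p₁ and p₂ are the conversion rates of group A (baseline) and group B (variant), and n₁, n₂ are their visitor counts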

Null standard error: SE₀ = √[p̄(1−p̄)(1/n₁ + 1/n₂)], where p̄ = (n₁p₁ + n₂p₂)/(n₁ + n₂) is the pooled conversion rate

Alternative standard error: SE₁ = √[p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂]

Z statistic: Z = |Δ| / SE₀

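Approximate power: Φ(|Δ|/SE₁ − Z_crit·SE₀/SE₁), where Φ is the standard normal CDF and Z_crit is the standard normal quantile at 1 − α/2 (two-sided) or 1 − α (one-sided). For two-sided tests this ignores the small chance of rejecting in the wrong direction.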

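Required sample size (group A): n₁ ≈ ((Z_crit·√[p̄(1−p̄)(1 + 1/k)] + Z_β·√[p₁(1−p₁) + p₂(1−p₂)/k]) / Δ)², with n₂ = k·n₁, where k is the traffic ratio B/A, p̄ = (p₁ + k·p₂)/(1 + k), and Z_β is the standard normal quantile at the target power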

This calculator uses a two-sample normal approximation for conversion rates. It is suited to binary outcomes (a visitor converts or does not) and to planning with reasonably large samples.
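A minimal Python sketch of the formulas above, using only the standard library's statistics.NormalDist. It assumes the pooled rate p̄ = (n₁p₁ + n₂p₂)/(n₁ + n₂), a traffic ratio k = n₂/n₁ for planning, and, for one-sided tests, the alternative "B converts better than A". The function names analyze and required_sample are hypothetical; the calculator's own implementation may differ in rounding and edge cases.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_crit(alpha, two_sided=True):
    """Critical value: the 1 - alpha/2 (two-sided) or 1 - alpha (one-sided) quantile."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)


def analyze(p1, p2, n1, n2, alpha=0.05, two_sided=True):
    """Observed lift, Z statistic, p-value and approximate power for two rates."""
    delta = p2 - p1
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled rate under H0
    se0 = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    se1 = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = abs(delta) / se0
    if two_sided:
        p_value = 2 * (1 - STD_NORMAL.cdf(z))
    else:
        # Assumption: the one-sided alternative is "B converts better than A".
        p_value = 1 - STD_NORMAL.cdf(delta / se0)
    power = STD_NORMAL.cdf(abs(delta) / se1 - z_crit(alpha, two_sided) * se0 / se1)
    return {"delta": delta, "z": z, "p_value": p_value, "power": power}


def required_sample(p1, p2, alpha=0.05, power=0.80, two_sided=True, ratio=1.0):
    """Visitors (n1, n2) with n2 = ratio * n1 for planned rates p1 != p2."""
    k = ratio
    p_bar = (p1 + k * p2) / (1 + k)  # pooled rate implied by the planned split
    null_term = z_crit(alpha, two_sided) * sqrt(p_bar * (1 - p_bar) * (1 + 1 / k))
    alt_term = STD_NORMAL.inv_cdf(power) * sqrt(p1 * (1 - p1) + p2 * (1 - p2) / k)
    n1 = ceil(((null_term + alt_term) / (p2 - p1)) ** 2)
    return n1, ceil(k * n1)


# "Homepage signup" row of the example table.
print(analyze(0.12, 0.138, 25_000, 25_000))
print(required_sample(0.12, 0.138, alpha=0.05, power=0.80))
```

The sample-size function solves |Δ| = Z_crit·SE₀ + Z_β·SE₁ for n₁, so feeding its result back into analyze should return a power at (or just above) the target.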

How to use this calculator

Enter the current baseline conversion rate for your control group.
Enter the expected or observed variant conversion rate.
Enter the current number of visitors in each group.
Set the traffic ratio (B/A) if the groups are not split evenly.
Choose alpha and your desired statistical power.
Select one-sided or two-sided testing.
Review the estimated power, p-value, required sample sizes, and minimal detectable effect.
Use the CSV button to export numeric results.
Use the PDF button to save a clean summary.

FAQs

What does statistical power mean in AB testing?

Power is the probability that your test detects a difference (rejects the null hypothesis) when a true difference of the planned size exists. Higher power reduces the risk of missing a meaningful uplift.
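The definition can be checked by simulation. This Monte Carlo sketch in Python draws many experiments in which the true rates really differ, applies the pooled two-proportion z-test from the formula section to each, and reports the share that reject the null at the two-sided 5% level. The rates (12% and 15%), 2,000 visitors per group, 500 trials, and the function names are hypothetical choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_test_rejects(x1, n1, x2, n2, z_crit):
    """Pooled two-proportion z-test: True if |Z| exceeds the critical value."""
    p_bar = (x1 + x2) / (n1 + n2)
    se0 = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    if se0 == 0:  # all or no visitors converted: no evidence either way
        return False
    return abs(x2 / n2 - x1 / n1) / se0 > z_crit


def simulated_power(p1, p2, n, alpha=0.05, trials=500, seed=1):
    """Share of simulated two-sided tests that reject H0 when true rates are p1, p2."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Each visitor converts independently with its group's true rate.
        x1 = sum(rng.random() < p1 for _ in range(n))
        x2 = sum(rng.random() < p2 for _ in range(n))
        rejections += z_test_rejects(x1, n, x2, n, z_crit)
    return rejections / trials


print(simulated_power(0.12, 0.15, 2_000))  # true difference: estimates power
print(simulated_power(0.12, 0.12, 2_000))  # no difference: estimates alpha
```

For these inputs the power approximation from the formula section gives roughly 0.79, so the first estimate should land near that value, give or take a few points of simulation noise. The second call uses equal true rates, so its rejection rate estimates the false positive rate and should sit near alpha.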

Why is 80% power commonly used?

Eighty percent is a practical balance between sensitivity and traffic cost. It means you accept a 20% chance of missing the effect size you planned for.

What is the minimal detectable effect?

The minimal detectable effect is the smallest true difference your test would detect with the target power at the chosen alpha, given the current sample sizes.
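The page does not state how the calculator computes the MDE. A common closed-form approximation, assumed here, uses the baseline variance in both groups: MDE ≈ (Z_crit + Z_β)·√[p₁(1−p₁)(1/n₁ + 1/n₂)]. A minimal Python sketch (the function name approx_mde is hypothetical):

```python
from math import sqrt
from statistics import NormalDist


def approx_mde(p1, n1, n2, alpha=0.05, power=0.80, two_sided=True):
    """Approximate absolute MDE, assuming both groups share the baseline variance."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = std_normal.inv_cdf(power)
    return (z_crit + z_beta) * sqrt(p1 * (1 - p1) * (1 / n1 + 1 / n2))


mde = approx_mde(0.12, 25_000, 25_000)  # "Homepage signup" row
print(f"absolute MDE: {mde:.4%}, relative to baseline: {mde / 0.12:.1%}")
```

For the Homepage signup row this gives an absolute MDE of roughly 0.81 percentage points (about 6.8% relative), well below the planned 1.8-point lift.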

When should I use a one-sided test?

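Use a one-sided test when only improvement matters and a decrease would never trigger the same decision. Otherwise, two-sided testing is usually safer. Choose the hypothesis type before the test starts; switching after seeing the data inflates the false positive rate.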
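In practice the choice changes the critical value. This short Python sketch compares the two and estimates the sample-size saving at 80% power, assuming the null and alternative variance terms are roughly equal so that the required sample scales with (Z_crit + Z_β)².

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha, power = 0.05, 0.80
z_beta = std_normal.inv_cdf(power)

z_two_sided = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96
z_one_sided = std_normal.inv_cdf(1 - alpha)      # about 1.645

# Approximate ratio of required visitors, one-sided vs two-sided.
ratio = ((z_one_sided + z_beta) / (z_two_sided + z_beta)) ** 2
print(f"two-sided {z_two_sided:.3f}, one-sided {z_one_sided:.3f}, n ratio {ratio:.2f}")
```

At α = 5% and 80% power the one-sided design needs roughly 21% fewer visitors. That saving is what makes it tempting, and why the direction has to be fixed in advance.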

Does unequal traffic change required sample size?

Yes. Unequal traffic generally increases the total sample needed because balanced groups usually produce the most efficient estimate for the same total visitors.
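To see the effect, hold total traffic fixed and vary the split. This Python sketch applies the power approximation from the formula section (two-sided, α = 5%) to the Homepage signup rates; the 10,000-visitor total and the function name approx_power are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, p2, n1, n2, alpha=0.05):
    """Two-sided power approximation Φ(|Δ|/SE₁ − Z_crit·SE₀/SE₁)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled rate
    se0 = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    se1 = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return std_normal.cdf(abs(p2 - p1) / se1 - z_crit * se0 / se1)


total = 10_000
for share_a in (0.5, 0.3, 0.2):
    n1 = round(total * share_a)
    n2 = total - n1
    print(f"A:B = {n1}:{n2}  power ≈ {approx_power(0.12, 0.138, n1, n2):.2f}")
```

With the same 10,000 visitors, power falls from roughly 0.77 at a 50:50 split to roughly 0.56 at 20:80, so reaching the target power with an unbalanced split needs more total traffic.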

Can I use this for revenue metrics?

This version is designed for binary conversion outcomes. Revenue or continuous outcomes need different variance assumptions and usually a different power formula.

Why can p-value and power tell different stories?

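The p-value summarizes how surprising the observed difference would be if the two variants truly converted at the same rate. Power describes how likely the test design is to detect a specified true effect, given alpha and the sample sizes. A result can be significant even though the planned power was low, or non-significant in a well-powered test when the true effect is smaller than planned, so the two answer different questions.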

What sample sizes make the approximation reliable?

The normal approximation works better when both groups have enough successes and failures. Very small samples or rare events may need exact methods.
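A quick pre-check can be scripted. This Python sketch assumes the common rule of thumb of at least 10 expected conversions and 10 non-conversions in each group (some texts use 5); the threshold and the function name normal_approx_ok are assumptions, not the calculator's own rule.

```python
def normal_approx_ok(p1, n1, p2, n2, minimum=10):
    """Rule-of-thumb check: expected successes and failures >= minimum in each group."""
    expected = (n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2))
    return all(count >= minimum for count in expected)


print(normal_approx_ok(0.032, 60_000, 0.039, 60_000))  # "Email clickthrough" row → True
print(normal_approx_ok(0.005, 800, 0.008, 800))        # hypothetical rare event → False
```

When the check fails, an exact method such as Fisher's exact test is the safer choice.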