AB Test Power Calculator

Plan A/B experiments by estimating statistical power, lift, and required sample sizes for two conversion rates. The calculator handles balanced and unbalanced traffic splits and both one-sided and two-sided tests.

Example data table

Scenario                   Baseline  Variant  Visitors A  Visitors B  Alpha  Target Power
Homepage signup            12.00%    13.80%   25,000      25,000      5%     80%
Checkout completion        42.00%    44.10%   12,000      12,000      5%     90%
Email clickthrough         3.20%     3.90%    60,000      60,000      1%     80%
Pricing page trial start   7.50%     8.20%    40,000      20,000      5%     85%

Formula used

Observed lift: Δ = p₂ − p₁

Null standard error: SE₀ = √[p̄(1−p̄)(1/n₁ + 1/n₂)], where p̄ = (n₁p₁ + n₂p₂)/(n₁ + n₂) is the pooled conversion rate

Alternative standard error: SE₁ = √[p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂]

Z statistic: Z = |Δ| / SE₀

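Approximate power: Power ≈ Φ(|Δ|/SE₁ − Zcrit·SE₀/SE₁), where Φ is the standard normal CDF and Zcrit is the critical value for the chosen alpha (the normal quantile at 1 − α/2 for a two-sided test, or at 1 − α for a one-sided test)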

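Required sample size per group (equal group sizes): n ≈ ((Zcrit·√[2p̄(1−p̄)] + Zβ·√[p₁(1−p₁) + p₂(1−p₂)]) / Δ)², where p̄ = (p₁ + p₂)/2 and Zβ is the standard normal quantile at the target power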

This calculator uses a two-sample normal approximation for conversion rates. It is best for binary outcomes, practical planning, and large enough sample sizes.
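The formulas above can be sketched in a few lines of Python. This is an illustrative reading of them, not the calculator's actual source: the function names are hypothetical, the pooled rate is assumed to be the visitor-weighted average of the two rates, and the sample-size function assumes equal group sizes and the standard two-proportion form.

```python
from math import ceil, sqrt
from statistics import NormalDist

NORM = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_critical(alpha, two_sided=True):
    """Critical value for the chosen alpha and test direction."""
    return NORM.inv_cdf(1 - alpha / 2) if two_sided else NORM.inv_cdf(1 - alpha)


def approx_power(p1, p2, n1, n2, alpha=0.05, two_sided=True):
    """Approximate power from the null and alternative standard errors."""
    delta = abs(p2 - p1)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled rate (assumed definition)
    se0 = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))  # null SE
    se1 = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # alternative SE
    z_crit = z_critical(alpha, two_sided)
    return NORM.cdf(delta / se1 - z_crit * se0 / se1)


def required_n_per_group(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Per-group sample size, assuming both groups get the same traffic."""
    p_bar = (p1 + p2) / 2
    null_term = z_critical(alpha, two_sided) * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = NORM.inv_cdf(power) * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil(((null_term + alt_term) / (p2 - p1)) ** 2)


# "Pricing page trial start" row: 7.50% vs 8.20%, 40,000 and 20,000 visitors
pricing_power = approx_power(0.075, 0.082, 40_000, 20_000, alpha=0.05)
# "Homepage signup" row: 12.00% vs 13.80%, alpha 5%, target power 80%
homepage_n = required_n_per_group(0.12, 0.138, alpha=0.05, power=0.80)
print(f"power={pricing_power:.3f}, n per group={homepage_n}")
```

Under these assumptions the pricing row lands at roughly 85% power, and the homepage row needs roughly 5,400 visitors per group. The calculator itself may round or handle unequal allocation differently.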

How to use this calculator

  1. Enter the current baseline conversion rate for your control group.
  2. Enter the expected or observed variant conversion rate.
  3. Add current visitors for both groups.
  4. Choose alpha and your desired statistical power.
  5. Select one-sided or two-sided testing.
  6. Review the estimated power, p-value, required samples, and minimal detectable effect.
  7. Use the CSV button to export numeric results.
  8. Use the PDF button to save a clean summary.

FAQs

What does statistical power mean in AB testing?

Power is the chance your test detects a real difference when it exists. Higher power reduces the risk of missing a meaningful uplift.

Why is 80% power commonly used?

Eighty percent is a practical balance between sensitivity and traffic cost. It means you accept a 20% chance of missing the effect size you planned for.

What is the minimal detectable effect?

The minimal detectable effect is the smallest lift your test can reliably detect at the chosen alpha and power with the current sample sizes.
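One way to approximate the minimal detectable effect is to search for the smallest lift whose approximate power reaches the target. The sketch below uses bisection and assumes power rises steadily with lift over the search range. The function names are hypothetical, and the calculator may use a closed-form shortcut instead.

```python
from math import sqrt
from statistics import NormalDist

NORM = NormalDist()


def approx_power(p1, p2, n1, n2, alpha=0.05, two_sided=True):
    """Normal-approximation power for two conversion rates."""
    delta = abs(p2 - p1)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled rate (assumed definition)
    se0 = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    se1 = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NORM.inv_cdf(1 - alpha / 2) if two_sided else NORM.inv_cdf(1 - alpha)
    return NORM.cdf(delta / se1 - z_crit * se0 / se1)


def minimal_detectable_effect(p1, n1, n2, alpha=0.05, target_power=0.80,
                              two_sided=True, tol=1e-7):
    """Smallest absolute lift over baseline p1 that reaches target_power."""
    lo, hi = 0.0, 1.0 - p1  # a lift cannot push the variant rate above 100%
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if approx_power(p1, p1 + mid, n1, n2, alpha, two_sided) < target_power:
            lo = mid  # not enough power yet: the lift must be larger
        else:
            hi = mid  # enough power: try a smaller lift
    return hi


# "Homepage signup" row: 12.00% baseline, 25,000 visitors per group
mde = minimal_detectable_effect(0.12, 25_000, 25_000)
print(f"MDE is about {mde:.4f} in absolute terms")
```

The result is an absolute lift in conversion rate, so divide by the baseline if you want it as a relative percentage.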

When should I use a one-sided test?

Use a one-sided test when only improvement matters and a decrease would never trigger the same decision. Otherwise, two-sided testing is usually safer.
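The practical difference shows up in the critical value. A small sketch using Python's standard library, with a 5% alpha chosen purely for illustration:

```python
from statistics import NormalDist

alpha = 0.05
norm = NormalDist()

z_one_sided = norm.inv_cdf(1 - alpha)      # all of alpha in one tail
z_two_sided = norm.inv_cdf(1 - alpha / 2)  # alpha split across both tails

print(f"one-sided: {z_one_sided:.3f}, two-sided: {z_two_sided:.3f}")
# prints "one-sided: 1.645, two-sided: 1.960"
```

The lower one-sided threshold is why a one-sided test reports more power for the same traffic, and also why it should be chosen before looking at the data.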

Does unequal traffic change required sample size?

Yes. Unequal traffic generally increases the total sample needed because balanced groups usually produce the most efficient estimate for the same total visitors.
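A rough way to see the cost of an uneven split: for a fixed total, the factor 1/n₁ + 1/n₂ in the null standard error is smallest at 50/50. The sketch below compares any split against that baseline. The helper name is hypothetical, and the result is exact only for the pooled (null) variance term.

```python
def variance_inflation(share_a):
    """Variance of the difference relative to a 50/50 split, same total traffic.

    With n1 = share_a * N and n2 = (1 - share_a) * N,
    1/n1 + 1/n2 = 1 / (N * share_a * (1 - share_a)), which equals 4/N at 50/50.
    """
    if not 0 < share_a < 1:
        raise ValueError("share_a must be strictly between 0 and 1")
    return 1 / (4 * share_a * (1 - share_a))


# A 2:1 split, as in the "Pricing page trial start" row (40,000 vs 20,000)
inflation = variance_inflation(2 / 3)
print(f"{inflation:.3f}")  # prints "1.125"
```

Read this as needing about 12.5% more total visitors under a 2:1 split to match the precision of a balanced test.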

Can I use this for revenue metrics?

This version is designed for binary conversion outcomes. Revenue or continuous outcomes need different variance assumptions and usually a different power formula.

Why can p-value and power tell different stories?

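P-value measures the evidence against the null hypothesis in the observed data. Power measures how likely the design is to detect an assumed effect size at the chosen alpha and sample sizes. They answer different questions: one is about the data you collected, the other about the design of the test.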
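To make the contrast concrete, the p-value comes from the Z statistic with the null standard error. The sketch below treats the "Pricing page trial start" row as if it were observed data, which is an assumption made for illustration. The function name is hypothetical, and the one-sided branch assumes the hypothesis is that the variant beats the control.

```python
from math import sqrt
from statistics import NormalDist


def p_value(p1, p2, n1, n2, two_sided=True):
    """P-value of a pooled two-proportion z-test (normal approximation)."""
    norm = NormalDist()
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled rate (assumed definition)
    se0 = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se0
    if two_sided:
        return 2 * (1 - norm.cdf(abs(z)))  # both tails
    return 1 - norm.cdf(z)  # upper tail only: variant better than control


# 7.50% vs 8.20% with 40,000 and 20,000 visitors
p_two_sided = p_value(0.075, 0.082, 40_000, 20_000)
print(f"two-sided p-value: {p_two_sided:.4f}")
```

Here the p-value falls well below 5% even though the same inputs give only moderate power, because the p-value describes this one outcome while power describes how often such an outcome would occur.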

What sample sizes make the approximation reliable?

The normal approximation works better when both groups have enough successes and failures. Very small samples or rare events may need exact methods.
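A quick screening check can flag groups that are too thin for the approximation. The threshold of 10 expected successes and 10 expected failures is a common rule of thumb, not something this calculator states, and the helper name is hypothetical.

```python
def approximation_looks_reasonable(rate, visitors, min_count=10):
    """True when expected successes and failures both reach min_count.

    min_count=10 is an assumed rule-of-thumb threshold; some texts use 5.
    """
    expected_successes = visitors * rate
    expected_failures = visitors * (1 - rate)
    return expected_successes >= min_count and expected_failures >= min_count


# "Email clickthrough" baseline: 3.20% of 60,000 visitors
email_ok = approximation_looks_reasonable(0.032, 60_000)
# A rare event on little traffic: 0.1% of 500 visitors
rare_ok = approximation_looks_reasonable(0.001, 500)
print(email_ok, rare_ok)  # prints "True False"
```

Run the check on both groups. If either fails, an exact method such as a binomial or Fisher test is the safer choice.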

Important Note: All calculators on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.