A/B Testing Sample Size Calculator

Calculator Inputs

The calculator takes these inputs: baseline conversion rate (%), minimum detectable effect, effect type (absolute or relative), expected direction, confidence level (%), statistical power (%), test type (one-tailed or two-tailed), total variants including control, treatment-to-control traffic ratio, total daily experiment visitors, and traffic buffer (%). An advanced option applies a multi-variant alpha correction.

Formula Used

This calculator estimates the number of visitors needed to compare two conversion rates (proportions). It supports equal or unequal traffic allocation, one-tailed or two-tailed testing, configurable power and confidence, and optional alpha correction.

Core formula:

n control = [(Zα × √((1 + 1/r) × p̄ × (1 - p̄)) + Zβ × √(p1 × (1 - p1) + p2 × (1 - p2) / r))²] / (p2 - p1)²

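Here, p1 is the baseline rate and p2 is the expected variant rate, both expressed as fractions. r is the treatment-to-control allocation ratio, so each treatment group needs r × n control visitors. Zα is the standard normal critical value for the chosen confidence level: z at 1 - α/2 for a two-tailed test, or z at 1 - α for a one-tailed test. Zβ is the critical value for the chosen power (about 0.84 for 80% power). p̄ = (p1 + r × p2) / (1 + r) is the pooled rate, weighted by traffic allocation.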
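The core formula translates almost line for line into code. The sketch below is a minimal Python version, assuming rates are passed as fractions and that Zα and Zβ are computed separately (here with the standard library's `NormalDist`). The function name `control_sample_size` is hypothetical and not taken from the calculator itself.

```python
from math import ceil, sqrt
from statistics import NormalDist


def control_sample_size(p1: float, p2: float, r: float, z_alpha: float, z_beta: float) -> int:
    """Control-group visitors for comparing two proportions (formula above).

    p1, p2: baseline and expected variant rates as fractions (0.05 = 5%)
    r: treatment-to-control allocation ratio (1.0 = equal split)
    z_alpha, z_beta: critical values derived from confidence and power
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    # Pooled rate weighted by traffic: control gets n visitors, treatment gets r * n.
    p_bar = (p1 + r * p2) / (1 + r)
    null_part = z_alpha * sqrt((1 + 1 / r) * p_bar * (1 - p_bar))
    alt_part = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2) / r)
    return ceil((null_part + alt_part) ** 2 / (p2 - p1) ** 2)  # round up to whole visitors


# Landing page example: 5% -> 6%, 95% confidence two-tailed, 80% power, equal split.
std_normal = NormalDist()
z_alpha = std_normal.inv_cdf(1 - 0.05 / 2)
z_beta = std_normal.inv_cdf(0.80)
r = 1.0
n_control = control_sample_size(0.05, 0.06, r, z_alpha, z_beta)
n_treatment = ceil(r * n_control)
print(n_control, n_treatment)
```

For the landing page example this gives a little over 8,000 visitors per group. Raising r sends more traffic to the treatment, which lowers the control count while each treatment group still needs r times that count.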

How to Use This Calculator

Enter your current conversion rate as the baseline. Enter the smallest uplift you want to detect, and choose whether it is absolute (percentage points) or relative (a percentage of the baseline). Select confidence, power, test type, expected direction, number of variants, and traffic ratio. Enter total daily experiment visitors to estimate test duration. Press the submit button to show results above the form.

Example Data Table

Scenario	Baseline	MDE	Confidence	Power	Daily Visitors	Planning Note
Landing page test	5%	1 percentage point	95%	80%	1,000	Balanced A/B test
Checkout test	12%	10% relative	95%	90%	2,500	Higher power target
Three variant test	8%	1.5 percentage points	95%	80%	5,000	Correction recommended
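As a worked check of the core formula, here is the landing page row, assuming a two-tailed test, an equal split (r = 1), and the MDE taken as one percentage point (5% to 6%), with Zα ≈ 1.960 and Zβ ≈ 0.8416:

```latex
\bar{p} = \frac{p_1 + r\,p_2}{1 + r} = \frac{0.05 + 0.06}{2} = 0.055

n_{\text{control}} = \frac{\left(1.960\sqrt{2 \times 0.055 \times 0.945} + 0.8416\sqrt{0.05 \times 0.95 + 0.06 \times 0.94}\right)^2}{(0.06 - 0.05)^2}
\approx \frac{(0.6319 + 0.2713)^2}{0.0001} \approx 8{,}158
```

Both groups together need about 16,316 visitors, so at 1,000 daily visitors the test runs roughly 17 days before any traffic buffer or rounding to full weeks.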

A/B Test Planning Basics

A good A/B test starts before traffic is split. Calculating sample size in advance guards against drawing conclusions from too little data. It estimates how many visitors each variant needs before a fair comparison can be made. Without this step, teams often stop early, overreact to noise, or miss a real lift.

Why Sample Size Matters

Every conversion rate has random variation. A page may look better today and worse tomorrow. Larger samples reduce that swing and improve the chance of detecting a real change. The calculator combines confidence, power, baseline rate, and minimum detectable effect. Together, these inputs describe the level of risk you accept and the size of lift you want to catch.
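The way larger samples reduce random swing can be made concrete with the standard error of an observed rate, sqrt(p(1 - p) / n). This sketch assumes an illustrative 5% true rate and uses the normal approximation (about 95% of observed rates within 1.96 standard errors). The helper name `standard_error` is hypothetical.

```python
from math import sqrt


def standard_error(p: float, n: int) -> float:
    """Standard error of an observed conversion rate when the true rate is p and n visitors are seen."""
    return sqrt(p * (1 - p) / n)


baseline = 0.05  # illustrative 5% true conversion rate
for n in (100, 1_000, 10_000, 100_000):
    se = standard_error(baseline, n)
    # Normal approximation: roughly 95% of observed rates fall within 1.96 SE of the true rate.
    low, high = baseline - 1.96 * se, baseline + 1.96 * se
    print(f"n={n:>7,}: SE={se:.4f}, typical range {low:.2%} to {high:.2%}")
```

Because the standard error shrinks with the square root of n, quadrupling traffic only halves the noise, which is why small effects demand large samples.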

Choosing Baseline and Uplift

Use a baseline rate from recent data. Choose a period that matches the campaign, device mix, and traffic source, and avoid basing it on one lucky day. The uplift target should be practical. A tiny lift needs a very large audience. A larger lift needs fewer visitors, but a test sized for it may miss smaller real gains. Relative uplift is useful for quick planning. Absolute percentage point uplift is clearer for final estimates.
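The sketch below shows how an absolute or relative MDE turns into the expected variant rate p2, and how sample size grows as the MDE shrinks. It assumes rates as fractions and uses a simplified equal-split, two-tailed estimate with unpooled variance rather than the full formula; the helper names `target_rate` and `approx_per_group` are hypothetical.

```python
from statistics import NormalDist


def target_rate(baseline: float, mde: float, effect_type: str) -> float:
    """Expected variant rate p2 from a baseline and a minimum detectable effect.

    effect_type "absolute": mde is in rate units (0.01 = one percentage point).
    effect_type "relative": mde is a fraction of the baseline (0.10 = 10% uplift).
    """
    if effect_type == "absolute":
        return baseline + mde
    if effect_type == "relative":
        return baseline * (1 + mde)
    raise ValueError("effect_type must be 'absolute' or 'relative'")


def approx_per_group(p1: float, p2: float, confidence: float = 0.95, power: float = 0.80) -> float:
    """Simplified equal-split, two-tailed estimate using unpooled variance."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    z_beta = std_normal.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2


# A 10% relative lift on a 12% baseline is a 1.2 percentage point lift.
print(f"{target_rate(0.12, 0.10, 'relative'):.3f}")  # 0.132

# Shrinking the absolute MDE on a 5% baseline.
for mde in (0.02, 0.01, 0.005):
    p2 = target_rate(0.05, mde, "absolute")
    print(f"MDE {mde * 100:.1f} pp: about {approx_per_group(0.05, p2):,.0f} per group")
```

Since the effect size appears squared in the denominator, halving the MDE roughly quadruples the required visitors.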

Power and Confidence

Confidence controls false positive risk, and a 95 percent confidence level is common. Power controls false negative risk, and an 80 percent power target is also common. Higher power is safer, but it needs more traffic. One-tailed tests need fewer visitors, but they can only detect a change in the direction chosen in advance. Two-tailed tests are more cautious and better suited to general product decisions.
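Confidence and power enter the formula only through Zα and Zβ. This sketch, using hypothetical helpers `z_values` and `multiplier`, computes both critical values and compares the rough size factor (Zα + Zβ)², which sample size tracks approximately when the two square-root terms in the formula are similar. With these inputs, moving from 80% to 90% power needs roughly a third more visitors, while a one-tailed test needs roughly a fifth fewer.

```python
from statistics import NormalDist


def z_values(confidence: float, power: float, two_tailed: bool = True):
    """Critical values (z_alpha, z_beta) for a given confidence level and power."""
    alpha = 1 - confidence
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # A two-tailed test splits alpha across both tails; a one-tailed test puts it all in one.
    tail_prob = alpha / 2 if two_tailed else alpha
    z_alpha = std_normal.inv_cdf(1 - tail_prob)
    z_beta = std_normal.inv_cdf(power)
    return z_alpha, z_beta


def multiplier(confidence: float, power: float, two_tailed: bool = True) -> float:
    """(z_alpha + z_beta)^2: the required sample size scales roughly with this factor."""
    z_alpha, z_beta = z_values(confidence, power, two_tailed)
    return (z_alpha + z_beta) ** 2


base = multiplier(0.95, 0.80)
for label, args in [
    ("95% / 80%, two-tailed", (0.95, 0.80, True)),
    ("95% / 90%, two-tailed", (0.95, 0.90, True)),
    ("95% / 80%, one-tailed", (0.95, 0.80, False)),
]:
    m = multiplier(*args)
    print(f"{label}: factor {m:.2f}, {m / base:.0%} of the two-tailed 95% / 80% size")
```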

Traffic and Duration

Sample size becomes actionable when paired with daily visitor counts. The resulting duration tells you whether the plan is realistic. A test should usually cover full weekly cycles to capture both weekday and weekend behavior. Add a buffer for tracking loss, bot filtering, consent limits, or quality checks. Multi-variant tests need extra care: each additional challenger compared with the same control adds another chance of a false positive, so an alpha correction lowers the risk of declaring a false winner.
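The steps in this paragraph can be combined into one planning sketch. The helper `plan` is hypothetical, and several choices in it are assumptions rather than the calculator's documented behavior: a Bonferroni correction (alpha divided by the number of challenger-versus-control comparisons), a two-tailed test, a buffer applied by inflating total visitors by the buffer percentage (10% by default here), and duration rounded up to whole weeks.

```python
from math import ceil, sqrt
from statistics import NormalDist


def control_n(p1, p2, r, z_alpha, z_beta):
    """Same two-proportion formula as in the Formula Used section."""
    p_bar = (p1 + r * p2) / (1 + r)
    a = z_alpha * sqrt((1 + 1 / r) * p_bar * (1 - p_bar))
    b = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2) / r)
    return ceil((a + b) ** 2 / (p2 - p1) ** 2)


def plan(p1, p2, variants, r=1.0, confidence=0.95, power=0.80,
         daily_visitors=1000, buffer_pct=10.0, correct=True):
    """Hypothetical planner; correction method, buffer rule and week rounding are assumptions."""
    comparisons = variants - 1  # each challenger is compared with the control
    alpha = 1 - confidence
    if correct and comparisons > 1:
        alpha /= comparisons  # Bonferroni correction (assumed method)
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed
    z_beta = std_normal.inv_cdf(power)
    n_control = control_n(p1, p2, r, z_alpha, z_beta)
    n_treatment = ceil(r * n_control)
    total = n_control + comparisons * n_treatment
    buffered = ceil(total * (1 + buffer_pct / 100))  # assumed buffer rule
    days = ceil(buffered / daily_visitors)
    weeks = ceil(days / 7)  # round up to full weekly cycles
    return {"per_control": n_control, "per_treatment": n_treatment,
            "total_with_buffer": buffered, "days": days, "weeks": weeks}


# Three variant row from the example table: 8% -> 9.5%, 5,000 daily visitors.
with_corr = plan(0.08, 0.095, variants=3, daily_visitors=5000)
without = plan(0.08, 0.095, variants=3, daily_visitors=5000, correct=False)
print(with_corr)
print(without)
```

Comparing the two outputs shows the cost of the correction: each group needs more visitors, in exchange for a lower chance that one of the two challengers wins by chance.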

Using Results Carefully

The estimate is a planning guide, not a guarantee. Check instrumentation before launch. Keep the test running until the planned sample is reached, and watch guardrail metrics during the experiment. After completion, interpret results with business context, not only statistical output. Review segments only after confirming that the primary result is stable. Good experiments combine clean math, stable data, and disciplined decisions.
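The reason to wait for the planned sample can be shown with a small A/A simulation, where both groups share the same true rate so every "winner" is a false positive. This sketch assumes illustrative parameters (10% conversion rate, 1,000 visitors per group, 10 evenly spaced looks) and a pooled two-proportion z-test at each look with no sequential correction; `z_stat` and `simulate` are hypothetical helpers.

```python
import random
from math import sqrt


def z_stat(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic (B minus A)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return (conv_b / n_b - conv_a / n_a) / se


def simulate(runs=400, n_per_group=1000, peeks=10, rate=0.10, z_crit=1.96, seed=7):
    """A/A simulation: returns (false positive rate with one final look, rate when stopping at any significant look)."""
    rng = random.Random(seed)
    # Evenly spaced looks; the last look is the planned sample size.
    looks = [n_per_group * (i + 1) // peeks for i in range(peeks)]
    fixed_hits = peek_hits = 0
    for _ in range(runs):
        conv_a = conv_b = seen = 0
        significant_at_any_look = False
        for look in looks:
            for _ in range(look - seen):
                conv_a += rng.random() < rate
                conv_b += rng.random() < rate
            seen = look
            if abs(z_stat(conv_a, seen, conv_b, seen)) > z_crit:
                significant_at_any_look = True  # an early stopper would declare a winner here
        peek_hits += significant_at_any_look
        # The fixed-horizon test only looks once, at the full planned sample.
        fixed_hits += abs(z_stat(conv_a, seen, conv_b, seen)) > z_crit
    return fixed_hits / runs, peek_hits / runs


fixed_rate, peeking_rate = simulate()
print(f"false positives, single final look: {fixed_rate:.1%}")
print(f"false positives, stop at first significant look: {peeking_rate:.1%}")
```

The single-look rate should stay near the nominal 5%, while stopping at the first significant look should typically come out noticeably higher, even though nothing differs between the groups.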

FAQs

What is an A/B testing sample size?

It is the number of visitors needed in each test group. It helps the experiment detect a meaningful conversion change with the selected confidence and power.

What is baseline conversion rate?

Baseline conversion rate is your current control performance. Use recent, stable data from the same page, audience, channel, and tracking setup.

What is minimum detectable effect?

It is the smallest change you want the test to detect. Smaller effects require larger sample sizes and longer test periods.

Should I use absolute or relative uplift?

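Use absolute uplift for percentage point changes; for example, moving from 5% to 6% is a 1 point absolute lift. Use relative uplift when planning around proportional growth from the current baseline; for example, a 10% relative lift on a 5% baseline targets 5.5%.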

What does statistical power mean?

Power is the chance of detecting a real effect when it exists. Higher power reduces missed winners but increases required sample size.

When should I use a two-tailed test?

Use a two-tailed test when the variant could perform better or worse than the control. It is the more cautious option and the common default for product decisions.

Why add a traffic buffer?

A buffer protects the plan from tracking loss, bot removal, consent gaps, and data quality filters. It gives safer final visitor targets.

Can I stop the test early?

Stopping early can inflate false positives. Run the test until the planned sample and time cycle are reached, unless safety metrics fail.