Example Data Table
| Test | Visitors A | Conversions A | Visitors B | Conversions B | Use case |
|---|---|---|---|---|---|
| Landing page headline | 10000 | 500 | 10000 | 560 | Measure conversion lift |
| Checkout button copy | 8500 | 612 | 8700 | 675 | Compare purchase rates |
| Email subject line | 22000 | 1320 | 21800 | 1395 | Review campaign response |
Formula Used
Conversion rate A: pA = xA / nA, where xA is conversions A and nA is visitors A
Conversion rate B: pB = xB / nB, where xB is conversions B and nB is visitors B
Absolute difference: d = pB - pA
Relative lift: lift = (pB - pA) / pA
Pooled proportion: p = (xA + xB) / (nA + nB)
Pooled standard error: SE = sqrt(p × (1 - p) × (1 / nA + 1 / nB))
Z score: z = (pB - pA) / SE
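P value (two-sided): p = 2 × (1 - Φ(|z|)), where Φ is the standard normal cumulative distribution function
P value (one-sided, B greater than A): p = 1 - Φ(z)
P value (one-sided, B lower than A): p = Φ(z)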
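Unpooled standard error: SEu = sqrt(pA × (1 - pA) / nA + pB × (1 - pB) / nB)
Confidence interval: d ± critical z × SEu (critical z is about 1.96 for a two-sided 95% interval)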
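Here is a minimal Python sketch of the formulas above. It uses only the standard library, with `statistics.NormalDist` supplying the normal distribution. The function name `ab_significance` and its argument names are illustrative and are not the calculator's actual code. The sketch assumes a two-sided test and a control with at least one conversion.

```python
import math
from statistics import NormalDist

def ab_significance(x_a, n_a, x_b, n_b, confidence=0.95):
    """Two-sided two-proportion z-test plus a confidence interval for d = pB - pA."""
    p_a = x_a / n_a                      # conversion rate A
    p_b = x_b / n_b                      # conversion rate B
    d = p_b - p_a                        # absolute difference
    lift = d / p_a                       # relative lift (undefined if p_a == 0)

    # Pooled proportion and pooled SE drive the z score
    p_pool = (x_a + x_b) / (n_a + n_b)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = d / se_pooled

    # Two-sided p value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled SE and the two-sided critical z drive the confidence interval
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (d - z_crit * se_unpooled, d + z_crit * se_unpooled)

    return {"p_a": p_a, "p_b": p_b, "d": d, "lift": lift,
            "z": z, "p_value": p_value, "ci": ci}

# Landing page headline row from the example table
result = ab_significance(500, 10000, 560, 10000)
print(f"lift {result['lift']:.1%}, z {result['z']:.2f}, p {result['p_value']:.3f}")
```

For the landing page headline row, this gives a 12.0% relative lift, a z of about 1.89 and a two-sided p value of about 0.058. The result is therefore not significant at 95% confidence, and the interval for d includes zero.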
How To Use This Calculator
- Enter labels for the control and test variants.
- Add visitors and conversions for both variants.
- Select the confidence level for your decision.
- Choose the hypothesis direction that matches your test plan.
- Set a practical effect threshold if business impact matters.
- Press the calculate button and review the result above the form.
- Download the CSV or PDF report for your records.
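The inputs in these steps map onto a small decision routine. The sketch below is a hypothetical helper: `evaluate_ab_test` and its `direction` and `min_lift` parameters are illustrative names, not the calculator's interface. It turns the confidence level into alpha, picks the p value that matches the chosen hypothesis direction, and checks the relative lift against a practical threshold.

```python
import math
from statistics import NormalDist

def evaluate_ab_test(x_a, n_a, x_b, n_b, confidence=0.95,
                     direction="two-sided", min_lift=0.0):
    """Hypothetical helper mirroring the calculator inputs.

    direction: "two-sided", "greater" (B above A) or "less" (B below A).
    min_lift: practical effect threshold on relative lift, e.g. 0.05 for 5%.
    """
    if n_a <= 0 or n_b <= 0 or not (0 <= x_a <= n_a and 0 <= x_b <= n_b):
        raise ValueError("need visitors > 0 and 0 <= conversions <= visitors")
    p_a, p_b = x_a / n_a, x_b / n_b
    if p_a == 0:
        raise ValueError("relative lift is undefined when the control rate is 0")
    lift = (p_b - p_a) / p_a

    # Pooled two-proportion z score, as in the formulas above
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero (every visitor converted)")
    z = (p_b - p_a) / se

    # The p value depends on the hypothesis direction chosen before the test
    cdf = NormalDist().cdf
    if direction == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
        practical = abs(lift) >= min_lift
    elif direction == "greater":
        p_value = 1 - cdf(z)
        practical = lift >= min_lift
    elif direction == "less":
        p_value = cdf(z)
        practical = lift <= -min_lift
    else:
        raise ValueError(f"unknown direction: {direction!r}")

    alpha = 1 - confidence
    return {"lift": lift, "z": z, "p_value": p_value,
            "significant": p_value < alpha, "practical": practical}

two_sided = evaluate_ab_test(500, 10000, 560, 10000, min_lift=0.05)
one_sided = evaluate_ab_test(500, 10000, 560, 10000, direction="greater", min_lift=0.05)
print(two_sided["significant"], one_sided["significant"])
```

With the landing page row, the same data is significant under a one-sided "B greater than A" test (p about 0.029) but not under a two-sided test (p about 0.058). For this reason, the direction has to be fixed in the test plan before anyone looks at results.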
Why A/B Significance Matters
An A/B test compares two versions of a page, offer, email, or flow. Version A is usually the control, and version B is the challenger. The goal is not only to see which version converts more, but to judge whether the observed lift is likely real or just random noise from sampling.
What This Calculator Evaluates
This calculator uses visitor and conversion counts for both variants. It converts those counts into rates, then estimates the absolute difference between the rates, the relative lift, the pooled standard error, the z score, and the p value. It also builds a confidence interval for the absolute rate difference. These values help you judge whether the challenger has enough evidence to beat the control.
Interpreting The Result
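A small p value means the observed gap would be uncommon if both variants had the same true conversion rate. When the p value is below your selected alpha level (for example, 0.05 at 95% confidence), the calculator marks the test as statistically significant. A positive lift means B converted better than A, and a negative lift means B performed worse. The confidence interval adds useful context. If a two-sided interval excludes zero, the result usually supports a real difference. The interval and the p value do not always agree exactly: the interval uses the unpooled SE while the z score uses the pooled SE, so they can disagree for results near the significance boundary.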
Practical Testing Guidance
Statistical significance is important, but it is not the whole decision. Check the sample size, traffic quality, audience mix, test duration, and business impact. Do not stop a test too early, because early results can swing sharply. Try to run the test over full business cycles when traffic changes by weekday, device, season, or campaign source. Also compare revenue, leads, refunds, and retention when those outcomes matter more than simple conversions.
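You can plan the sample size before the test starts. One common approach is the normal-approximation formula for a two-proportion z-test, sketched below. The 80% power default and the function name `visitors_per_variant` are assumptions for illustration. They are not settings taken from the calculator.

```python
import math
from statistics import NormalDist

def visitors_per_variant(p_control, p_variant, confidence=0.95, power=0.80):
    """Approximate visitors per variant for a two-sided two-proportion z-test.

    Normal-approximation formula; confidence and power are planning assumptions.
    """
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical z
    z_power = NormalDist().inv_cdf(power)                     # z for the desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# Baseline 5% and a hoped-for 5.6% (the landing page row's observed rates)
needed = visitors_per_variant(0.05, 0.056)
print(needed)
```

For a 5% baseline and a 5.6% target, this asks for roughly 22,000 visitors per variant. That is more than twice the 10,000 per variant in the example landing page row.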
Common Mistakes To Avoid
A frequent mistake is bundling many changes into one variant, which makes it hard to tell which change caused the result. Another mistake is declaring a winner after checking results every hour, because repeated peeking raises the chance of a false positive. Segment analysis can be helpful, but tiny segments create unstable results. Use this calculator as a strong first review, then combine it with product judgment and clean experiment design.
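The sketch below shows why repeated peeking is risky. It simulates A/A tests in which both variants share the same true conversion rate, so every significant result is a false positive. It then compares checking after every batch of traffic with a single check at the end. The batch size, number of looks, 10% rate, and seed are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist

def two_sided_p(x_a, n_a, x_b, n_b):
    """Two-sided p value of the pooled two-proportion z-test."""
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (x_b / n_b - x_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_peeking(runs=400, looks=10, batch=200, rate=0.10, alpha=0.05, seed=7):
    """A/A simulation: both arms share one true rate, so every 'winner' is a false positive."""
    rng = random.Random(seed)
    peek_hits = final_hits = 0
    for _ in range(runs):
        x_a = x_b = n = 0
        flagged = False
        for _ in range(looks):
            # One more batch of visitors per arm, each converting with probability `rate`
            x_a += sum(rng.random() < rate for _ in range(batch))
            x_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if two_sided_p(x_a, n, x_b, n) < alpha:
                flagged = True  # a peeker would have stopped and declared a winner here
        peek_hits += flagged
        # A disciplined analyst looks only once, at the planned final sample size
        final_hits += two_sided_p(x_a, n, x_b, n) < alpha
    return peek_hits / runs, final_hits / runs

peek_rate, final_rate = simulate_peeking()
print(f"false positives with peeking: {peek_rate:.1%}, single final look: {final_rate:.1%}")
```

The peeking strategy also includes the final look, so its false positive rate can never be lower than the single-look rate. With ten looks it usually lands well above the nominal 5%, while the single look stays near it.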
Data Quality Checklist
Confirm that tracking is clean before reading the result. Remove bot traffic when possible. Keep one primary metric. Make sure both variants run at the same time and share the same audience rules so the comparison is fair.
FAQs
What is A/B test statistical significance?
It shows whether the observed difference between two variants is unlikely to be random sampling noise. A lower p value gives stronger evidence against equal conversion rates.
What data do I need?
You need visitors and conversions for the control variant and the test variant. The calculator uses these counts to estimate conversion rates and significance.
What does the p value mean?
The p value is the probability of seeing a difference at least as extreme as the observed one if both variants had the same true conversion rate. Smaller values indicate stronger evidence against equal rates.
Is 95% confidence always best?
Not always. A 95% level is common, but high-risk decisions may need 99%. Early exploratory tests may use lower levels, such as 90%, with caution.
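The confidence level sets the z score a result must clear. The short sketch below uses `statistics.NormalDist.inv_cdf`. The helper name `critical_z` is illustrative.

```python
from statistics import NormalDist

def critical_z(confidence, two_sided=True):
    """Critical z at a given confidence level (|z| must exceed it when two-sided)."""
    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} two-sided: |z| > {critical_z(level):.3f}")
```

This prints about 1.645, 1.960, and 2.576. The landing page row has a z near 1.89, so it would clear the 90% bar but not the 95% or 99% bars.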
What is relative lift?
Relative lift compares the conversion rate change against the control rate. For example, moving from 5% to 6% is a 1 percentage point absolute change but a 20% relative lift.
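The tiny sketch below contrasts absolute and relative change for the 5% to 6% example. The function name `lift_summary` is illustrative.

```python
def lift_summary(rate_a, rate_b):
    """Return (absolute change in percentage points, relative lift as a fraction)."""
    absolute_pp = (rate_b - rate_a) * 100  # change in percentage points
    relative = (rate_b - rate_a) / rate_a  # change relative to the control rate
    return absolute_pp, relative

abs_pp, rel = lift_summary(0.05, 0.06)
print(f"absolute: {abs_pp:.1f} percentage points, relative: {rel:.0%}")
```

This prints `absolute: 1.0 percentage points, relative: 20%`.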
Can I use this for one-sided tests?
Yes. Choose whether B is expected to be greater than A or lower than A. Use this only when the direction was planned before analysis.
Why is practical effect included?
A result can be statistically significant but too small to matter. The practical effect threshold helps compare the lift against business value.
Should I stop a test once it is significant?
Usually no. Avoid stopping too early. Run the test for a planned duration and check traffic quality, sample size, and business cycles.