AB Test Statistical Significance Calculator

Compare variants with z scores, lift, and intervals. Check one sided or two sided evidence. Decide winners with transparent AB test outputs and notes.

Calculator Form

Example Data Table

Variant Visitors Conversions Conversion Rate Meaning
A 10,000 500 5.00% Control experience
B 10,000 560 5.60% New experience

Formula Used

Conversion rate for each variant is conversions divided by visitors.

Rate A = xA / nA and Rate B = xB / nB.

The pooled rate is p = (xA + xB) / (nA + nB).

The pooled standard error is SE = sqrt(p × (1 - p) × (1 / nA + 1 / nB)).

The z score is z = (Rate B - Rate A) / SE.

The p value comes from the standard normal curve. The confidence interval uses unpooled standard error.

How To Use This Calculator

  1. Enter visitors and conversions for variant A.
  2. Enter visitors and conversions for variant B.
  3. Choose alpha, confidence level, and test direction.
  4. Add a practical lift target if needed.
  5. Enter revenue and visitor cost for value estimates.
  6. Press the calculate button and review the result above the form.
  7. Use CSV or PDF buttons to save the output.

Understanding AB Test Significance

AB testing compares two versions of a page, offer, email, or flow. The goal is simple. You want to know whether the observed difference is real enough to trust. Random chance can create gaps between variants. A significance test estimates how likely that gap is under equal performance.

What The Calculator Measures

This calculator uses a two proportion z test. It compares the conversion rate for variant A with the rate for variant B. It also reports absolute lift, relative lift, standard error, z score, p value, confidence interval, and a decision. These outputs help you judge both evidence and impact.

Why Sample Size Matters

Small samples can move sharply with only a few conversions. That makes early results unstable. Larger samples reduce noise and narrow the confidence interval. The p value may fall when the observed lift is large, or when enough traffic has been collected. A strong result should still match business goals.

Choosing The Right Tail

A two sided test asks whether the variants are different in either direction. It is the safest default for most product tests. A one sided test asks whether B is better than A, or worse than A. Use it only when the direction was chosen before the experiment started.

Reading The Result

Statistical significance does not prove future profit. It only says the observed gap is unlikely under the null model. Look at the confidence interval too. A narrow interval gives a clearer estimate. A wide interval means the true effect may still be uncertain. Also consider tracking quality, test duration, seasonality, and repeated peeking.

Practical Experiment Advice

Define the primary metric before launching the test. Avoid changing the goal after results arrive. Run the test through normal business cycles. Exclude bot traffic and broken sessions where possible. Keep both variants active at the same time. Do not stop only because the result briefly looks exciting. Use the output as a decision guide, then combine it with product context and risk.

A useful winner should be statistically clear and commercially meaningful. Check costs, margins, and user experience before rollout. When traffic is limited, treat inconclusive tests as learning, not failure. Better planning improves every later experiment too.

FAQs

What is statistical significance in an AB test?

It shows whether the observed difference between two variants is unlikely to be caused by random chance under the null assumption.

Which test does this calculator use?

It uses a two proportion z test. This method compares two conversion rates from independent visitor groups.

Should I use a two sided test?

Use a two sided test when either variant could win. It is the common default for most experiments.

When is a one sided test suitable?

Use it only when the direction was chosen before the test. Do not switch after seeing results.

What does alpha mean?

Alpha is your chosen false positive risk. A value of 0.05 means a five percent threshold.

Why is practical lift included?

A result can be statistically significant but too small to matter. Practical lift checks business usefulness.

Can I stop a test early?

Stopping early can inflate false positives. Try to set sample size and duration before launching.

What should I export?

Export the result table when sharing decisions, documenting experiments, or comparing later test outcomes.

Related Calculators

Paver Sand Bedding Calculator (depth-based)Paver Edge Restraint Length & Cost CalculatorPaver Sealer Quantity & Cost CalculatorExcavation Hauling Loads Calculator (truck loads)Soil Disposal Fee CalculatorSite Leveling Cost CalculatorCompaction Passes Time & Cost CalculatorPlate Compactor Rental Cost CalculatorGravel Volume Calculator (yards/tons)Gravel Weight Calculator (by material type)

Important Note: All the Calculators listed in this site are for educational purpose only and we do not guarentee the accuracy of results. Please do consult with other sources as well.