Understanding AB Test Significance
AB testing compares two versions of a page, offer, email, or flow. The goal is simple: you want to know whether the observed difference is real enough to trust. Random chance alone can create gaps between variants. A significance test estimates how likely a gap at least as large as the one observed would be if both variants truly performed the same.
What The Calculator Measures
This calculator uses a two proportion z test. It compares the conversion rate for variant A with the rate for variant B. It also reports absolute lift, relative lift, standard error, z score, p value, confidence interval, and a decision. These outputs help you judge both evidence and impact.
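As a rough illustration of the outputs listed above, here is a minimal Python sketch of a two proportion z test. The text does not say exactly how the calculator computes its standard error, so the pooled form used here is an assumption, and the function name, argument names, and sample counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two sided two proportion z test (pooled standard error assumed)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: the shared conversion rate if both variants perform equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Two sided p value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "absolute_lift": rate_b - rate_a,
        # Relative lift is undefined when variant A has no conversions.
        "relative_lift": (rate_b - rate_a) / rate_a if rate_a > 0 else None,
        "standard_error": se,
        "z": z,
        "p_value": p_value,
    }


# Made-up counts: 200 of 4000 visitors convert on A, 250 of 4000 on B.
result = two_proportion_z_test(conv_a=200, n_a=4000, conv_b=250, n_b=4000)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts the absolute lift is 1.25 percentage points and the relative lift is 25 percent, which shows why it helps to report both.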
Why Sample Size Matters
Small samples can move sharply with only a few conversions. That makes early results unstable. Larger samples reduce noise and narrow the confidence interval. For a given sample size, the p value falls as the observed lift grows. For a given observed lift, it falls as more traffic is collected. A strong result should still match business goals.
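The effect of sample size can be sketched by holding the observed rates fixed and changing only the traffic. The rates (5 percent versus 6 percent), the sample sizes, and the helper name below are hypothetical, and equal traffic per variant is assumed to keep the arithmetic short.

```python
from math import sqrt
from statistics import NormalDist


def p_value_two_sided(rate_a, rate_b, n_per_variant):
    """Two sided p value for two observed rates with equal group sizes."""
    pooled = (rate_a + rate_b) / 2  # valid because both groups are the same size
    se = sqrt(pooled * (1 - pooled) * 2 / n_per_variant)
    z = (rate_b - rate_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


sizes = [500, 2000, 8000]
p_values = [p_value_two_sided(0.05, 0.06, n) for n in sizes]
for n, p in zip(sizes, p_values):
    print(f"n per variant = {n:>5}: p = {p:.3f}")
```

The same one point gap is far from significant at 500 visitors per variant and clears the 0.05 level at 8000, because the standard error shrinks as the sample grows.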
Choosing The Right Tail
A two sided test asks whether the variants are different in either direction. It is the safest default for most product tests. A one sided test asks about a single direction: either whether B is better than A, or whether B is worse than A, but not both. Use it only when the direction was chosen before the experiment started. Picking the direction after seeing the data overstates the evidence.
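To make the difference between tails concrete, the sketch below converts one z score into the three possible p values. The function name, the dictionary keys, and the example z score of 1.8 are hypothetical, and a positive z is taken to mean that B converted better than A.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return two sided and one sided p values for a z score (B minus A)."""
    cdf = NormalDist().cdf(z)
    return {
        "two_sided": 2 * (1 - NormalDist().cdf(abs(z))),  # different in either direction
        "b_greater": 1 - cdf,  # one sided: B better than A
        "b_less": cdf,         # one sided: B worse than A
    }


tails = p_values_from_z(1.8)
for name, p in tails.items():
    print(f"{name}: {p:.4f}")
```

For a positive z, the one sided "B better" p value is exactly half the two sided one. At z = 1.8 the one sided test passes a 0.05 threshold while the two sided test does not, which is why the direction has to be fixed in advance.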
Reading The Result
Statistical significance does not prove future profit. It only says that a gap at least as large as the observed one would be unlikely if the variants truly performed the same. Look at the confidence interval too. A narrow interval gives a clearer estimate. A wide interval means the true effect may still be uncertain. Also consider tracking quality, test duration, seasonality, and repeated peeking.
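The link between sample size and interval width can be sketched with a normal approximation interval for the difference in rates. The text does not state which interval method the calculator uses, so the unpooled standard error here is an assumption, and the function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lift_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Normal approximation interval for the absolute lift (rate B minus rate A)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Unpooled standard error: each variant keeps its own variance.
    se = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = rate_b - rate_a
    return diff - z_crit * se, diff + z_crit * se


# Same observed rates (5% vs 6.25%), ten times the traffic in the second case.
small = lift_confidence_interval(20, 400, 25, 400)
large = lift_confidence_interval(200, 4000, 250, 4000)
print(f"small sample: {small[0]:+.4f} to {small[1]:+.4f}")
print(f"large sample: {large[0]:+.4f} to {large[1]:+.4f}")
```

With the smaller sample the interval spans zero, so the data are consistent with no effect at all. With the larger sample the interval is narrower and sits entirely above zero.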
Practical Experiment Advice
Define the primary metric before launching the test. Avoid changing the goal after results arrive. Run the test through normal business cycles. Exclude bot traffic and broken sessions where possible. Keep both variants active at the same time. Do not stop only because the result briefly looks exciting. Use the output as a decision guide, then combine it with product context and risk.
A useful winner should be statistically clear and commercially meaningful. Check costs, margins, and user experience before rollout. When traffic is limited, treat inconclusive tests as learning, not failure. Better planning improves every later experiment too.
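One way to combine statistical clarity with commercial meaning is to compare the confidence interval against a minimum lift worth shipping. The sketch below is only an illustration: the function name, the labels, the 0.05 significance level, and the 0.005 minimum absolute lift are all hypothetical and should come from your own costs and margins.

```python
def decide(p_value, ci_low, ci_high, alpha=0.05, min_lift=0.005):
    """Classify a result using the p value and the interval for absolute lift (B minus A)."""
    if p_value >= alpha:
        return "inconclusive"
    if ci_high < 0:
        return "keep A: B is significantly worse"
    if ci_low >= min_lift:
        return "clear and meaningful"
    if ci_high < min_lift:
        return "significant but too small to matter"
    # Significant, but the interval straddles the minimum meaningful lift.
    return "significant, impact uncertain"


print(decide(p_value=0.20, ci_low=-0.010, ci_high=0.030))
print(decide(p_value=0.001, ci_low=0.008, ci_high=0.020))
print(decide(p_value=0.015, ci_low=0.002, ci_high=0.023))
```

The last call shows a result that is statistically clear yet still leaves the business question open, because the interval includes lifts smaller than the chosen minimum.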