AB Test P Value Calculator

Calculator Inputs

Control Visitors

Control Conversions

Variant Visitors

Variant Conversions

Significance Level (α)

Confidence Level

Alternative Hypothesis

Decimal Precision

This calculator uses a two proportion z test. It works best when each group has enough observations and conversions for the normal approximation to hold; a common rule of thumb is at least 10 conversions and 10 non-conversions per group.

Example Data Table

Group	Visitors	Conversions	Conversion Rate	Lift vs Control	P Value	Interpretation
Control	10000	520	5.20%	0.00%	0.0400	Reference
Variant	9800	575	5.87%	12.83%	0.0400	Significant at 0.05

Formula Used

Control rate: p₁ = x₁ / n₁

Variant rate: p₂ = x₂ / n₂

Pooled rate: p = (x₁ + x₂) / (n₁ + n₂)

Pooled standard error: SE = √[ p(1-p)(1/n₁ + 1/n₂) ]

Z score: z = (p₂ - p₁) / SE

Two sided p value: p-value = 2 × [1 - Φ(|z|)]

Difference interval: (p₂ - p₁) ± z* × SE_diff

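Where x is conversions, n is visitors, Φ is the standard normal cumulative distribution function, and z* is the critical value for the selected confidence level. SE_diff is the standard error of the difference; for interval estimates this is typically the unpooled form √[ p₁(1-p₁)/n₁ + p₂(1-p₂)/n₂ ] rather than the pooled SE used in the test.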
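The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code; the function names phi and two_proportion_z_test are hypothetical, and the inputs are the counts from the example data table.

```python
import math

def phi(z: float) -> float:
    """Standard normal cumulative distribution function Φ(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two sided p value) using the pooled standard error."""
    p1 = x1 / n1                                  # control rate
    p2 = x2 / n2                                  # variant rate
    pooled = (x1 + x2) / (n1 + n2)                # pooled rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - phi(abs(z)))
    return z, p_value

# Counts taken from the example data table: control 520/10000, variant 575/9800.
z, p_value = two_proportion_z_test(520, 10000, 575, 9800)
print(f"z = {z:.2f}, p value = {p_value:.4f}")
# prints: z = 2.05, p value = 0.0400
```

The result agrees with the 0.0400 shown in the example table.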

The calculator also reports relative uplift, risk ratio, odds ratio, pooled rate, and an approximate power estimate for quicker decision review.
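As a rough sketch, these extra metrics could be computed as follows. The page does not state its formulas for them, so the standard textbook definitions are assumed here, and the power estimate in particular is one common normal approximation, not necessarily the one this calculator uses. The function name effect_summary and the dictionary keys are hypothetical.

```python
import math
from statistics import NormalDist

def effect_summary(x1: int, n1: int, x2: int, n2: int, alpha: float = 0.05) -> dict:
    """Effect-size metrics for a control (x1, n1) and variant (x2, n2) group."""
    if x1 == 0 or x1 == n1 or x2 == n2:
        raise ValueError("ratios are undefined for 0% or 100% conversion rates")
    p1, p2 = x1 / n1, x2 / n2
    # Assumption: unpooled standard error of the difference.
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Assumption: two sided critical value and a simple post-hoc power approximation.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    approx_power = NormalDist().cdf(abs(p2 - p1) / se_unpooled - z_crit)
    return {
        "relative_uplift": (p2 - p1) / p1,
        "risk_ratio": p2 / p1,
        "odds_ratio": (p2 / (1 - p2)) / (p1 / (1 - p1)),
        "pooled_rate": (x1 + x2) / (n1 + n2),
        "approx_power": approx_power,
    }

summary = effect_summary(520, 10000, 575, 9800)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

For the example table data, the relative uplift comes out at about 12.83%, matching the Lift vs Control column.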

How to Use This Calculator

Enter total visitors and conversions for the control group.
Enter total visitors and conversions for the variant group.
Choose the significance level that matches your risk tolerance.
Select a confidence level for interval estimates.
Pick the alternative hypothesis based on your experiment design.
Press calculate to see the p value, lift, intervals, and graph.
Download the computed summary as CSV or PDF if needed.

FAQs

1. What does the p value tell me?

It is the probability of seeing a difference at least as large as the one you observed if no real effect exists. Lower values suggest the variant result is less compatible with random chance alone. It is not the probability that the variant is better.

2. Which statistical test is used here?

This page uses a two proportion z test, a common method for comparing binary conversion rates between control and variant groups.

3. When should I use a one sided hypothesis?

Use one sided testing only when your decision rule was defined before the experiment and you care about improvement in one direction only.
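To illustrate how the choice of alternative hypothesis changes the result, here is a small sketch that converts a z score into a p value. The labels "two-sided", "greater", and "less" and the function name p_value_from_z are hypothetical; the page does not say how its options are named.

```python
from statistics import NormalDist

def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """Convert a z score (variant minus control) into a p value."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":   # H1: rates differ in either direction
        return 2 * (1 - cdf(abs(z)))
    if alternative == "greater":     # H1: variant rate > control rate
        return 1 - cdf(z)
    if alternative == "less":        # H1: variant rate < control rate
        return cdf(z)
    raise ValueError(f"unknown alternative: {alternative!r}")

z = 2.054  # approximate z score for the example table data
for alt in ("two-sided", "greater", "less"):
    print(alt, round(p_value_from_z(z, alt), 4))
```

For a positive z score, the "greater" p value is exactly half of the two sided one, which is why the direction must be fixed before looking at the data.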

4. Can unequal sample sizes be tested?

Yes. The formulas directly support different visitor counts in each group, so balanced traffic is helpful but not required.

5. Why do confidence intervals matter?

Intervals show a plausible range for the true difference. They help you judge uncertainty, practical impact, and whether zero (no difference) remains a realistic outcome.
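A minimal sketch of the difference interval follows. It assumes the unpooled standard error for SE_diff, which is the usual choice for intervals but is not confirmed by this page; the function name difference_interval is hypothetical.

```python
import math
from statistics import NormalDist

def difference_interval(x1: int, n1: int, x2: int, n2: int, confidence: float = 0.95):
    """Return (low, high) for the difference p2 - p1 at the given confidence level."""
    p1, p2 = x1 / n1, x2 / n2
    # Assumption: unpooled standard error of the difference.
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Critical value z* for a two sided interval, about 1.96 at 95%.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p2 - p1
    return diff - z_star * se_diff, diff + z_star * se_diff

low, high = difference_interval(520, 10000, 575, 9800)
print(f"95% interval for the difference: {low:.4%} to {high:.4%}")
```

With the example table data, the interval runs from roughly 0.03 to 1.30 percentage points. It excludes zero, but only narrowly, which is consistent with a p value just under 0.05.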

6. What is relative uplift?

Relative uplift is the difference in conversion rates divided by the control rate, (p₂ - p₁) / p₁. It shows proportional improvement, which is often easier for teams to interpret.

7. Is significance enough to ship a winner?

Not always. Review confidence intervals, practical business impact, experiment quality, segmentation, and guardrail metrics before making a final rollout choice.

8. What if traffic is very low?

Very small samples can make the normal approximation unreliable. In that case, collect more data or consider an exact method, such as Fisher's exact test, before acting confidently.
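One exact method is Fisher's exact test. The page does not name a specific method, so treat the following as an illustrative sketch only. It uses the Python standard library and the common two sided convention of summing the probabilities of all tables no more likely than the observed one; the function name fisher_exact_two_sided and the sample counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two sided Fisher exact p value for control (x1 of n1) vs variant (x2 of n2)."""
    total_conversions = x1 + x2
    denominator = comb(n1 + n2, total_conversions)

    def prob(k: int) -> float:
        # Hypergeometric probability that the control group holds k of the conversions.
        return comb(n1, k) * comb(n2, total_conversions - k) / denominator

    k_min = max(0, total_conversions - n2)
    k_max = min(n1, total_conversions)
    observed = prob(x1)
    # Sum every table that is no more likely than the observed one.
    # The small factor guards against floating point ties.
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= observed * (1 + 1e-9)
    )

# Hypothetical low-traffic experiment: 3 of 40 vs 10 of 40 conversions.
print(fisher_exact_two_sided(3, 40, 10, 40))
```

This direct summation is only practical for small samples, which is exactly the case where the z test is least trustworthy.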