Two-Sample Z Test Calculator

Inputs

Use summary statistics or raw values. Results update on submit.


Input mode

Raw values mode estimates σ1 and σ2 with the sample standard deviations, so it is best suited to large samples.

Alternative hypothesis

Pick the direction that matches your research question.

Significance level (α)

Common choices: 0.10, 0.05, 0.01.

Hypothesized difference (Δ0)

Use 0 for a standard difference-in-means test.

Sample 1 mean (x̄1)

Sample 2 mean (x̄2)

Sample 1 size (n1)

Sample 2 size (n2)

Sample 1 standard deviation (σ1)

Use known population σ when available.

Sample 2 standard deviation (σ2)

Optional: paste raw values (for Raw values mode)

Sample 1 values

Separate numbers with spaces, commas, or new lines.

Sample 2 values

After submitting, results appear above this form.

Formula used

This calculator tests the difference between two population means using a z statistic:

z = ((x̄₁ − x̄₂) − Δ₀) / SE
SE = √( (σ₁²/n₁) + (σ₂²/n₂) )

p-values come from the standard normal distribution. A two-sided (1 − α) confidence interval is computed as (x̄₁ − x̄₂) ± z_{1−α/2} · SE, where z_{1−α/2} is the standard normal quantile at 1 − α/2.
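The formulas above can be expressed in a few lines of Python using only the standard library. This is a minimal sketch, not the calculator's own code: the function name `two_sample_z_test`, its parameters, and the returned dictionary are hypothetical. It assumes σ1 and σ2 are known (or are large-sample estimates) and covers only the two-tailed case.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, mean2, sd1, sd2, n1, n2, delta0=0.0, alpha=0.05):
    """Two-tailed two-sample z test plus a (1 - alpha) confidence interval.

    Hypothetical helper for illustration; sd1 and sd2 are treated as known
    population standard deviations (or large-sample estimates).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Standard error of the difference in sample means
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    diff = mean1 - mean2
    z = (diff - delta0) / se
    # Two-tailed p-value: probability of |Z| >= |z| under H0
    p_value = 2 * std_normal.cdf(-abs(z))
    # Critical value z_{1 - alpha/2} sets the interval half-width
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return {"z": z, "se": se, "p_value": p_value, "ci": ci,
            "reject_h0": p_value < alpha}


result = two_sample_z_test(10.0, 9.0, 2.0, 2.0, 100, 100)
print(result)
```

Computing the p-value as `2 * cdf(-|z|)` rather than `2 * (1 - cdf(|z|))` avoids losing precision when |z| is large.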

How to use this calculator

Choose an input mode: summary statistics or raw values.
Select the alternative hypothesis (two-tailed, left-tailed, or right-tailed).
Set α and Δ0, then enter your sample details.
Press Submit to view z, p-value, decision, and interval.
Use Download CSV or Download PDF to export results.

Example data table

Scenario	Group	Mean	Std. Dev.	n	α
Manufacturing yield	Line A	52.4	10.2	40	0.05
Manufacturing yield	Line B	49.1	9.7	35	0.05
App experiment	Variant	3.72	1.10	120	0.01
App experiment	Control	3.55	1.05	115	0.01
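As a rough check, the two scenarios in the table can be run through the same formulas. The sketch below treats the listed standard deviations as known σ values and uses Δ0 = 0 with two-tailed tests; both are assumptions, since the table does not state the tail or Δ0. The names `rows` and `z_and_p` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# (scenario, mean1, sd1, n1, mean2, sd2, n2, alpha) taken from the example table
rows = [
    ("Manufacturing yield", 52.4, 10.2, 40, 49.1, 9.7, 35, 0.05),
    ("App experiment", 3.72, 1.10, 120, 3.55, 1.05, 115, 0.01),
]


def z_and_p(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """Return (z, two-tailed p) for a two-sample z test (illustrative helper)."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = ((mean1 - mean2) - delta0) / se
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p


results = {}
for name, m1, s1, n1, m2, s2, n2, alpha in rows:
    z, p = z_and_p(m1, s1, n1, m2, s2, n2)
    decision = "Reject H0" if p < alpha else "Fail to reject H0"
    results[name] = (z, p, decision)
    print(f"{name}: z = {z:.3f}, p = {p:.4f}, {decision} at alpha = {alpha}")
```

Under these assumptions, neither gap is significant at its listed α: the manufacturing comparison gives z ≈ 1.43 and the app experiment gives z ≈ 1.21.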

Tip: If you only have raw values, switch to Raw values mode and paste data.

Where the two-sample z test fits

This test compares the means of two independent groups. It assumes the population standard deviations are known, or that both samples are large enough for the sample standard deviations to stand in for them. It suits fast screening in experiments, quality checks, and monitoring, and it is common in A/B testing dashboards, process control reports, and KPI tracking. State these assumptions clearly alongside the results.

Inputs that control accuracy

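The standard error depends on σ1, σ2, n1, and n2. Larger samples reduce SE, and a smaller SE raises |z| for the same mean gap. When σ1 and σ2 are similar, splitting a fixed total sample evenly (n1 = n2) gives the smallest SE. Because SE scales with 1/√n, doubling both sample sizes multiplies SE by 1/√2 ≈ 0.707, cutting it by about 29%.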
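The scaling claims above can be checked numerically. The σ and n values below are arbitrary illustrations, and `standard_error` is a hypothetical helper name, not part of the calculator.

```python
from math import sqrt


def standard_error(sd1, sd2, n1, n2):
    """Standard error of x̄1 − x̄2 when σ1 and σ2 are known."""
    return sqrt(sd1**2 / n1 + sd2**2 / n2)


# Doubling both sample sizes: SE shrinks by a factor of 1/sqrt(2)
base = standard_error(10.0, 10.0, 50, 50)
doubled = standard_error(10.0, 10.0, 100, 100)
reduction = 1 - doubled / base  # equals 1 - 1/sqrt(2)

# Same total of 100 observations and equal σ: balanced vs unbalanced split
balanced = standard_error(10.0, 10.0, 50, 50)
unbalanced = standard_error(10.0, 10.0, 80, 20)

print(f"SE reduction from doubling n: {reduction:.1%}")
print(f"balanced SE = {balanced:.3f}, unbalanced SE = {unbalanced:.3f}")
```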

Interpreting z and p-value

The z score measures how far the observed difference lies from Δ0, in SE units. A small p-value means a difference at least this extreme would be unlikely if H0 were true. For a two-tailed test, the critical value is about 1.96 at α = 0.05 and about 2.576 at α = 0.01; one-tailed tests use about 1.645 and 2.326. Choose the tail option that matches your research question before looking at the results.
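The critical values quoted above come from the standard normal quantile function. This sketch derives them and applies the matching decision rule for each tail. The function names and the tail labels `"two"`, `"left"`, and `"right"` are hypothetical choices for illustration.

```python
from statistics import NormalDist


def critical_value(alpha, tail):
    """Critical z for significance level alpha and tail 'two', 'left', or 'right'."""
    std_normal = NormalDist()
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)  # reject if |z| > this
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)      # reject if z > this
    if tail == "left":
        return std_normal.inv_cdf(alpha)          # negative; reject if z < this
    raise ValueError("tail must be 'two', 'left', or 'right'")


def reject_h0(z, alpha, tail):
    """Apply the rejection rule that matches the chosen tail."""
    c = critical_value(alpha, tail)
    if tail == "two":
        return abs(z) > c
    if tail == "right":
        return z > c
    return z < c


for alpha in (0.05, 0.01):
    two = critical_value(alpha, "two")
    one = critical_value(alpha, "right")
    print(f"alpha = {alpha}: two-tailed {two:.3f}, one-tailed {one:.3f}")
```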

Confidence interval as an effect range

The interval gives a range of plausible values for μ1 − μ2. It is centered on x̄1 − x̄2, with half-width z_{1−α/2}·SE. If the (1 − α) interval excludes Δ0, the two-tailed test rejects H0 at level α. The interval narrows as σ falls or n rises. Use it to judge practical impact, not just statistical significance.
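The link between the interval and the two-tailed test can be demonstrated directly: for the same α, the (1 − α) interval excludes Δ0 exactly when the test rejects. This is a sketch with a hypothetical helper name; the difference and SE values in the usage loop are arbitrary.

```python
from statistics import NormalDist


def ci_and_two_tailed_test(diff, se, delta0=0.0, alpha=0.05):
    """Return the (1 - alpha) CI for mu1 - mu2, the two-tailed decision,
    and whether the interval excludes delta0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    lower, upper = diff - z_crit * se, diff + z_crit * se
    z = (diff - delta0) / se
    p = 2 * std_normal.cdf(-abs(z))
    excludes_delta0 = not (lower <= delta0 <= upper)
    return (lower, upper), p < alpha, excludes_delta0


# Same SE, increasing observed differences: decision and interval agree
for diff in (0.5, 1.0, 1.5):
    ci, reject, excludes = ci_and_two_tailed_test(diff, 0.5)
    print(diff, ci, reject, excludes)
```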

Typical data scenarios

A/B tests often compare average revenue, time, or rating. Manufacturing compares mean thickness or yield. Healthcare compares mean blood pressure between cohorts. Finance compares average returns across strategies. Product teams track mean session length across releases.

Practical checks before you decide

Confirm that the two samples are independent; paired observations (for example, before-and-after measurements on the same units) call for a paired test instead. Check for extreme outliers, since they inflate variance and weaken power. Prefer stable measurement windows to reduce seasonal noise. When σ is unknown and samples are small, use a two-sample t test.

FAQs

1) What does Δ0 mean?

Δ0 is the difference between population means (μ1 − μ2) assumed under H0. Use 0 for “no difference”. Use a nonzero value when H0 should state a specific gap, such as a business threshold or a minimum meaningful difference.

2) When should I use a z test here?

Use it when population standard deviations are known, or when sample sizes are large enough that using sample standard deviations is a stable approximation.

3) What is the difference between left-tailed, right-tailed, and two-tailed tests?

A left-tailed test has H1: μ1 − μ2 < Δ0. A right-tailed test has H1: μ1 − μ2 > Δ0. A two-tailed test has H1: μ1 − μ2 ≠ Δ0, detecting deviations in either direction, and splits α equally across both tails.
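Each alternative maps to a different tail area of the standard normal distribution. A minimal sketch, using the hypothetical function name `p_value` and tail labels `"two"`, `"left"`, and `"right"`:

```python
from statistics import NormalDist


def p_value(z, tail):
    """p-value for a z statistic under the chosen alternative hypothesis."""
    cdf = NormalDist().cdf
    if tail == "left":    # H1: mu1 - mu2 < delta0
        return cdf(z)
    if tail == "right":   # H1: mu1 - mu2 > delta0
        return 1 - cdf(z)
    if tail == "two":     # H1: mu1 - mu2 != delta0
        return 2 * cdf(-abs(z))
    raise ValueError("tail must be 'two', 'left', or 'right'")


for tail in ("left", "right", "two"):
    print(tail, round(p_value(1.8, tail), 4))
```

With z = 1.8, the right-tailed p-value falls below 0.05 while the two-tailed one does not, which is why the tail must be chosen from the research question rather than from the data.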

4) Why can p-values change with the same mean gap?

p-values depend on the standard error. Larger samples or smaller σ reduce SE, increase |z|, and usually reduce the p-value for the same observed difference.
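To see this concretely, the sketch below holds the observed gap (1.0) and σ (2.0) fixed and varies only the per-group sample size. The helper name `two_tailed_p` and all numbers are illustrative assumptions, with equal σ and equal group sizes for simplicity.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_p(diff, sd, n_per_group):
    """Two-tailed p-value for equal sigma and equal group sizes (illustrative)."""
    se = sqrt(sd**2 / n_per_group + sd**2 / n_per_group)
    z = diff / se
    return 2 * NormalDist().cdf(-abs(z))


# Same observed gap and sigma; only the sample size changes
p_values = {n: two_tailed_p(1.0, 2.0, n) for n in (20, 40, 80)}
for n, p in p_values.items():
    print(f"n per group = {n}: p = {p:.4f}")
```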

5) What does “Reject H0” mean in practice?

It means the observed difference would be unlikely if H0 were true, judged at the chosen α. It does not prove a causal effect or show that the difference matters in practice. Combine the decision with the effect size and the confidence interval.

6) Can I paste raw values instead of summaries?

Yes. Use Raw values mode and paste numbers. The calculator computes means and sample standard deviations, then runs the z test. For small samples, consider a t test.
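A rough sketch of what raw values mode involves: split the pasted text on spaces, commas, or new lines, compute each sample's mean and standard deviation (n − 1 denominator), then run the z test with those estimates in place of σ. The function names `parse_values` and `z_test_from_raw` are hypothetical, and the exact parsing rules of the calculator are an assumption here.

```python
import re
from math import sqrt
from statistics import NormalDist, fmean, stdev


def parse_values(text):
    """Split on commas and any whitespace (spaces, tabs, new lines); skip empty tokens."""
    return [float(tok) for tok in re.split(r"[,\s]+", text.strip()) if tok]


def z_test_from_raw(text1, text2, delta0=0.0):
    """Two-tailed z test using sample standard deviations as stand-ins for sigma."""
    x1, x2 = parse_values(text1), parse_values(text2)
    if len(x1) < 2 or len(x2) < 2:
        raise ValueError("each sample needs at least two values")
    s1, s2 = stdev(x1), stdev(x2)  # sample SDs, n - 1 denominator
    se = sqrt(s1**2 / len(x1) + s2**2 / len(x2))
    z = ((fmean(x1) - fmean(x2)) - delta0) / se
    p_two_tailed = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_tailed


z, p = z_test_from_raw("1, 2, 3\n4 5", "0,1,2,3,4")
print(f"z = {z:.3f}, p = {p:.4f}")
```

Because the sample standard deviations are themselves uncertain, this approximation is only reliable for large samples; with small samples a t test is the safer choice.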