Binary Sample Size Calculator for AI & Machine Learning

Calculator inputs

Use precision mode for estimating one binary rate. Use comparison mode for testing uplift between two binary outcome rates.

Calculation method

Confidence level

Expected positive rate

Margin of error

Finite population size

Baseline positive rate

Variant positive rate

Alpha

Power

Design effect

Dropout or unusable rate (%)

Example data table

This table shows typical binary sample size planning scenarios in AI and machine learning work.

Scenario	Method	Core inputs	What the result means
Label quality audit	Single proportion precision	p = 0.50, error = 0.05, confidence = 95%	Estimate labels needed to measure a positive rate reliably.
False positive monitoring	Single proportion precision	p = 0.12, error = 0.03, confidence = 95%	Size a monitoring set that pins the false positive rate to within ±0.03.
Model uplift experiment	Two-proportion comparison	baseline = 0.20, variant = 0.26, power = 80%	Estimate per-group labels to detect real conversion uplift.
Bias review sample	Single proportion precision	p = 0.35, error = 0.04, confidence = 99%	Plan a stricter review set for governance reporting.

Formula used

1) Single proportion precision

Use this when you want to estimate one binary rate, such as accuracy, prevalence, acceptance, or positive label share.

n = (Z² × p × (1 - p)) / E²

Here, Z is the confidence z-score, p is the expected positive proportion, and E is the target margin of error.
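As a rough illustration, the formula can be evaluated in a few lines of Python. The function name base_sample_size is hypothetical, and the sketch assumes Z is the two-sided normal quantile for the chosen confidence level, which this page does not state explicitly.

```python
from statistics import NormalDist


def base_sample_size(p: float, margin: float, confidence: float = 0.95) -> float:
    """Return the unrounded n = Z^2 * p * (1 - p) / E^2."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    # Assumption: Z is the two-sided quantile (about 1.96 for 95% confidence).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z ** 2) * p * (1 - p) / (margin ** 2)


# Label quality audit row from the example table: p = 0.50, E = 0.05, 95%.
n = base_sample_size(p=0.50, margin=0.05, confidence=0.95)
print(round(n, 1))  # about 384.1, so at least 385 labels before other adjustments
```

The result is left unrounded here so that later corrections can be applied before a single final round-up.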

When population size is limited, this page applies finite population correction:

n_fpc = n / (1 + ((n - 1) / N))
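A minimal sketch of the correction, assuming the unrounded n is passed in. The name apply_fpc and the pool sizes are illustrative only.

```python
def apply_fpc(n: float, population: int) -> float:
    """Return n_fpc = n / (1 + (n - 1) / N)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return n / (1 + (n - 1) / population)


n0 = 384.15  # unrounded precision-mode result for p = 0.50, E = 0.05, 95%
for pool in (1_000, 10_000, 1_000_000):
    # The corrected size approaches n0 as the pool grows.
    print(pool, round(apply_fpc(n0, pool), 1))
```

With a pool of 1,000 the requirement drops noticeably, while with a pool of a million the correction is negligible.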

Then it adjusts for design effect and expected dropout, with dropout_rate expressed as a fraction (a 10% unusable rate becomes 0.10):

n_final = ceil((n_fpc × design_effect) / (1 - dropout_rate))
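The adjustment step could be sketched as follows. The function name final_sample_size is hypothetical, and dropout_rate is assumed to be a fraction rather than a percentage.

```python
import math


def final_sample_size(n_fpc: float, design_effect: float = 1.0,
                      dropout_rate: float = 0.0) -> int:
    """Return ceil(n_fpc * design_effect / (1 - dropout_rate))."""
    if design_effect <= 0:
        raise ValueError("design_effect must be positive")
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be a fraction in [0, 1)")
    return math.ceil(n_fpc * design_effect / (1 - dropout_rate))


# Illustrative inputs: design effect 1.2 and 10% unusable records.
print(final_sample_size(384.15, design_effect=1.2, dropout_rate=0.10))  # 513
```

Dividing by (1 - dropout_rate) rather than multiplying by (1 + dropout_rate) keeps the expected usable count at or above the target.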

2) Two-proportion comparison

Use this when you want to compare a baseline binary rate against a variant rate, such as two classifiers, prompts, policies, or ranking strategies.

n_per_group = ((Zα × √(2p̄(1-p̄)) + Zβ × √(p1(1-p1) + p2(1-p2)))²) / (p2 - p1)²

Here, p1 is the baseline rate, p2 is the variant rate, p̄ = (p1 + p2) / 2 is the pooled rate for equally sized groups, Zα is the z-score for the chosen alpha, and Zβ is the z-score for the chosen power.

The page also adjusts the per-group estimate for design effect and unusable samples.
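The comparison formula can be sketched in Python as below. This is an illustration under stated assumptions: the name two_proportion_n is hypothetical, the groups are equally sized, and alpha defaults to 0.05 two-sided because the page does not say whether its test is one-sided or two-sided.

```python
import math
from statistics import NormalDist


def two_proportion_n(p1: float, p2: float, alpha: float = 0.05,
                     power: float = 0.80, two_sided: bool = True) -> float:
    """Return the unrounded per-group n for comparing two proportions."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    nd = NormalDist()
    # Assumption: two-sided tests split alpha across both tails.
    z_alpha = nd.inv_cdf(1 - alpha / 2) if two_sided else nd.inv_cdf(1 - alpha)
    z_beta = nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2  # pooled rate for equal group sizes
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return numerator / (p2 - p1) ** 2


# Model uplift row from the example table, with alpha = 0.05 assumed.
n = two_proportion_n(p1=0.20, p2=0.26, power=0.80)
print(math.ceil(n))  # 772 per group under these assumptions
```

Halving the detectable difference roughly quadruples the requirement, since the difference is squared in the denominator.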

How to use this calculator

Select Single proportion precision when estimating one binary rate.
Select Two-proportion comparison when comparing baseline and variant outcomes.
Enter rates as decimals, such as 0.20 for 20%.
Choose confidence or power settings that match your risk tolerance.
Add a finite population only when your labeling pool is limited.
Increase design effect when clustering or dependence inflates variance.
Add dropout to protect against rejected, missing, or noisy labels.
Click Calculate sample size to show the result above the form.
Review the summary table, class counts, and Plotly graph.
Use the CSV or PDF buttons to export your calculation record.

FAQs

1) What does this calculator estimate?

It estimates how many samples you need for binary outcomes. That includes label audits, classifier monitoring, conversion experiments, acceptance rates, or any yes-or-no target.

2) When should I use single proportion precision?

Use it when you want one reliable binary rate with a chosen error margin. Examples include prevalence estimation, positive label share, pass rate, or moderation approval rate.

3) When should I use two-proportion comparison?

Use it when comparing two binary rates. It fits A/B tests, prompt changes, classifier upgrades, threshold changes, and policy variants where you want to detect uplift.

4) What if I do not know the expected positive rate?

Use 0.50 for a conservative estimate. In precision mode, p × (1 - p) peaks at 0.50, so that choice produces the largest required sample and protects against underestimating your labeling needs.
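A quick check of why 0.50 is the conservative choice in precision mode, assuming a 95% two-sided confidence level and a 0.05 margin (both chosen only for illustration):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # two-sided 95% quantile
margin = 0.05

# Unrounded precision-mode sample size for several assumed positive rates.
sizes = {p: z ** 2 * p * (1 - p) / margin ** 2
         for p in (0.10, 0.30, 0.50, 0.70, 0.90)}

for p, n in sizes.items():
    print(f"p = {p:.2f} -> n = {n:.0f}")

print("largest at p =", max(sizes, key=sizes.get))
```

The requirement is symmetric around 0.50, so a rate of 0.10 needs about as many samples as a rate of 0.90.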

5) Why does design effect matter?

Design effect inflates the sample when observations are not fully independent. Clustering, repeated users, grouped prompts, or batched annotation workflows can all increase variance.
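The page does not say how to choose a design effect. One common approximation, swapped in here purely as an illustration, is Kish's formula for equally sized clusters: 1 + (m - 1) × ICC, where m is the average cluster size and ICC is the intraclass correlation. Both inputs below are made-up values.

```python
def kish_design_effect(cluster_size: float, icc: float) -> float:
    """Approximate design effect as 1 + (m - 1) * ICC (equal-size clusters)."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0 <= icc <= 1:
        raise ValueError("icc is assumed to lie in [0, 1] for this sketch")
    return 1 + (cluster_size - 1) * icc


# Hypothetical case: 10 labeled items per user, ICC of 0.05 within a user.
deff = kish_design_effect(cluster_size=10, icc=0.05)
print(round(deff, 2))  # 1.45
```

A cluster size of 1 or an ICC of 0 gives a design effect of 1, meaning no inflation.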

6) Why add a dropout or unusable rate?

Some records become unusable because of missing labels, bad inputs, policy exclusions, or failed review. Dropout padding keeps the final usable sample large enough.

7) When should I use finite population correction?

Use it when your total candidate pool is limited and known. It reduces the needed sample because a sample that covers a large share of a small pool leaves less of that pool unobserved.

8) Why are expected positives and negatives shown?

They help you plan class balance, reviewer effort, and downstream training coverage. Very small minority counts may signal that you need stratified sampling.
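Expected class counts follow directly from the sample size and the expected positive rate. The sketch below is an assumption about how such counts could be derived, not a description of this page's exact rounding. The threshold MIN_MINORITY is hypothetical.

```python
def expected_class_counts(n_final: int, p: float) -> tuple[float, float]:
    """Return (expected positives, expected negatives) for n_final samples."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    positives = n_final * p
    return positives, n_final - positives


MIN_MINORITY = 30  # hypothetical cutoff for "too few minority examples"

# Illustrative inputs: 451 samples at an expected positive rate of 0.12.
pos, neg = expected_class_counts(451, 0.12)
print(f"expected positives: {pos:.1f}, expected negatives: {neg:.1f}")

if min(pos, neg) < MIN_MINORITY:
    print("Minority class is small; consider stratified sampling.")
```

Here the minority class is expected to contribute roughly 54 records, which clears the illustrative cutoff.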