Enter Binomial Test Inputs
Example Data Table
These sample cases show how different alternatives and baselines change the exact p-value and final inference.
| Scenario | Trials | Successes | Null Rate | Alternative | Exact P-Value | Decision at α = 0.05 |
|---|---|---|---|---|---|---|
| Email click lift | 30 | 15 | 0.30 | p > p0 | 0.016937 | Reject H0 |
| Defect scarcity check | 50 | 8 | 0.25 | p < p0 | 0.091597 | Fail to reject H0 |
| Fairness validation | 45 | 30 | 0.50 | p ≠ p0 | 0.035698 | Reject H0 |
| Conversion benchmark | 80 | 52 | 0.50 | p > p0 | 0.004841 | Reject H0 |
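The table rows can be reproduced with SciPy's `scipy.stats.binomtest` (a sketch assuming SciPy is installed; the calculator's own engine may differ). The alternatives map as p > p0 → "greater", p < p0 → "less", and p ≠ p0 → "two-sided":

```python
from scipy.stats import binomtest

# (trials, successes, null rate, alternative) for each table row
scenarios = {
    "Email click lift":      (30, 15, 0.30, "greater"),
    "Defect scarcity check": (50,  8, 0.25, "less"),
    "Fairness validation":   (45, 30, 0.50, "two-sided"),
    "Conversion benchmark":  (80, 52, 0.50, "greater"),
}

alpha = 0.05
pvalues = {}
for name, (n, x, p0, alt) in scenarios.items():
    pvalues[name] = binomtest(k=x, n=n, p=p0, alternative=alt).pvalue
    decision = "Reject H0" if pvalues[name] < alpha else "Fail to reject H0"
    print(f"{name}: p = {pvalues[name]:.6f} -> {decision}")
```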
Formula Used
P(X = x) = C(n, x) × p^x × (1 − p)^(n − x)
Right-tailed uses P(X ≥ x). Left-tailed uses P(X ≤ x). Two-sided sums probabilities that are at most as likely as the observed outcome under H0.
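These tail sums can be sketched with only the standard library, including the two-sided rule of accumulating every outcome no more likely than the observed one (a minimal sketch; the `exact_p_value` helper is illustrative, not the calculator's internal code):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def exact_p_value(x: int, n: int, p0: float, alternative: str) -> float:
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    if alternative == "greater":          # right-tailed: P(X >= x)
        return sum(pmf[x:])
    if alternative == "less":             # left-tailed: P(X <= x)
        return sum(pmf[: x + 1])
    # two-sided: sum every outcome no more likely than the observed one
    cutoff = pmf[x] * (1 + 1e-12)         # small tolerance guards float ties
    return sum(q for q in pmf if q <= cutoff)

# Fair-coin example: P(X >= 8) with n = 10 is (45 + 10 + 1) / 1024
print(exact_p_value(8, 10, 0.5, "greater"))    # 0.0546875
```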
z = (x − np0) / √(np0(1 − p0)), with optional continuity correction for count data.
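The approximation can be sketched the same way; the continuity correction shifts the observed count half a unit toward the null mean before standardizing (stdlib-only sketch with illustrative helper names):

```python
from math import sqrt, erfc

def normal_sf(z: float) -> float:
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * erfc(z / sqrt(2))

def approx_p_value(x, n, p0, alternative, continuity=True):
    cc = 0.5 if continuity else 0.0
    se = sqrt(n * p0 * (1 - p0))
    if alternative == "greater":
        return normal_sf((x - cc - n * p0) / se)
    if alternative == "less":
        return normal_sf((n * p0 - x - cc) / se)
    # two-sided: fold the corrected distance into one upper tail, doubled
    return 2 * normal_sf((abs(x - n * p0) - cc) / se)

# Email click lift row: z = (14.5 - 9) / sqrt(6.3), roughly 2.19
print(round(approx_p_value(15, 30, 0.30, "greater"), 4))
```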
Observed proportion p̂ = x / n, effect = p̂ − p0. The interval shown is the exact Clopper–Pearson confidence interval.
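Clopper–Pearson bounds are quantiles of beta distributions. A sketch assuming SciPy is available (the `clopper_pearson` helper is illustrative):

```python
from scipy.stats import beta

def clopper_pearson(x: int, n: int, conf: float = 0.95):
    """Exact (Clopper-Pearson) interval for the true success probability."""
    a = (1 - conf) / 2
    lo = 0.0 if x == 0 else beta.ppf(a, x, n - x + 1)
    hi = 1.0 if x == n else beta.ppf(1 - a, x + 1, n - x)
    return lo, hi

lo, hi = clopper_pearson(15, 30)
print(f"p-hat = {15 / 30:.3f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```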
Use the exact result for final decisions, especially when sample sizes are small or when p0 is close to 0 or 1.
How to Use This Calculator
- Enter the total number of trials and the number of observed successes.
- Set the null success probability that represents your benchmark or claimed rate.
- Choose the alternative hypothesis that matches your testing direction.
- Pick a significance level and confidence level for reporting.
- Submit the form to display the result panel above the calculator.
- Review the exact p-value, decision, interval, and approximation diagnostics.
- Use the export buttons to save the result table as CSV or PDF.
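The steps above can be scripted end to end. This sketch assumes SciPy 1.7 or later, where the `binomtest` result exposes a `proportion_ci` method; the input values are hypothetical:

```python
from scipy.stats import binomtest

# Hypothetical form inputs: trials, successes, null rate
n, x, p0 = 200, 68, 0.25
alpha, conf = 0.05, 0.95

result = binomtest(k=x, n=n, p=p0, alternative="greater")
ci = result.proportion_ci(confidence_level=conf, method="exact")

print(f"observed proportion: {x / n:.3f}")
print(f"exact p-value:       {result.pvalue:.4f}")
print(f"{int(conf * 100)}% exact CI: ({ci.low:.3f}, {ci.high:.3f})")
print("decision:", "Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```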
This setup fits product experiments, conversion analysis, defect tracking, clinical response checks, reliability screening, and any binary-outcome study with independent trials.
Baseline interpretation
A binomial test evaluates whether an observed success count is consistent with a stated benchmark probability. In analytics, this supports conversion checks, defect screening, churn studies, response validation, and experiments with binary outcomes. The calculator combines exact inference, confidence intervals, and approximation diagnostics so analysts can report evidence without relying on informal rules of thumb.
Why exact testing matters
Exact binomial methods remain valuable when samples are small, expected counts are limited, or the benchmark probability sits near the extremes. Under those conditions, normal approximations may distort tail areas and shift the reported significance level. By summing exact probabilities from the binomial distribution, the calculator preserves the intended test logic and improves the reliability of decisions.
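A quick numerical illustration of the gap: with a small sample and a benchmark near zero, the uncorrected normal approximation noticeably understates the right tail (stdlib-only sketch; the inputs are illustrative):

```python
from math import comb, sqrt, erfc

def exact_right_tail(x, n, p0):
    """Exact P(X >= x) under Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))

def normal_right_tail(x, n, p0, continuity=False):
    """Normal approximation to P(X >= x), optionally continuity-corrected."""
    z = (x - (0.5 if continuity else 0.0) - n * p0) / sqrt(n * p0 * (1 - p0))
    return 0.5 * erfc(z / sqrt(2))

# Small sample with p0 near zero: the two answers diverge sharply
n, x, p0 = 20, 3, 0.05
print("exact:        ", round(exact_right_tail(x, n, p0), 4))
print("normal, no cc:", round(normal_right_tail(x, n, p0), 4))
```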
Input quality and design assumptions
The model assumes independent trials, two possible outcomes per trial, and a constant success probability under the null hypothesis. Good input design matters because biased labeling, pooled populations, or drifting conditions can weaken interpretation. Analysts should define success before measurement, confirm sample scope, and record whether the question is right-tailed, left-tailed, or two-sided.
Reading the output correctly
The exact p-value measures how unusual the observed result would be if the benchmark rate were true. A small p-value signals tension with the null assumption, but it does not measure practical importance. That is why the calculator also reports the observed proportion, expected successes, effect size, and an exact confidence interval for the underlying success probability.
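These companion statistics are simple arithmetic on the inputs; for example, using the first table row:

```python
# Values taken from the "Email click lift" table row
n, x, p0 = 30, 15, 0.30

p_hat = x / n          # observed proportion
expected = n * p0      # expected successes under H0
effect = p_hat - p0    # raw effect size

print(f"p-hat = {p_hat:.3f}, expected successes = {expected:.1f}, effect = {effect:+.3f}")
```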
Operational examples across teams
Marketing teams can compare campaign conversion rates against baselines. Product teams can test feature adoption success after launch. Quality teams can evaluate pass rates against service targets. Clinical and survey analysts can test response proportions against prior evidence. Across these settings, the same framework converts raw counts into a defensible statement about whether performance exceeds, matches, or falls below expectations.
Reporting and decision discipline
Sound reporting pairs statistical results with business context. Teams should document the benchmark probability, sampling window, chosen significance level, and reason for the selected alternative hypothesis. If the exact p-value and confidence interval both support a difference, the conclusion becomes more persuasive. When evidence is weak, the safer message is that the data did not justify changing the assumption.
FAQs
1. When should I use a binomial test?
Use it when each trial has only two outcomes, the trials are independent, and the null hypothesis states a fixed success probability. Typical examples include clicks, passes, defects, responses, and conversions.
2. Which result matters more, exact or approximate?
The exact p-value drives the formal hypothesis decision. The normal approximation is included as a diagnostic reference, especially for larger samples where it should closely track the exact result.
3. Does a significant p-value mean the effect is important?
Not automatically. Statistical significance shows evidence against the null probability. Practical importance depends on the effect size, confidence interval, business context, cost, risk, and the value of changing decisions.
4. How do I choose the alternative hypothesis?
Two-sided tests detect any difference from the benchmark. Right-tailed tests ask whether the success probability is higher. Left-tailed tests ask whether it is lower. Choose the direction before reviewing the outcome.
5. What does the confidence interval add?
The interval estimates plausible values for the true success probability. When the benchmark rate lies outside a 95% interval, a two-sided test at α = 0.05 will typically reject as well, so the interval and the p-value give complementary views of the same evidence.
6. What do the CSV and PDF exports include?
The export buttons save the displayed result table for reporting. CSV supports spreadsheet work, while PDF is useful for sharing a clean summary in reviews, audit packs, or presentations.