Calculator Inputs
Enter prior belief, operating characteristics, and observed evidence counts. Large screens use three columns, medium screens use two, and mobile uses one.
Example Data Table
This sample shows how the calculator behaves with one realistic AI classification scenario.
| Example input | Value | Example output | Result |
|---|---|---|---|
| Prior probability | 15% | Posterior probability | 48.53% |
| Sensitivity | 92% | Bayes factor | 5.3434 |
| Specificity | 88% | PPV | 57.50% |
| Positive observations | 2 | NPV | 98.42% |
| Negative observations | 1 | Accuracy | 88.60% |
| Population size | 10,000 | Balanced accuracy | 90.00% |
| Decision threshold | 75% | Decision | Below threshold |
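The predictive values in the table come from a population-based confusion matrix that uses the prior as the class prevalence. A minimal sketch of that arithmetic, assuming exactly the sample inputs above:

```python
# Confusion-matrix counts for the sample scenario, assuming the prior
# (15%) is used as the prevalence across a population of 10,000.
prior, sens, spec = 0.15, 0.92, 0.88
population = 10_000

tp = sens * prior * population               # true positives  ≈ 1380
fn = (1 - sens) * prior * population         # false negatives ≈ 120
tn = spec * (1 - prior) * population         # true negatives  ≈ 7480
fp = (1 - spec) * (1 - prior) * population   # false positives ≈ 1020

ppv = tp / (tp + fp)                   # ≈ 0.5750
npv = tn / (tn + fn)                   # ≈ 0.9842
accuracy = (tp + tn) / population      # ≈ 0.886
balanced_accuracy = (sens + spec) / 2  # 0.90
```

These four numbers reproduce the right-hand column of the table.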
Formula Used
The calculator applies Bayes' theorem and extends it across repeated independent observations.
Posterior = [ P(H) × P(E | H) ] / P(E)
P(E | H) = Sensitivity^positive × (1 − Sensitivity)^negative
P(E | not H) = (1 − Specificity)^positive × Specificity^negative
P(E) = P(H) × P(E | H) + P(not H) × P(E | not H)
Bayes Factor = P(E | H) / P(E | not H)
PPV = TP / (TP + FP)
NPV = TN / (TN + FN)
Accuracy = (TP + TN) / Population
Balanced Accuracy = (Sensitivity + Specificity) / 2
F1 = 2 × Precision × Recall / (Precision + Recall), where Precision = PPV and Recall = Sensitivity
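The update formulas above can be sketched as one small function. This is an illustrative reimplementation, not the calculator's own source; the function name and signature are assumptions:

```python
def bayes_update(prior, sens, spec, positives=0, negatives=0):
    """Posterior and Bayes factor after repeated independent observations."""
    p_e_h = sens ** positives * (1 - sens) ** negatives      # P(E | H)
    p_e_not_h = (1 - spec) ** positives * spec ** negatives  # P(E | not H)
    p_e = prior * p_e_h + (1 - prior) * p_e_not_h            # P(E), total evidence
    posterior = prior * p_e_h / p_e                          # Bayes' theorem
    bayes_factor = p_e_h / p_e_not_h
    return posterior, bayes_factor

# The sample scenario: 2 positive and 1 negative observation.
posterior, bf = bayes_update(0.15, 0.92, 0.88, positives=2, negatives=1)
# posterior ≈ 0.4853, bf ≈ 5.3434, matching the sample table
```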
How to Use This Calculator
- Enter the prior probability for the hypothesis or positive class.
- Enter sensitivity and specificity from model validation or external testing.
- Add the number of positive and negative observed signals.
- Set a population size if you want estimated confusion matrix counts.
- Choose a decision threshold that fits your application risk.
- Submit the form and read the result section above the form.
- Export the metrics using the CSV or PDF buttons.
- Review the assumptions before using repeated-evidence results operationally.
Why This Helps in AI & Machine Learning
Bayesian testing is useful when you need probability updates instead of raw scores.
- It combines prior knowledge with current evidence.
- It translates classifier behavior into decision-friendly posterior probabilities.
- It highlights how prevalence changes predictive value.
- It helps compare evidence strength through Bayes factors.
- It turns sensitivity and specificity into more practical decision metrics.
FAQs
1. What does this calculator estimate?
It estimates posterior probability, Bayes factor, evidence likelihood, predictive values, and a population-based confusion matrix using prior probability, sensitivity, specificity, and observed evidence counts.
2. What is the prior probability?
The prior probability is your belief in the hypothesis before seeing the current evidence. In machine learning, it often reflects class prevalence or a baseline assumption from earlier data.
3. Why do predictive values change with prevalence?
PPV and NPV depend on class prevalence. Even a strong classifier can produce weak PPV when the positive class is rare, because false positives can outnumber true positives.
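A quick sketch of that effect, holding the sample sensitivity and specificity fixed while only prevalence changes (the prevalence values are hypothetical):

```python
# Same classifier, three prevalences: PPV collapses as positives get rare.
sens, spec = 0.92, 0.88
for prevalence in (0.50, 0.15, 0.01):
    ppv = (sens * prevalence) / (
        sens * prevalence + (1 - spec) * (1 - prevalence)
    )
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
# At 1% prevalence, PPV falls to roughly 7%: false positives from the
# large negative class swamp the true positives.
```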
4. What does the Bayes factor mean?
The Bayes factor compares how well the observed evidence fits the hypothesis versus the alternative. Values above one support the hypothesis, while values below one support the alternative.
5. Can I use repeated positive and negative observations?
Yes, but the calculator assumes those observations are conditionally independent. If your signals are correlated, the update may overstate evidence strength and should be interpreted carefully.
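The independence assumption is easiest to see in odds form: each positive observation multiplies the prior odds by the same likelihood ratio. A sketch using the sample inputs shows how fast repeated positives compound, which is exactly why correlated signals can overstate the evidence:

```python
# Each independent positive signal multiplies the odds by sens / (1 - spec).
sens, spec, prior = 0.92, 0.88, 0.15
lr_pos = sens / (1 - spec)     # likelihood ratio of one positive observation
odds = prior / (1 - prior)
posteriors = []
for _ in range(3):
    odds *= lr_pos             # independence assumed: ratios simply multiply
    posteriors.append(odds / (1 + odds))
# posteriors ≈ [0.575, 0.912, 0.988]
```

If three "signals" are really one correlated source seen three times, the true posterior is closer to the first value than the third.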
6. How is this different from plain accuracy?
Accuracy summarizes correct classifications overall. This calculator goes further by updating belief, quantifying evidence strength, and showing predictive values that matter when classes are imbalanced.
7. When should I change the decision threshold?
Raise the threshold when false positives are costly. Lower it when missing true positives is worse. The best threshold depends on the business, clinical, or operational context.
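One common way to pick a threshold from error costs is the expected-cost rule: act on a positive when the posterior exceeds C_fp / (C_fp + C_fn). This is a standard decision-theoretic rule of thumb, not a feature the calculator itself documents:

```python
# Cost-based threshold: predicting positive costs (1 - p) * cost_fp in
# expectation, predicting negative costs p * cost_fn; positive wins when
# p > cost_fp / (cost_fp + cost_fn).
def cost_threshold(cost_fp, cost_fn):
    return cost_fp / (cost_fp + cost_fn)

cost_threshold(1, 1)   # 0.5  -- symmetric costs
cost_threshold(9, 1)   # 0.9  -- false positives nine times as costly
```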
8. Is this suitable for production model decisions?
It is useful for analysis and decision support. Production use should also consider calibration quality, dependency between signals, drift, and the real costs of each error type.