Logistic Regression Sample Size Calculator

Calculator

Baseline event probability
Expected odds ratio
Alpha
Target power
Exposure share
Number of predictors
Events per variable target
Covariate R squared
Design effect
Attrition proportion
Test tail
Planning method

Example Data Table

Scenario	Baseline Event Rate	Odds Ratio	Power	Predictors	EPV	Attrition
Clinical risk model	0.20	1.80	0.80	8	10	0.10
Rare event audit	0.08	2.00	0.90	6	15	0.05
Survey model	0.35	1.50	0.80	12	10	0.15

Formulas Used

Event probability from odds ratio:

p1 = OR × p0 / (1 − p0 + OR × p0)
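To check this conversion, the sketch below (Python, standard library only; the function names are illustrative, not part of the calculator) computes p1 for the Clinical risk model row and confirms that converting both probabilities back to odds recovers the odds ratio.

```python
def exposed_event_probability(p0: float, odds_ratio: float) -> float:
    """p1 = OR * p0 / (1 - p0 + OR * p0): event probability in the exposed group."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds ratio must be positive")
    return odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)


def odds(p: float) -> float:
    """Convert a probability into odds."""
    return p / (1.0 - p)


# Clinical risk model row: p0 = 0.20, OR = 1.80
p1 = exposed_event_probability(0.20, 1.80)
print(round(p1, 4))                     # 0.3103
print(round(odds(p1) / odds(0.20), 6))  # 1.8
```

With an odds ratio of 1 the function returns p0, as expected when the predictor has no effect.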

Overall event rate:

p = q × p1 + (1 − q) × p0
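The overall event rate weights the two group risks by the exposure share q. The example table does not list an exposure share, so this sketch assumes a balanced q = 0.5 and shows q = 0.3 for comparison; the function name is hypothetical.

```python
def overall_event_rate(p0: float, odds_ratio: float, q: float) -> float:
    """p = q * p1 + (1 - q) * p0, with p1 derived from p0 and the odds ratio."""
    if not 0.0 < q < 1.0:
        raise ValueError("exposure share q must lie strictly between 0 and 1")
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)  # exposed event probability
    return q * p1 + (1.0 - q) * p0


# Clinical risk model row (p0 = 0.20, OR = 1.80) under two assumed exposure shares
for q in (0.5, 0.3):
    print(q, round(overall_event_rate(0.20, 1.80, q), 4))  # 0.5 0.2552, then 0.3 0.2331
```

A smaller exposed share pulls the overall rate toward p0, which lowers the expected number of events.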

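Power-based sample size approximation:

N = [Zα × √V0 + Zpower × √V1]² / (p1 − p0)²

Here Zα is the standard normal critical value for alpha (α/2 per tail for a two-tailed test), Zpower is the standard normal quantile for the target power, and V0 and V1 are the outcome variance terms under the null and alternative hypotheses.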
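The page does not give explicit expressions for V0 and V1. One common choice that fits this form, assumed here, is the two-group variance terms from Hsieh, Bloch and Larsen (1998) for a binary predictor: V0 = p(1 − p) / (q(1 − q)) and V1 = p0(1 − p0) / (1 − q) + p1(1 − p1) / q. The calculator may use a different variant, so treat this as a sketch; alpha 0.05 two-tailed and q = 0.5 are also assumptions.

```python
from math import sqrt
from statistics import NormalDist


def power_sample_size(p0, odds_ratio, q, alpha=0.05, power=0.80, two_tailed=True):
    """Total N to detect the odds ratio of a binary predictor.

    V0 and V1 follow the two-group form of Hsieh, Bloch and Larsen (1998);
    this is an assumption, not necessarily the calculator's own choice.
    """
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)  # exposed event probability
    p = q * p1 + (1 - q) * p0                          # overall event rate
    z = NormalDist()                                   # standard normal distribution
    z_alpha = z.inv_cdf(1 - (alpha / 2 if two_tailed else alpha))
    z_power = z.inv_cdf(power)
    v0 = p * (1 - p) / (q * (1 - q))                    # variance term under H0
    v1 = p0 * (1 - p0) / (1 - q) + p1 * (1 - p1) / q    # variance term under H1
    return (z_alpha * sqrt(v0) + z_power * sqrt(v1)) ** 2 / (p1 - p0) ** 2


# Clinical risk model row, assuming alpha = 0.05 two-tailed and q = 0.5
print(round(power_sample_size(0.20, 1.80, q=0.5), 1))  # about 487.7
```

With q = 0.5 this reduces to the familiar two-proportion sample size with equal groups, about 244 records per group here. Moving q away from 0.5 increases N.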

Covariate and design adjustment:

Nadjusted = N / (1 − R²) × design effect
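Both factors multiply the unadjusted N, so they compound. This small sketch uses a hypothetical unadjusted N of 500 and an assumed design effect of 1.5; neither value comes from the calculator.

```python
def adjusted_sample_size(n: float, r_squared: float = 0.0, design_effect: float = 1.0) -> float:
    """Nadjusted = N / (1 - R^2) * design effect."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R squared must be in [0, 1)")
    if design_effect <= 0.0:
        raise ValueError("design effect must be positive")
    return n / (1.0 - r_squared) * design_effect


# Hypothetical unadjusted N = 500 with design effect 1.5 under several R squared values
for r2 in (0.0, 0.2, 0.5):
    print(r2, adjusted_sample_size(500, r_squared=r2, design_effect=1.5))
```

An R squared of 0.5 doubles the requirement before the design effect is applied; with R squared 0 and design effect 1 the input passes through unchanged.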

Events per variable check:

Nepv = predictors × EPV / overall event rate
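A direct translation of the EPV check, sketched with a hypothetical function name. The overall event rate of 0.2552 for the Clinical risk model row assumes an exposure share of 0.5, which the example table does not state.

```python
def epv_sample_size(predictors: int, epv_target: float, overall_rate: float) -> float:
    """Nepv = predictors * EPV / overall event rate."""
    if predictors < 1 or epv_target <= 0:
        raise ValueError("need at least one predictor and a positive EPV target")
    if not 0.0 < overall_rate < 1.0:
        raise ValueError("overall event rate must lie strictly between 0 and 1")
    return predictors * epv_target / overall_rate


# Clinical risk model row: 8 predictors, EPV 10, assumed overall rate 0.2552
n_epv = epv_sample_size(8, 10, 0.2552)
print(round(n_epv, 1))  # 313.5
```

The requirement grows quickly as the event rate falls: the same 8 predictors at EPV 10 and a 5% event rate would need 1,600 records.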

Attrition inflation:

Final N = complete case N / (1 − attrition), where complete case N is the larger of Nadjusted and Nepv for safer planning.
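The FAQ recommends planning with the larger of the power-based and EPV-based results. This minimal sketch of that final step rounds up to whole records; the inputs are hypothetical numbers, not calculator output.

```python
import math


def enrolled_sample_size(n_adjusted: float, n_epv: float, attrition: float) -> tuple[int, int]:
    """Return (complete case N, enrolled N) using the larger of the two planning views."""
    if not 0.0 <= attrition < 1.0:
        raise ValueError("attrition must be in [0, 1)")
    complete_case = math.ceil(max(n_adjusted, n_epv))        # records needed for analysis
    enrolled = math.ceil(complete_case / (1.0 - attrition))  # inflate for expected losses
    return complete_case, enrolled


print(enrolled_sample_size(487.7, 313.5, 0.10))  # (488, 543)
```

Dividing by 1 − attrition, rather than multiplying by 1 + attrition, is what makes the expected complete cases reach the target: 543 × 0.9 ≈ 488.7, whereas 488 × 1.1 rounds up to 537 and 537 × 0.9 ≈ 483.3 falls short.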

How to Use This Calculator

Enter the baseline event probability for the unexposed group.
Enter the expected odds ratio for the main predictor.
Select alpha, power, exposure share, and test tail.
Add the planned number of predictors and EPV target.
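Enter a covariate R squared when other predictors partly explain the main predictor.
Set the design effect for clustered or complex samples (1 for a simple independent sample).
Enter the expected attrition proportion so the enrolled sample is inflated for losses.
Click Calculate, then export the results as CSV or PDF if needed.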
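The steps above can be chained into one function. This sketch runs the three example scenarios through the formulas on this page. The table omits alpha, exposure share, covariate R squared, and design effect, so the sketch assumes 0.05 two-tailed, 0.5, 0, and 1. V0 and V1 are assumed to take the two-group form of Hsieh, Bloch and Larsen (1998), and the planning method input is not modeled, so results may differ from the calculator's.

```python
import math
from statistics import NormalDist


def plan_sample_size(p0, odds_ratio, power, predictors, epv, attrition,
                     alpha=0.05, q=0.5, r_squared=0.0, design_effect=1.0,
                     two_tailed=True):
    """Chain the page's formulas; defaults are assumptions, not calculator settings."""
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)      # exposed event probability
    p = q * p1 + (1 - q) * p0                              # overall event rate
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - (alpha / 2 if two_tailed else alpha))
    z_power = z.inv_cdf(power)
    v0 = p * (1 - p) / (q * (1 - q))                        # assumed null variance term
    v1 = p0 * (1 - p0) / (1 - q) + p1 * (1 - p1) / q        # assumed alternative term
    n_power = (z_alpha * math.sqrt(v0) + z_power * math.sqrt(v1)) ** 2 / (p1 - p0) ** 2
    n_adjusted = n_power / (1 - r_squared) * design_effect  # covariate and design
    n_epv = predictors * epv / p                           # events per variable check
    complete_case = math.ceil(max(n_adjusted, n_epv))      # larger of the two views
    enrolled = math.ceil(complete_case / (1 - attrition))  # attrition inflation
    return {
        "n_adjusted": n_adjusted,
        "n_epv": n_epv,
        "complete_case": complete_case,
        "enrolled": enrolled,
        "expected_events": complete_case * p,
    }


# Example table rows: scenario, p0, odds ratio, power, predictors, EPV, attrition
EXAMPLES = [
    ("Clinical risk model", 0.20, 1.80, 0.80, 8, 10, 0.10),
    ("Rare event audit", 0.08, 2.00, 0.90, 6, 15, 0.05),
    ("Survey model", 0.35, 1.50, 0.80, 12, 10, 0.15),
]

for name, p0, odds_ratio, power, k, epv, attrition in EXAMPLES:
    r = plan_sample_size(p0, odds_ratio, power, k, epv, attrition)
    print(f"{name}: power view {r['n_adjusted']:.0f}, EPV view {r['n_epv']:.0f}, "
          f"complete case {r['complete_case']}, enrolled {r['enrolled']}")
```

Under these assumptions the Clinical risk model row needs 488 complete cases and 543 enrolled records, driven by the power view rather than the EPV view.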

Logistic Regression Sample Size Planning

Why sample size matters

Logistic regression is used when the outcome is binary. The model may predict disease, success, default, failure, or any event with two states. An undersized sample can produce unstable odds ratio estimates, wide confidence intervals, and poor calibration. Planning the sample size before data collection starts reduces that risk.

What this calculator checks

This calculator combines two common planning views. The first view estimates the records needed to detect an odds ratio for a binary predictor. It uses baseline event risk, expected odds ratio, alpha, power, and exposure balance. The second view checks the events per variable rule. That rule asks whether the expected number of events can support the planned predictors.

Advanced assumptions

Logistic models often include correlated predictors. When the other predictors explain part of the main predictor (covariate R squared), the power-based sample size is inflated by dividing it by 1 − R². Clustered data or weighted surveys may need a design effect. Attrition also matters, because not every enrolled record becomes a complete case. The final sample is inflated for missing data so that the usable sample stays near the target.

Interpreting results

The complete case sample is the number of analyzable records you need after losses. The enrolled sample is the larger number you should recruit or extract to end up with that many complete cases. Expected events show whether the model has enough outcome information. Expected non-events matter too: the model needs enough records in both outcome classes to separate risk groups, and when events are the majority, non-events become the scarcer class.
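A quick way to read these numbers is to split a planned sample into expected events and non-events. This sketch applies the EPV idea to whichever class is smaller, a common convention that is assumed here; the calculator's own EPV formula divides by the event rate only. The function name and the sample of 500 are hypothetical.

```python
def outcome_counts(n: int, overall_rate: float, predictors: int) -> dict:
    """Expected events, non-events, and EPV based on the smaller outcome class."""
    events = n * overall_rate
    non_events = n * (1.0 - overall_rate)
    limiting = min(events, non_events)  # the scarcer class limits model information
    return {"events": events, "non_events": non_events, "epv_achieved": limiting / predictors}


# Hypothetical plans: same N and predictors, minority versus majority event rate
for rate in (0.25, 0.80):
    counts = outcome_counts(500, rate, predictors=8)
    print(rate, {key: round(value, 1) for key, value in counts.items()})
```

At an 80% event rate the 100 expected non-events, not the 400 events, set the achieved EPV of 12.5.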

Good study practice

Use realistic values from pilot work, published studies, or registry summaries. Run sensitivity checks with smaller odds ratios and lower event rates. Document every assumption in your protocol. If the study is clinical, regulatory, or high cost, ask a statistician to review the design. This page gives a planning estimate, not a final guarantee.

Using results with R

Researchers may later confirm the estimate with simulation in R. Simulations can include nonlinearity, interaction terms, unbalanced sampling, and planned exclusions. They can also estimate how often the model fails to converge. Use the calculator output as the first scenario, then vary each assumption and save the table of runs. A transparent range is more useful than one optimistic number. Conservative planning protects power, improves model reliability, and helps reviewers understand why the target sample was chosen before data collection started.
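The page suggests R for simulation. To keep these sketches in one language, here is a Python analogue using only the standard library, covering only the simplest case: one binary predictor, no covariates, no clustering. With a single binary predictor, the logistic regression slope estimate equals the log of the sample odds ratio, so each simulated fit reduces to a 2 × 2 table and a Wald test. Datasets with an empty cell have no finite estimate (a form of separation that would typically produce convergence warnings in a full fitting routine) and are counted as non-rejections. The function name, seed, and number of simulations are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def simulated_power(n, p0, odds_ratio, q, alpha=0.05, n_sims=1000, seed=2024):
    """Monte Carlo power of a two-sided Wald test of the log odds ratio.

    Returns (estimated power, share of datasets with an empty 2x2 cell).
    """
    rng = random.Random(seed)
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = empty_cell = 0
    for _ in range(n_sims):
        # cells[exposed][event] counts simulated records (bools index as 0/1)
        cells = [[0, 0], [0, 0]]
        for _ in range(n):
            exposed = rng.random() < q
            event = rng.random() < (p1 if exposed else p0)
            cells[exposed][event] += 1
        (d, c), (b, a) = cells  # unexposed: d non-events, c events; exposed: b, a
        if min(a, b, c, d) == 0:
            empty_cell += 1     # no finite estimate; counted as a non-rejection
            continue
        log_or = math.log((a * d) / (b * c))
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        if abs(log_or / se) > z_crit:
            rejections += 1
    return rejections / n_sims, empty_cell / n_sims


# Check a planned complete-case N of 488 for the Clinical risk model row (q = 0.5 assumed)
print(simulated_power(488, 0.20, 1.80, q=0.5))
```

Because the Wald test on the log odds ratio is not the same approximation as the planning formula, simulated power can land slightly above or below the target; that gap is exactly what a sensitivity run is meant to expose.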

FAQs

What is logistic regression sample size?

It is the number of records needed to estimate or test a logistic regression model with acceptable power, event counts, and model stability.

What does baseline event probability mean?

It is the expected event rate in the reference group. Use pilot data, past studies, or a reliable registry estimate.

Why does the calculator use odds ratio?

Logistic regression estimates odds ratios. The calculator converts the expected odds ratio into an event probability difference for planning.

What is events per variable?

Events per variable compares outcome events with model predictors. Higher values usually improve coefficient stability and reduce overfitting risk.

Should I use the power result or EPV result?

For safer planning, use the larger value. Power checks detection. EPV checks model information and stability.

What is covariate R squared?

It is the share of variation in the main predictor that the other predictors explain. Higher values require a larger sample, because the power-based estimate is divided by 1 − R².

What is design effect?

Design effect inflates the sample for clustering, weighting, or complex sampling. Use 1 for a simple independent sample.

Can this replace a statistician?

No. It gives a planning estimate. Complex studies should use simulation and expert statistical review before final approval.