Sample Size Calculation for Logistic Regression

Calculator

Outcome rate (%)

Target odds ratio

Main predictor type

Exposure rate for binary predictor (%)

Predictor standard deviation

Alpha level

Power (%)

Test sidedness

R squared with other covariates (%)

Model parameters

Events per variable rule

Dropout or unusable data (%)

Design effect

Formula Used

Log odds coefficient: β = |ln(OR)|

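Continuous predictor power method: N = (Z_{1-α/t} + Z_power)² ÷ [p(1-p) β² Sx² (1-R²)], where t is the number of test tails (1 or 2), p is the outcome rate, and Sx is the predictor standard deviation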

Binary predictor conversion: p1 = OR × p0 ÷ (1 - p0 + OR × p0)

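Marginal outcome balance: p = (1-q)p0 + q·p1, where q is the exposure rate and p0, p1 are the outcome rates in the unexposed and exposed groups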

Binary predictor power method: N = [Z_{1-α/t}√V0 + Z_power√V1]² ÷ (p1-p0)²

Events rule: N = model parameters × EPV ÷ min(event rate, non-event rate)

Final sample: ceiling(max(power N, EPV N) × design effect ÷ (1 - dropout))
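The binary predictor conversion and the marginal outcome balance can be combined to recover the two group rates from the inputs. The sketch below is one way to do that, not necessarily how the calculator does it. It assumes the outcome rate entered is the marginal rate p, that q is the exposure rate, and that 0 < q < 1. The function name group_outcome_rates is hypothetical.

```python
import math


def group_outcome_rates(outcome_rate, odds_ratio, exposure_rate):
    """Return (p0, p1) such that (1-q)*p0 + q*p1 equals the marginal rate p.

    Assumption: p1 = OR*p0 / (1 - p0 + OR*p0), as in the conversion formula.
    """
    p, q = outcome_rate, exposure_rate
    if not (0 < p < 1 and 0 < q < 1 and odds_ratio > 0):
        raise ValueError("need 0 < p < 1, 0 < q < 1 and OR > 0")
    a = odds_ratio - 1.0
    if abs(a) < 1e-12:
        return p, p  # OR = 1: both groups share the marginal rate
    # Substituting p1 into the balance equation and clearing the denominator
    # gives the quadratic A*p0^2 + B*p0 - p = 0.
    A = (1 - q) * a
    B = (1 - q) + q * odds_ratio - p * a
    # This root is the one that lies between 0 and 1 for OR above or below 1.
    p0 = (-B + math.sqrt(B * B + 4 * A * p)) / (2 * A)
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    return p0, p1


p0, p1 = group_outcome_rates(outcome_rate=0.20, odds_ratio=1.80, exposure_rate=0.50)
print(round(p0, 4), round(p1, 4))
```

With these illustrative inputs the unexposed rate comes out a little above 15% and the exposed rate a little below 25%, which average back to the 20% marginal rate.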

How to Use This Calculator

Enter the expected outcome rate as a percent. Add the odds ratio you want to detect. Select whether the main predictor is binary or continuous. For a binary predictor, enter its exposure rate. For a continuous predictor, enter its standard deviation.

Choose alpha, power, sidedness, model parameters, and EPV. Add R squared when the main predictor is related to other covariates. Use dropout for missing or unusable records. Use design effect for clustered or survey data. Press the calculate button. Review the result above the form.

Example Data Table

Scenario	Outcome Rate	Odds Ratio	Power	Parameters	EPV	Dropout
Clinical risk model	20%	1.80	80%	6	10	10%
Rare event study	8%	2.00	90%	8	15	15%
Survey predictor model	35%	1.50	80%	10	10	5%

Sample Size Planning for Logistic Regression

Why Planning Matters

Logistic regression needs enough outcomes and enough total records. A small study can give wide intervals. It can also produce unstable odds ratios. This calculator supports planning before data collection. It combines a Wald power approximation with an events per variable check. The larger answer is used as the safer starting point.

Power Method

The power method starts with the target odds ratio. It converts that value to a log odds coefficient. It then uses the selected alpha, desired power, outcome rate, and predictor information. For a continuous predictor, the method uses the predictor standard deviation. For a binary predictor, it uses the exposure proportion. The tool also adjusts for correlation with other covariates through an R squared value. A higher R squared means less unique information. So the required sample grows.
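The continuous predictor case can be sketched in a few lines of Python. This is a minimal illustration of the formula listed under Formula Used, not the calculator's own code. The function name continuous_power_n and its parameter names are assumptions, and all rates are passed as fractions rather than percents.

```python
import math
from statistics import NormalDist


def continuous_power_n(outcome_rate, odds_ratio, predictor_sd,
                       alpha=0.05, power=0.80, tails=2, r_squared=0.0):
    """Unrounded Wald-type sample size for one continuous predictor."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / tails)  # critical value for the test
    z_power = z.inv_cdf(power)              # quantile for the desired power
    beta = abs(math.log(odds_ratio))        # log odds coefficient
    p = outcome_rate
    # Information about beta carried by each record, reduced by (1 - R^2)
    information = p * (1 - p) * beta**2 * predictor_sd**2 * (1 - r_squared)
    return (z_alpha + z_power) ** 2 / information


n = continuous_power_n(outcome_rate=0.20, odds_ratio=1.80, predictor_sd=1.0)
print(round(n, 1))  # roughly 142 for these illustrative inputs
```

The result is left unrounded here because rounding belongs to the final step, after the design effect and dropout adjustments.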

Events Per Variable

The events per variable method protects the model against overfitting. Logistic models can look strong when events are few, even though their coefficients are unstable. More candidate terms need more observed events. The calculator multiplies the number of model parameters by the selected EPV rule. It then divides by the smaller of the event rate and the non-event rate. This makes the check more conservative when events or non-events are rare. Raising EPV from 10 to 20 gives a more conservative plan.
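As a rough sketch, the events rule is a one-line calculation. The function name epv_sample_size is hypothetical, and the outcome rate is passed as a fraction.

```python
def epv_sample_size(model_parameters, epv, outcome_rate):
    """Unrounded record count so the rarer outcome group meets the EPV rule."""
    smaller_group_rate = min(outcome_rate, 1 - outcome_rate)
    return model_parameters * epv / smaller_group_rate


print(epv_sample_size(6, 10, 0.20))  # about 300 records
print(epv_sample_size(6, 20, 0.20))  # about 600: doubling EPV doubles N
print(epv_sample_size(6, 10, 0.90))  # about 600: non-events (10%) are the rarer group
```

The third call shows why the rule uses the smaller group: with a 90% outcome rate, the non-events are the scarce resource.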

Adjustments

Dropout and design effect are applied after the core calculation. Dropout covers missing records, consent loss, and unusable responses. Design effect covers clustering, repeated sampling, or complex survey design. A design effect of one means simple independent records. A larger value multiplies the sample before the dropout adjustment is applied. The final number is rounded upward. Research teams should treat it as a planning estimate.
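The adjustment step can be sketched as follows, using the final sample formula listed under Formula Used. The function name final_sample_size is an assumption, and the power and EPV figures in the example are illustrative inputs rather than calculator output.

```python
import math


def final_sample_size(power_n, epv_n, design_effect=1.0, dropout=0.0):
    """Take the larger core requirement, inflate it, and round upward."""
    if not 0 <= dropout < 1:
        raise ValueError("dropout must be a fraction in [0, 1)")
    core_n = max(power_n, epv_n)  # the larger requirement drives the plan
    return math.ceil(core_n * design_effect / (1 - dropout))


# 300 is the larger requirement; 300 * 1.3 / 0.9 = 433.3, rounded up
print(final_sample_size(power_n=142, epv_n=300, design_effect=1.3, dropout=0.10))  # 434
```

Dividing by (1 - dropout) rather than multiplying by (1 + dropout) keeps the expected number of usable records at or above the inflated requirement.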

Practical Review

Use the result with judgment. Logistic regression assumptions still matter. Predictors should be clearly defined. Categories should not be too sparse. Continuous predictors may need scaling. Interaction terms count as extra parameters. Missing data plans should be written early. Very large odds ratios may look promising, but they can be unrealistic. Sensitivity checks are helpful. Try different outcome rates, odds ratios, EPV rules, and dropout values. Report the chosen inputs with the final sample size.
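A sensitivity check can be as simple as a small grid. The sketch below varies the outcome rate and odds ratio for a continuous predictor, assuming a standard deviation of 1, two-sided alpha of 0.05, 80% power, and no covariate correlation. The helper power_n is hypothetical and covers only the power requirement, not EPV or the adjustments.

```python
import math
from statistics import NormalDist

# Fixed planning assumptions: two-sided alpha 0.05 and 80% power
Z_SUM = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)


def power_n(outcome_rate, odds_ratio, predictor_sd=1.0):
    """Unrounded power-based N for a continuous predictor with R squared of 0."""
    beta = abs(math.log(odds_ratio))
    return Z_SUM**2 / (outcome_rate * (1 - outcome_rate) * beta**2 * predictor_sd**2)


grid = {
    (p, target_or): math.ceil(power_n(p, target_or))
    for p in (0.10, 0.20, 0.30)
    for target_or in (1.5, 1.8, 2.0)
}

for (p, target_or), n in grid.items():
    print(f"outcome rate {p:.0%}, OR {target_or}: N = {n}")
```

In this grid the required sample falls as the outcome rate moves toward 50% and as the target odds ratio moves away from 1, which is why optimistic inputs deserve a second look.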

Study Note

This page does not replace a statistician. It gives a transparent estimate. Use it for protocols, grant notes, and reviews. Final designs should consider bias, sampling frame, measurement quality, and study aims carefully.

FAQs

What does this calculator estimate?

It estimates the total sample size for a logistic regression model. It checks both statistical power and events per variable. The final answer takes the larger of the two requirements, then applies the design effect and dropout adjustments.

What is an odds ratio?

An odds ratio shows how the odds of an outcome change with a predictor. A value above 1 suggests higher odds. A value below 1 suggests lower odds. The calculator uses the absolute value of its natural log as the effect size.
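A short sketch may help here. It computes an odds ratio from two outcome rates and shows that protective and harmful effects of the same strength give the same log odds coefficient once the absolute value is taken. The rates 25% and 40% are made-up illustration values, and the helper names are hypothetical.

```python
import math


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_ratio(p_exposed, p_unexposed):
    """Odds of the outcome in one group relative to the other."""
    return odds(p_exposed) / odds(p_unexposed)


or_up = odds_ratio(0.40, 0.25)    # close to 2.0: higher odds when exposed
or_down = odds_ratio(0.25, 0.40)  # close to 0.5: the same effect, reversed

# Both directions give the same |ln(OR)|, about 0.693
print(round(abs(math.log(or_up)), 3), round(abs(math.log(or_down)), 3))
```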

What is EPV?

EPV means events per variable. It is a rule for model stability. More predictors need more outcome events. Many studies use at least 10 EPV, but higher values can be safer.

Should I use one-sided or two-sided alpha?

Use two-sided alpha for most confirmatory studies. It tests effects in either direction. Use one-sided alpha only when your protocol justifies one direction before seeing data.
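To see what the choice costs, the sketch below compares the critical values at alpha 0.05 and 80% power. It assumes the power requirement scales with the squared sum of the two normal quantiles, as in the formulas listed on this page.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution
alpha, power = 0.05, 0.80

z_two_sided = z.inv_cdf(1 - alpha / 2)  # about 1.960
z_one_sided = z.inv_cdf(1 - alpha)      # about 1.645
z_power = z.inv_cdf(power)              # about 0.842

# Ratio of power-based sample sizes, one-sided relative to two-sided
ratio = (z_one_sided + z_power) ** 2 / (z_two_sided + z_power) ** 2
print(round(ratio, 2))  # roughly 0.79
```

Under these settings a one-sided test needs roughly a fifth fewer records, which is the saving a protocol has to justify.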

What does R squared mean here?

It is the proportion of variance in the main predictor that is explained by the other covariates. A higher value means less independent information. The calculator increases sample size when R squared rises.
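In the continuous predictor formula, R squared enters through the factor (1 - R²) in the denominator, so the power requirement is multiplied by 1 ÷ (1 - R²). A minimal sketch of that multiplier, with a hypothetical function name:

```python
def r_squared_inflation(r_squared):
    """Multiplier applied to the power-based N for a given R squared (as a fraction)."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return 1 / (1 - r_squared)


for r2 in (0.0, 0.2, 0.5):
    print(f"R squared {r2:.0%}: multiply power N by {r_squared_inflation(r2):.2f}")
```

An R squared of 20% raises the requirement by a quarter, and 50% doubles it.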

Why include dropout?

Dropout covers records that may be lost, incomplete, or unusable. The calculator inflates the required sample so the usable final dataset remains large enough.

What is design effect?

Design effect adjusts for complex sampling, clustering, or correlated observations. Use 1 for simple independent records. Use a larger value when the study design reduces effective information.

Is this a final study design?

No. It is a planning tool. Final sample size should also consider sampling methods, missing data strategy, predictor coding, ethical limits, cost, and expert statistical review.