GLM Sample Size Calculator

Calculator Inputs

GLM family

Choose the outcome family for the coefficient test.

Predictor type

Binary predictors use prevalence for variance.

Test tails

At the same alpha, two-sided tests need larger samples than one-sided tests.

Target coefficient beta

Enter a slope (Gaussian), a log odds ratio (logistic), or a log rate ratio (Poisson).

Predictor SD

Only used for continuous predictors.

Binary predictor prevalence

Example: 0.40 means 40% in coded group.

Alpha

Common value is 0.05.

Target power

Common value is 0.80 or 0.90.

Outcome SD

Needed for Gaussian outcomes.

Baseline probability

Needed for logistic outcomes. Use the expected event proportion.

Baseline rate

Needed for Poisson outcomes. Use the expected mean count or rate.

Dispersion factor

Use values above 1 for overdispersion.

Variance inflation factor

Accounts for correlation among predictors. A value of 1 means no inflation.

Total predictors

Include the tested predictor in this count.

Stability factor

Typical planning values range from 10 to 20.

Expected dropout %

The final target is inflated to allow for attrition.

Example Data Table

Scenario	Family	Effect Assumption	Baseline Metric	Adjusted Sample
Screening outcome model	Logistic	β = 0.35	p0 = 0.30	408
Counts per exposure model	Poisson	β = 0.22	μ0 = 0.80	243
Continuous outcome model	Gaussian	β = 0.25	SDy = 1.20	230

Formula Used

Core Wald approximation:

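n = ((z_{1-α/t} + z_{power})² × φ × VIF) / (β² × Var(X) × I₀)

Here t is the number of test tails (1 or 2), φ is the dispersion factor, and I₀ is the family-specific information term defined below.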

Gaussian: I₀ = 1 / σ², so n = ((z_{1-α/t} + z_{power})² × φ × VIF × σ²) / (β² × Var(X))

Logistic: I₀ = p₀(1-p₀)

Poisson: I₀ = μ₀

Binary predictor variance: Var(X) = q(1-q)

Continuous predictor variance: Var(X) = SD(X)²

Recommended base sample: max(Wald minimum, stability minimum)

Dropout adjustment: Final n = Base n / (1 - dropout proportion)
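
The formulas above can be sketched in Python as follows. This is a minimal sketch, not the calculator's actual implementation: the function name glm_sample_size, its parameter names, the stability minimum computed as stability factor times total predictors, and rounding the final figure up are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def glm_sample_size(beta, var_x, info0, alpha=0.05, power=0.80, tails=2,
                    phi=1.0, vif=1.0, predictors=1, stability=10,
                    dropout=0.0):
    """Planning sample size for one GLM coefficient (Wald approximation).

    info0 is I0: 1 / sigma**2 (Gaussian), p0 * (1 - p0) (logistic),
    or mu0 (Poisson). var_x is q * (1 - q) or SD(X)**2.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / tails)
    z_power = z.inv_cdf(power)

    wald_n = (z_alpha + z_power) ** 2 * phi * vif / (beta ** 2 * var_x * info0)
    # Assumption: stability minimum = stability factor x predictor count.
    stability_n = stability * predictors
    base_n = max(wald_n, stability_n)
    # dropout is a proportion (0.15 for 15%); rounding up is an assumption.
    final_n = math.ceil(base_n / (1 - dropout))
    return {"wald_n": wald_n, "stability_n": stability_n,
            "base_n": base_n, "final_n": final_n}


# Logistic example: beta = 0.35, p0 = 0.30, continuous predictor with SD 1.
p0 = 0.30
result = glm_sample_size(beta=0.35, var_x=1.0, info0=p0 * (1 - p0))
print(result)
```

With a two-sided alpha of 0.05 and power of 0.80, the Wald minimum for these inputs is about 305 before any dropout adjustment. Raising power, VIF, dispersion, or dropout increases the figure from there.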

How to Use This Calculator

1. Choose the GLM family that matches your outcome.
2. Enter the coefficient you want to detect.
3. Set predictor variance with SD or prevalence.
4. Add alpha, target power, and one-sided or two-sided testing.
5. Enter the outcome metric: SD, event probability, or event rate.
6. Increase dispersion or VIF when data are noisier or predictors correlate.
7. Set predictors and stability factor for practical model reliability.
8. Add dropout, then calculate and review the table, summary cards, and graph.

FAQs

1. What does this calculator estimate?

It estimates a planning sample size for testing one GLM coefficient. It supports Gaussian, logistic, and Poisson models. It also applies a model stability safeguard and a dropout adjustment.

2. What does beta mean here?

Beta is the coefficient you want to detect. For Gaussian models, it is a slope. For logistic models, exp(beta) is the odds ratio. For Poisson models, exp(beta) is the rate ratio.
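
Because logistic and Poisson coefficients are on the log scale, it can be easier to start from a ratio and convert it. A small sketch, with hypothetical helper names chosen for illustration:

```python
import math


def beta_from_ratio(ratio):
    """Log odds ratio or log rate ratio for a given odds or rate ratio."""
    return math.log(ratio)


def ratio_from_beta(beta):
    """Odds ratio (logistic) or rate ratio (Poisson) implied by beta."""
    return math.exp(beta)


print(round(ratio_from_beta(0.35), 3))  # 1.419
print(round(beta_from_ratio(1.5), 3))   # 0.405
```

So a target beta of 0.35 in a logistic model corresponds to an odds ratio of about 1.42.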

3. Why is predictor variance important?

Higher predictor variance increases the information about the coefficient, which lowers the sample needed for the same effect. Continuous predictors use SD squared. Binary predictors use prevalence times one minus prevalence.
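
As an illustration of the two variance rules, the sketch below compares a balanced binary predictor with a rare one. The helper names are hypothetical.

```python
def binary_variance(q):
    """Var(X) for a 0/1 predictor with prevalence q."""
    return q * (1 - q)


def continuous_variance(sd):
    """Var(X) for a continuous predictor with standard deviation sd."""
    return sd ** 2


balanced = binary_variance(0.50)
rare = binary_variance(0.10)
# Required n is proportional to 1 / Var(X), so this ratio is the sample
# multiplier for the rare predictor relative to the balanced one.
variance_ratio = balanced / rare
print(round(variance_ratio, 2))  # 2.78
```

Holding everything else fixed, a predictor with 10% prevalence needs roughly 2.8 times the sample of one with 50% prevalence.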

4. Why does the logistic sample explode near extreme probabilities?

Information is strongest near a 50% event rate. When the baseline probability moves near zero or one, p(1-p) shrinks. Smaller information means a larger sample is needed.
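
To see how quickly the requirement grows, the sketch below prints the sample multiplier relative to a 50% baseline, using I₀ = p₀(1-p₀) from the formula section. The function names are hypothetical.

```python
def logistic_information(p0):
    """Logistic information term I0 = p0 * (1 - p0)."""
    return p0 * (1 - p0)


def sample_multiplier(p0):
    """Required n at baseline p0 relative to the requirement at p0 = 0.50."""
    return logistic_information(0.50) / logistic_information(p0)


for p0 in (0.50, 0.30, 0.10, 0.05, 0.01):
    print(f"p0 = {p0:.2f}: multiplier = {sample_multiplier(p0):.2f}")
```

At a baseline probability of 0.05 the multiplier is a little over 5, and at 0.01 it is about 25.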

5. What does the VIF input do?

VIF inflates the required sample when predictors are correlated. A VIF of 1 means no inflation. Larger values mean your focal coefficient is estimated less efficiently.
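
One common way to pick a planning VIF is from the R² you expect when regressing the tested predictor on the other predictors, using VIF = 1 / (1 - R²). That definition is standard, but the calculator itself only asks for the VIF value, so the helper below is an assumption about how you might arrive at the input.

```python
def vif_from_r_squared(r_squared):
    """VIF implied by the R-squared of the tested predictor on the others."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return 1 / (1 - r_squared)


for r2 in (0.0, 0.2, 0.5):
    print(f"R^2 = {r2:.1f}: VIF = {vif_from_r_squared(r2):.2f}")
```

An R² of 0.5 gives a VIF of 2, which doubles the Wald minimum in the core formula.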

6. Why include a stability factor?

Pure power formulas can be optimistic for multivariable models. The stability factor adds a practical floor tied to predictor count. That helps prevent fragile estimates and underpowered fitted models.

7. Should I use one-sided or two-sided testing?

Two-sided tests are the usual default because they guard against effects in both directions. One-sided tests need stronger prior justification. Two-sided settings generally require larger samples.
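
The size of the difference can be checked from the z terms in the core formula. The sketch below compares one-sided and two-sided tests at alpha 0.05 and power 0.80. The function name is hypothetical.

```python
from statistics import NormalDist


def z_multiplier(alpha, power, tails):
    """Squared sum of z quantiles from the Wald formula numerator."""
    z = NormalDist()  # standard normal distribution
    return (z.inv_cdf(1 - alpha / tails) + z.inv_cdf(power)) ** 2


two_sided = z_multiplier(0.05, 0.80, tails=2)
one_sided = z_multiplier(0.05, 0.80, tails=1)
tails_ratio = two_sided / one_sided
print(round(tails_ratio, 2))  # 1.27
```

At these settings the two-sided test needs roughly 27% more observations than the one-sided test, with all other inputs unchanged.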

8. Is this exact for every study design?

No. It is an approximation for early planning. Complex designs, clustering, repeated measures, time-to-event outcomes, and rare events often need specialized methods or simulation.