Poisson Regression Calculator
Example data table
| y | x1 | x2 | offset | exposure |
|---|---|---|---|---|
| 3 | 0.6 | 1.2 | 0 | 1 |
| 2 | 0.2 | 0.7 | 0 | 1 |
| 7 | 1.3 | 1.1 | 0 | 2 |
| 1 | 0.1 | 0.4 | 0 | 1 |
| 5 | 0.9 | 0.9 | 0 | 1 |
| 9 | 1.7 | 1.5 | 0 | 3 |
Formula used
The model assumes a Poisson count response with mean μ:
log(μᵢ) = β₀ + β₁xᵢ₁ + … + βₚxᵢₚ + offsetᵢ
Coefficients are estimated by iteratively reweighted least squares (IRLS), which is Newton’s method (Fisher scoring) for this GLM:
- Wᵢ = μᵢ
- zᵢ = ηᵢ + (yᵢ − μᵢ)/μᵢ
- β ← (XᵀWX)⁻¹XᵀWz
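The update steps above can be sketched in NumPy (an illustrative implementation, not the calculator’s actual code). One detail the compact notation hides: when an offset is present, the weighted regression should target z − offset, since the offset is not multiplied by β.

```python
import numpy as np

def poisson_irls(X, y, offset=None, tol=1e-10, max_iter=100):
    """Fit a Poisson GLM (log link) by IRLS. Illustrative sketch only."""
    n = len(y)
    offset = np.zeros(n) if offset is None else offset
    mu = y + 0.5                             # safe starting values (avoids log(0))
    eta = np.log(mu)
    beta = None
    for _ in range(max_iter):
        W = mu                               # working weights: W_i = mu_i
        z = (eta - offset) + (y - mu) / mu   # working response, offset removed
        XtW = X.T * W                        # scale columns of X' by the weights
        beta = np.linalg.solve(XtW @ X, XtW @ z)
        eta_new = X @ beta + offset
        if np.max(np.abs(eta_new - eta)) < tol:
            return beta
        eta = eta_new
        mu = np.exp(eta)
    return beta
```

With the six example rows above, the score equations Xᵀ(y − μ̂) evaluate to approximately zero at the returned β̂, which is the usual optimality check for a converged GLM fit.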
How to use this calculator
- Paste your dataset as CSV with numeric columns.
- Select the response count column and predictors.
- Optionally add offset and exposure columns.
- Run the model and read the coefficients β and incidence rate ratios (IRR).
- Download a CSV or PDF report for sharing.
Counts, rates, and link scale
Poisson regression targets non‑negative counts and connects predictors to the mean with a log link. When you include exposure, the model estimates rates per unit exposure rather than raw totals. For example, a log(exposure) offset converts “events per week” and “events per month” into comparable scales across rows.
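A toy illustration of the exposure logic (hypothetical numbers, not the calculator’s internals): one row observed for a week and one for a month, both generating two events per day. For an intercept-only Poisson model with a log(exposure) offset, the MLE has a closed form, β₀ = log(Σy / Σexposure), so the baseline rate is directly comparable across the differently sized windows.

```python
import numpy as np

counts   = np.array([14.0, 60.0])  # events: one week at 2/day, one month at 2/day
exposure = np.array([7.0, 30.0])   # observation windows in days
offset = np.log(exposure)          # what gets added to the linear predictor

# Intercept-only Poisson MLE with offset solves exp(b0) * sum(exposure) = sum(counts)
b0 = np.log(counts.sum() / exposure.sum())
baseline_rate = np.exp(b0)         # events per day, comparable across rows
```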
Coefficient meaning and IRR
Each coefficient β is on the log scale. Exponentiating gives the incidence rate ratio (IRR). If a predictor has IRR 1.20, a one‑unit increase multiplies the expected count by 1.20, holding other predictors fixed. An IRR below 1.00 indicates a proportional decrease, such as 0.85 meaning 15% lower expected counts.
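The IRR arithmetic in this paragraph is a one-liner. The numbers below are hypothetical (β chosen as ln 1.2, with an assumed standard error) and include a Wald 95% interval on the ratio scale:

```python
import math

beta = math.log(1.2)  # hypothetical coefficient on the log scale
se = 0.05             # hypothetical standard error

irr = math.exp(beta)                                          # incidence rate ratio
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)) # 95% Wald interval
pct_change = (irr - 1.0) * 100.0  # percent change in expected count per one-unit step
```

Exponentiating the interval endpoints, rather than building a symmetric interval around the IRR, keeps the bounds positive and matches the log-scale normal approximation.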
Model fit signals you can quantify
Deviance summarizes disagreement between observed counts and fitted means; it is most useful when comparing models fitted to identical rows. Pearson χ² divided by the residual degrees of freedom is a practical dispersion check: values near 1.0 align with Poisson variance, while values of 1.5–2.5 often suggest extra‑Poisson variation worth investigating.
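Both statistics come directly from the observed counts y and fitted means μ. A sketch (the calculator’s exact formulas may differ in details such as the zero-count convention):

```python
import numpy as np

def poisson_fit_stats(y, mu, p):
    """Deviance and Pearson dispersion for a Poisson fit with p parameters."""
    # The deviance term y*log(y/mu) is taken as 0 when y == 0 (its limit).
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)
    deviance = 2.0 * np.sum(term - (y - mu))
    pearson = np.sum((y - mu) ** 2 / mu)   # Pearson chi-square
    dof = len(y) - p                       # residual degrees of freedom
    return deviance, pearson / dof         # (deviance, dispersion estimate)
```

A perfect fit (μ = y on every row) gives zero deviance and zero dispersion; the further the dispersion rises above 1.0, the stronger the evidence of extra‑Poisson variation.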
Overdispersion and robust inference
When dispersion exceeds 1.0, standard errors can be understated. Robust (sandwich) standard errors keep coefficient estimates unchanged but adjust uncertainty to match residual variability. This is helpful for clustered, heterogeneous, or omitted‑variable settings, especially when the primary goal is reliable inference for IRR and confidence intervals.
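A minimal sketch of the HC0 sandwich covariance for a Poisson GLM, assuming the design matrix X, response y, and fitted β are at hand (`robust_se` is an illustrative name, not the calculator’s API):

```python
import numpy as np

def robust_se(X, y, beta, offset=None):
    """HC0 sandwich standard errors for a fitted Poisson GLM (sketch)."""
    if offset is None:
        offset = np.zeros(len(y))
    mu = np.exp(X @ beta + offset)
    # Bread: inverse Fisher information (X' W X)^-1 with W = mu.
    bread = np.linalg.inv((X.T * mu) @ X)
    # Meat: sum of outer products of per-row score contributions X_i (y_i - mu_i).
    score = X * (y - mu)[:, None]
    meat = score.T @ score
    cov = bread @ meat @ bread
    return np.sqrt(np.diag(cov))
```

The coefficient estimates are untouched; only the covariance changes, so IRRs stay the same while their confidence intervals widen (or narrow) to match the observed residual variability.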
Prediction workflow and sanity checks
Predictions return μ = exp(Xβ + offset). Use “row prediction” to validate against your data and “manual prediction” for scenario testing. A useful check is the observed‑vs‑fitted plot: if many points sit far above the diagonal, the model underestimates high counts; far below implies overestimation.
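A “manual prediction” is just the formula above applied to a scenario row. The coefficients here are hypothetical placeholders, not output from the example table:

```python
import numpy as np

# Hypothetical fitted coefficients: intercept, x1, x2 (log scale).
beta = np.array([0.1, 0.8, 0.5])

def predict_mu(rows, offset=0.0):
    """Expected counts mu = exp(X beta + offset) for new predictor rows."""
    X = np.column_stack([np.ones(len(rows)), rows])  # prepend intercept column
    return np.exp(X @ beta + offset)

# Scenario: x1 = 1.0, x2 = 1.0, no offset.
mu_hat = predict_mu(np.array([[1.0, 1.0]]))
```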
Data requirements and stability
Reliable estimation needs variation in predictors and enough rows relative to parameters. If XᵀWX becomes nearly singular, coefficients can blow up and IRR becomes unstable. Reduce correlated predictors, rescale large values, or add more rows. As a rule, aim for at least 10–20 informative rows per parameter for steady behavior.
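One concrete stability check is the condition number of XᵀWX: large values (roughly 10⁸ and above) warn that the linear solve inside each IRLS update is numerically fragile. An illustrative helper, not part of the calculator:

```python
import numpy as np

def irls_condition(X, mu):
    """Condition number of X' W X with W = mu; large values flag near-singularity."""
    XtWX = (X.T * mu) @ X
    return np.linalg.cond(XtWX)
```

Comparing a well-spread design against one whose predictor is nearly constant (and hence nearly collinear with the intercept) shows the blow-up this section describes.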
FAQs
1) When should I use an exposure column?
Use exposure when counts come from different observation times or populations. The calculator adds log(exposure) to the offset, so coefficients represent rate changes rather than total changes.
2) What if my data has many zeros?
The model can handle zeros, but many structural zeros may reduce fit. Check residual plots and consider whether a zero‑inflated or hurdle approach is more appropriate for your process.
3) Why is dispersion greater than 1?
Dispersion above 1 indicates variance exceeds the Poisson mean. Common causes include unmodeled heterogeneity, clustering, or missing predictors. Robust errors help with inference, but model refinement may be needed.
4) Can I interpret the intercept as a baseline rate?
Yes. When all predictors are zero and the offset is zero, exp(intercept) is the baseline mean count. With an exposure or offset in the model, it becomes the baseline rate on that adjusted scale.
5) What does a non‑converged fit mean?
Non‑convergence suggests the algorithm could not stabilize coefficients within the tolerance. Try fewer predictors, rescale inputs, remove collinearity, or increase iterations. Extreme outliers can also cause instability.
6) Are p‑values exact?
The calculator uses large‑sample normal approximations (z tests). For small samples, results can be optimistic. Prefer confidence intervals, robust errors, and validation plots to support conclusions.