Calculator
Example data table
| Time | X | Y |
|---|---|---|
| 1 | 1 | 1.2 |
| 2 | 2 | 1.9 |
| 3 | 3 | 3.4 |
| 4 | 4 | 3.8 |
| 5 | 5 | 5.1 |
| 6 | 6 | 5.9 |
| 7 | 7 | 7.2 |
| 8 | 8 | 7.9 |
| 9 | 9 | 9.4 |
| 10 | 10 | 9.8 |
| 11 | 11 | 11.1 |
| 12 | 12 | 12 |
Formula used
Granger testing compares a restricted model to an unrestricted model for a chosen lag L.
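The F statistic is computed from the sum of squared errors (SSE) of the two models:
$$F = \frac{(\mathrm{SSE}_R - \mathrm{SSE}_U)/L}{\mathrm{SSE}_U/(n_{\mathrm{eff}} - k_U)}, \qquad df_1 = L, \quad df_2 = n_{\mathrm{eff}} - k_U$$
- $\mathrm{SSE}_R$: restricted model error (lagged Y terms only)
- $\mathrm{SSE}_U$: unrestricted model error (lagged Y and lagged X terms)
- $n_{\mathrm{eff}}$: usable rows after lagging
- $k_U$: unrestricted parameters (incl. intercept)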
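A minimal Python sketch of the F computation, assuming the standard form F = ((SSE_R - SSE_U)/L) / (SSE_U/(n_eff - k_U)) with df1 = L and df2 = n_eff - k_U. The function name `granger_f` is hypothetical, and the SSE values in the usage line are illustrative numbers, not output from the example table.

```python
def granger_f(sse_r: float, sse_u: float, lag: int, n_eff: int, k_u: int):
    """Return (F, df1, df2) for the restricted vs. unrestricted comparison.

    Assumes df1 = lag (one dropped X coefficient per lag) and
    df2 = n_eff - k_u (residual degrees of freedom of the unrestricted model).
    """
    df1 = lag
    df2 = n_eff - k_u
    if df1 <= 0 or df2 <= 0:
        raise ValueError("not enough usable rows for this lag")
    if sse_u <= 0:
        raise ValueError("unrestricted SSE must be positive")
    f_stat = ((sse_r - sse_u) / df1) / (sse_u / df2)
    # The p-value is the upper tail of F(df1, df2); with SciPy installed this
    # would be scipy.stats.f.sf(f_stat, df1, df2).
    return f_stat, df1, df2


# Illustrative numbers: L = 2, 10 usable rows, k_U = 2*2 + 1 = 5 parameters.
f_stat, df1, df2 = granger_f(sse_r=10.0, sse_u=6.0, lag=2, n_eff=10, k_u=5)
print(f"F = {f_stat:.4f} on ({df1}, {df2}) degrees of freedom")
# prints: F = 1.6667 on (2, 5) degrees of freedom
```

The p-value step is left as a comment because computing the F distribution's tail by hand is beyond a short sketch.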
How to use this calculator
- Enter aligned values for Series X and Series Y, or upload a CSV file.
- Select the direction to test (X → Y, Y → X, or both).
- Choose a lag mode: automatic selection or a fixed lag.
- Set the significance level α, then pick optional preprocessing if needed.
- Press Run Test to view results above.
- Use the download buttons to export CSV or PDF.
What the test measures in time series
Granger testing asks whether past values of X improve the prediction of Y beyond what past values of Y already provide. It compares a restricted regression using only lagged Y terms with an unrestricted regression that also includes lagged X terms. If the added X lags reduce the error enough, the F statistic increases and the p-value decreases. The test assumes stationary series: shared trends or unit roots can produce apparent predictability that is spurious, so treat trending data carefully. This tool reports both directions so predictability can be checked symmetrically.
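The restricted and unrestricted regressions described above can be sketched in plain Python for the X → Y direction. This is an illustrative sketch, not the calculator's own code: it assumes both models include an intercept, uses L lags of Y (restricted) and L lags of Y plus L lags of X (unrestricted), and solves the normal equations by Gauss-Jordan elimination, which is adequate for small, well-conditioned examples; numerical libraries typically use QR or SVD instead. `ols_sse` and `lagged_sse` are hypothetical helper names.

```python
import math


def ols_sse(rows, target):
    """Fit target ~ rows by least squares and return the sum of squared errors."""
    k = len(rows[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, target))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("design matrix is (near) singular")
        for i in range(k):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    residuals = [t - sum(b * x for b, x in zip(beta, r)) for r, t in zip(rows, target)]
    return sum(e * e for e in residuals)


def lagged_sse(x, y, lag):
    """Restricted and unrestricted SSE for X -> Y at one lag.

    Returns (sse_r, sse_u, n_eff, k_u).
    """
    if len(x) != len(y):
        raise ValueError("X and Y must be aligned and equally long")
    restricted, unrestricted, target = [], [], []
    for t in range(lag, len(y)):
        y_lags = [y[t - i] for i in range(1, lag + 1)]
        x_lags = [x[t - i] for i in range(1, lag + 1)]
        restricted.append([1.0] + y_lags)             # intercept + Y lags
        unrestricted.append([1.0] + y_lags + x_lags)  # ... plus X lags
        target.append(y[t])
    n_eff = len(target)   # the first `lag` rows are lost to lagging
    k_u = 2 * lag + 1     # intercept + lag Y terms + lag X terms
    return ols_sse(restricted, target), ols_sse(unrestricted, target), n_eff, k_u


# Synthetic data in which Y responds to X one step later.
x = [math.sin(1.7 * i) + 0.5 * math.cos(0.9 * i) for i in range(40)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * math.sin(5.0 * t) for t in range(1, 40)]
sse_r, sse_u, n_eff, k_u = lagged_sse(x, y, lag=2)
print(sse_r > sse_u, n_eff, k_u)  # the X lags should lower the error
```

Testing Y → X is the same call with the two series swapped. Note that in the example table X equals Time exactly, so for L ≥ 2 its lags are perfectly collinear with the intercept and the unrestricted coefficients are not uniquely determined from those raw values.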
Lag choice and information criteria
Lag length defines how far back the model looks. Too few lags can miss delayed relationships, while too many consume degrees of freedom and overfit noise. Auto mode evaluates every lag up to a chosen maximum and selects the one with the lowest AIC or BIC. AIC charges a fixed penalty of 2 per parameter and tends to allow longer lags; BIC charges ln(n) per parameter, which is heavier once n ≥ 8, so it is stricter and often chooses shorter lags. Choose the maximum lag from known domain cycles and the sampling rate.
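A sketch of the auto mode, assuming the criteria are computed on the unrestricted model, every candidate lag is fitted on the same rows (the first max-lag observations dropped for all lags), and the Gaussian form n·ln(SSE/n) plus a penalty of 2k (AIC) or k·ln(n) (BIC). The calculator may define the criteria differently; `select_lag` and the SSE values are hypothetical.

```python
import math


def select_lag(sse_by_lag, n_common, criterion="AIC"):
    """Return the candidate lag with the lowest information criterion.

    sse_by_lag maps lag -> unrestricted-model SSE, each fitted on the same
    n_common rows so the scores are comparable.
    """
    if criterion not in ("AIC", "BIC"):
        raise ValueError("criterion must be 'AIC' or 'BIC'")
    best_lag, best_score = None, math.inf
    for lag, sse in sorted(sse_by_lag.items()):
        k = 2 * lag + 1  # intercept + lag Y terms + lag X terms
        fit = n_common * math.log(sse / n_common)  # constants shared by all lags dropped
        penalty = 2 * k if criterion == "AIC" else k * math.log(n_common)
        score = fit + penalty
        if score < best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical SSE values on 40 common rows: each extra lag helps a little.
sse = {1: 40.0, 2: 34.0, 3: 30.0}
print(select_lag(sse, 40, "AIC"), select_lag(sse, 40, "BIC"))  # prints: 3 1
```

With these numbers AIC accepts the extra parameters of lag 3, while BIC's ln(40) ≈ 3.69 penalty per parameter keeps it at lag 1.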
Data alignment and preprocessing options
Use evenly spaced observations and align timestamps carefully, because misalignment can manufacture apparent predictability. Many series are non-stationary, so first differences can stabilize the mean. Demeaning and standardizing help when levels differ widely or when numerical scaling causes unstable estimates. If you difference, interpret results as changes, not levels. Remember that the effective sample size shrinks by the selected lag, so longer histories support more reliable tests.
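The three preprocessing options can be sketched with Python's standard `statistics` module. This assumes standardizing uses the sample standard deviation (n - 1 denominator), which the calculator may not; the function names are hypothetical. Applying the same transformation to both series keeps the rows paired, and differencing removes one row before any lags are taken.

```python
import statistics


def first_difference(values):
    """Period-to-period changes; the result is one element shorter."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def demean(values):
    """Subtract the series mean."""
    m = statistics.mean(values)
    return [v - m for v in values]


def standardize(values):
    """Demean, then divide by the sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


levels = [1.2, 1.9, 3.4, 3.8, 5.1]  # first five Y values from the example table
changes = first_difference(levels)
print(len(changes))  # prints: 4
print([round(v, 2) for v in standardize([1, 2, 3])])  # prints: [-1.0, 0.0, 1.0]
```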
Interpreting outputs beyond significance
A small p-value indicates that lagged X terms add predictive content at the chosen α threshold; it does not prove real-world cause. Read the F statistic together with df1 and df2: df1 is the number of restrictions tested (the X lags dropped from the unrestricted model), and df2 reflects how much data remained after fitting. Compare restricted and unrestricted R² to see how much explanatory power the X lags add, and check whether conclusions persist across nearby lags. When many lags are explored, the chance of at least one false positive rises, so treat an isolated significant lag with caution.
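Restricted and unrestricted R² can be computed from the two SSE values, since both models predict the same post-lag target rows and therefore share one total sum of squares. A minimal sketch with illustrative numbers; the partial R², (SSE_R - SSE_U)/SSE_R, is a common companion measure that the calculator may not report, and both function names are hypothetical.

```python
def r_squared(sse, target):
    """R^2 = 1 - SSE / SST over the usable (post-lag) target rows."""
    mean = sum(target) / len(target)
    sst = sum((t - mean) ** 2 for t in target)
    return 1.0 - sse / sst


def partial_r_squared(sse_r, sse_u):
    """Share of the restricted model's remaining error removed by the X lags."""
    return (sse_r - sse_u) / sse_r


target = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative post-lag Y values (SST = 10)
r2_restricted = r_squared(2.0, target)
r2_unrestricted = r_squared(1.0, target)
print(round(r2_restricted, 3), round(r2_unrestricted, 3),
      round(partial_r_squared(2.0, 1.0), 3))  # prints: 0.8 0.9 0.5
```

Here the X lags raise R² from 0.8 to 0.9, which is the same as removing half of the restricted model's remaining error.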
Reporting results and practical limits
When sharing results, report the sampling interval, preprocessing steps, and the lag strategy used. Include both directions and document whether an intercept was fitted. Granger conclusions can fail under omitted variables, structural breaks, shared trends, or regime shifts. For stronger evidence, pair this test with domain knowledge and additional diagnostics such as residual checks and stability plots. Exported reports help keep a clear record of the analysis.
FAQs
Does Granger causality prove real causation?
No. It only tests whether lagged values of one series improve prediction of another within the chosen model. Common drivers, feedback loops, and trends can create predictability without a direct causal mechanism.
How many data points should I use?
More is better. Lagging removes the first L rows, and the unrestricted model's parameters use up further degrees of freedom, so the residual degrees of freedom shrink quickly as the lag grows. Aim for dozens of observations at minimum, and substantially more when testing larger lags or noisy data.
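A quick way to see how fast the data run out: with an intercept and L lags of each series, k_U = 2L + 1 and df2 = (n - L) - (2L + 1) = n - 3L - 1. The sketch below, a hypothetical helper under that bivariate assumption, lists which lags leave positive df2 for the 12-row example table.

```python
def usable_lags(n_rows, max_lag):
    """List (lag, n_eff, df2) for lags that leave positive residual df."""
    result = []
    for lag in range(1, max_lag + 1):
        n_eff = n_rows - lag  # rows left after lagging
        k_u = 2 * lag + 1     # intercept + lag Y terms + lag X terms
        df2 = n_eff - k_u     # equals n_rows - 3 * lag - 1
        if df2 > 0:
            result.append((lag, n_eff, df2))
    return result


print(usable_lags(12, 5))  # prints: [(1, 11, 8), (2, 10, 5), (3, 9, 2)]
```

Even lag 3 leaves only 2 residual degrees of freedom, which is why a 12-row table works for demonstration but not for a reliable test.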
Should I difference or standardize my series?
If the series has strong trends or unit-root behavior, differencing can help approximate stationarity. Standardizing is useful when scales differ greatly and you want stable estimation. Interpret outputs based on the transformed data.
How do I choose the lag length?
Use domain knowledge first, then compare nearby lags. Auto selection with AIC or BIC can provide a reasonable starting point, but confirm that results are stable and degrees of freedom remain adequate.
Why do X → Y and Y → X results differ?
Direction matters because each regression predicts a different target. It is common to find predictive signal in one direction only, especially with delayed responses or asymmetric feedback. Testing both directions helps identify likely lead–lag structure.
Can this handle more than two variables?
This single-file calculator is bivariate. For multivariate Granger analysis, you would fit a vector autoregression and test joint restrictions across multiple predictors. Use dedicated econometrics software when you need that scope.