Granger Causality Test Calculator

Discover whether past values of X improve predictions of Y in ML pipelines. Apply transforms, lags, and intercept options. Download CSV and PDF reports for quick team sharing.

Inputs
Enter two time series or upload a CSV
Works best with stationary, evenly spaced observations.

Lag order: common choices are 1–4 for small samples.
Significance level (α): smaller α is stricter.
Direction: runs one or two F-tests.
CSV upload (optional)
CSV should have two columns: X, Y (header allowed).
Paste values separated by commas, spaces, or new lines.
Keep the same number of observations for both.
Example
Sample paired observations
# Series X Series Y
1 10 8
2 12 9
3 11 10
4 13 10
5 15 12
6 14 13
7 16 13
8 18 15
9 17 16
10 19 17
Use this example to verify your setup and exports.
Formula used

The test compares two linear models on the same dependent series. For lag order p, the restricted model excludes the candidate “cause” lags:

Restricted:
y_t = c + Σ(i=1..p) a_i y_{t-i} + ε_t

The unrestricted model adds lags of the other series:

Unrestricted:
y_t = c + Σ(i=1..p) a_i y_{t-i} + Σ(i=1..p) b_i x_{t-i} + ε_t

Let RSS_r and RSS_u be residual sums of squares. The F-statistic is:

F = ((RSS_r − RSS_u) / p) / (RSS_u / (n − k_u))
Here, n is the number of usable rows after lags, and k_u is the number of unrestricted parameters.
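The formula above can be sketched directly in NumPy; the function name `granger_f` is illustrative and not this calculator's actual code. It fits both models by least squares, forms the residual sums of squares, and applies the F-ratio exactly as written:

```python
import numpy as np
from scipy import stats

def granger_f(x, y, p=2):
    """F-test comparing the restricted model (y lags only) with the
    unrestricted model (y lags plus x lags), as in the formula above."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y) - p                                   # usable rows after lagging
    Y = y[p:]
    y_lags = np.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])
    x_lags = np.column_stack([x[p - i:len(x) - i] for i in range(1, p + 1)])
    ones = np.ones((n, 1))
    Xr = np.hstack([ones, y_lags])                   # restricted design matrix
    Xu = np.hstack([ones, y_lags, x_lags])           # unrestricted design matrix
    rss = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(Xr), rss(Xu)
    k_u = Xu.shape[1]                                # unrestricted parameter count
    F = ((rss_r - rss_u) / p) / (rss_u / (n - k_u))
    p_value = stats.f.sf(F, p, n - k_u)
    return F, p_value

# Using the sample paired observations from the example above:
F, p_value = granger_f([10, 12, 11, 13, 15, 14, 16, 18, 17, 19],
                       [8, 9, 10, 10, 12, 13, 13, 15, 16, 17], p=2)
```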
How to use this calculator
  1. Paste two equally spaced time series into X and Y, or upload a two-column CSV.
  2. Select a lag order. Start small and increase only if justified.
  3. Choose α to control strictness of your decision rule.
  4. Enable first differencing if your series trend over time.
  5. Run the test. Review p-values and model fit indicators.
  6. Download CSV or PDF for reporting and documentation.

Why Granger testing supports model feature decisions

Granger causality evaluates whether past values of one signal improve forecasts of another within a linear autoregressive framework. In ML feature engineering, this helps you justify lagged inputs, avoid redundant predictors, and document why certain time-based features were retained. The calculator reports F-statistics and p-values so teams can align on evidence, and you can compare both directions to detect feedback loops. This supports careful feature selection whether you are training recurrent models, gradient boosting, or classic regression baselines.

Selecting lag order for stable inference

Lag order controls how much history enters the restricted and unrestricted models. Too few lags can hide delayed effects; too many consume degrees of freedom and inflate variance. Use a small p for short datasets, then compare the AIC and BIC shown in the results. Lower information criteria indicate a better balance between fit and complexity. For minute-level telemetry, start with 1–3 lags and iterate.

Handling trends and scaling in practical datasets

Many operational metrics drift due to growth, seasonality, or regime shifts. When series are nonstationary, Granger tests can overstate relationships. The first difference option reduces trend by testing changes rather than levels. Standardization makes coefficients comparable and can improve numerical stability when features have different magnitudes. If you difference, remember the sample length drops by one.
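Both transforms are a few lines of NumPy; the helper name `prepare` is hypothetical, not part of this calculator:

```python
import numpy as np

def prepare(series, difference=True, standardize=True):
    """Optionally first-difference, then z-score a series before testing."""
    s = np.asarray(series, dtype=float)
    if difference:
        s = np.diff(s)                  # tests changes, not levels; length drops by one
    if standardize:
        s = (s - s.mean()) / s.std()    # zero mean, unit variance
    return s

x = prepare([10, 12, 11, 13, 15, 14, 16, 18, 17, 19])
print(len(x))  # 9: one observation lost to differencing
```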

Interpreting directionality and decision thresholds

Direction matters: X → Y asks whether lagged X adds predictive value beyond lagged Y, while Y → X tests the reverse. A p-value below your α suggests predictive causality at that lag setting, not a true mechanistic cause. For production decisions, combine this result with domain constraints and out-of-sample validation. When p-values are borderline, prefer simpler models unless the gains are clear.

Reporting results for audits and collaboration

Clear reporting reduces confusion when models are reviewed later. Export the CSV to keep inputs, transforms, and summary metrics together. Use the PDF report for lightweight sharing in design docs or tickets. Recording the chosen lag, alpha, and any differencing ensures another analyst can reproduce the same test on the same aligned observations. Add notes about sampling frequency and missing value handling to strengthen traceability.

FAQs

1) Does Granger causality prove real causation?

No. It tests whether lagged information improves prediction in a linear model. Use it as evidence for predictive relationships, then confirm with experiments, domain knowledge, and robust validation.

2) How many data points do I need?

You need more usable rows (n − p after lagging) than the 2p + 1 unrestricted parameters. As a practical rule, aim for at least 10–20 observations per lagged parameter to keep results stable.

3) When should I enable first differencing?

Enable it when series show strong trends or changing levels. Differencing reduces spurious findings caused by shared drift, but it changes interpretation to relationships between changes rather than levels.

4) What lag order should I start with?

Start with 1–3 lags for short datasets, then increase cautiously. Compare AIC and BIC and ensure df2 stays positive. Prefer the smallest lag that yields consistent, interpretable results.
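The df2 check can be done by hand from the formula section; the function name here is illustrative:

```python
def df2(n_obs, p):
    """Denominator degrees of freedom: usable rows (n_obs - p) minus
    unrestricted parameters k_u = 2p + 1 (intercept plus p lags of each series)."""
    return (n_obs - p) - (2 * p + 1)

print(df2(10, 2))  # 3: valid
print(df2(10, 4))  # -3: too many lags for ten observations
```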

5) Why do I get a singular matrix error?

It usually means too many lags for the sample size or highly collinear inputs. Reduce lag order, enable differencing, or provide more observations. Standardizing can also improve numerical conditioning.

6) How are the CSV and PDF exports different?

CSV preserves inputs, transforms, and numeric summaries for analysis. PDF is a readable snapshot for sharing. Run the test first, then export using the buttons in the results panel.

Built for quick experimentation in AI and ML workflows.

Related Calculators

GRU Forecast Calculator
Seasonality Detection Tool
Auto ARIMA Selector
MAPE Error Calculator
MAE Error Calculator
Cross Validation Forecast
Rolling Window Split
Outlier Detection Series
Anomaly Detection Series
Change Point Detection

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.
