Calculator
Example data table
| t | Series A | Series B |
|---|---|---|
| 1 | 12 | NA |
| 2 | 14 | NA |
| 3 | 15 | 12 |
| 4 | 13 | 14 |
| 5 | 16 | 15 |
| 6 | 18 | 13 |
| 7 | 17 | 16 |
| 8 | 19 | 18 |
| 9 | 21 | 17 |
| 10 | 20 | 19 |
| 11 | 22 | 21 |
| 12 | 23 | 20 |
Formula used
For lag k, the tool aligns pairs (A[t], B[t+k]) wherever both indices exist. Let the overlap size be N, with means μA and μB.
Cross-covariance (biased):
C(k) = (1/N) Σ (A[t] − μA)(B[t+k] − μB)
Cross-covariance (unbiased):
C(k) = (1/(N−1)) Σ (A[t] − μA)(B[t+k] − μB)
Cross-correlation (normalized):
r(k) = Σ (A[t] − μA)(B[t+k] − μB) / √(Σ(A[t] − μA)² · Σ(B[t+k] − μB)²)
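The three formulas can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name `cross_stat` and its `kind` argument are hypothetical, NA is represented as `None`, and the means are assumed to be taken over the overlapping pairs only.

```python
import math

def cross_stat(a, b, k, kind="correlation"):
    """Return (value, N) for lag k, pairing a[t] with b[t+k].

    None marks a missing value; pairs with a missing member are skipped.
    Assumption: means are computed over the overlapping pairs only.
    """
    pairs = [
        (a[t], b[t + k])
        for t in range(len(a))
        if 0 <= t + k < len(b) and a[t] is not None and b[t + k] is not None
    ]
    n = len(pairs)
    if n < 2:
        return None, n  # not enough pairs to estimate anything

    mu_a = sum(x for x, _ in pairs) / n
    mu_b = sum(y for _, y in pairs) / n
    sxy = sum((x - mu_a) * (y - mu_b) for x, y in pairs)

    if kind == "cov_biased":
        return sxy / n, n
    if kind == "cov_unbiased":
        return sxy / (n - 1), n

    sxx = sum((x - mu_a) ** 2 for x, _ in pairs)
    syy = sum((y - mu_b) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None, n  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy), n

# Series from the example data table (NA written as None).
A = [12, 14, 15, 13, 16, 18, 17, 19, 21, 20, 22, 23]
B = [None, None, 12, 14, 15, 13, 16, 18, 17, 19, 21, 20]

r, n = cross_stat(A, B, 2)
print(r, n)
```

In the example table, Series B repeats Series A two steps later, so lag 2 pairs identical values over an overlap of 10 points.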
How to use this calculator
- Enter Series A and Series B as ordered values (same time step).
- Set the maximum lag L. The tool evaluates every lag from −L to +L.
- Pick correlation (scale-free) or covariance (scale-dependent).
- Choose mean handling and whether to standardize the series.
- Choose how to handle missing values: mark them as NA to skip them pairwise, or use stop mode to halt on any missing value.
- Press Compute to see results above the form.
- Download outputs with CSV or PDF buttons in results.
Why cross correlation matters in lag analysis
Cross correlation quantifies how two ordered signals move together after shifting one in time. In marketing, weekly ad spend can lead sales by 1–3 weeks; in sensor networks, vibration may precede temperature changes by minutes. This tool evaluates every lag from −L to +L and reports the overlap count, helping you spot the most informative alignment without guessing. Negative lags indicate B leads A in time.
Preparing series: scaling, demeaning, and missing data
Clean inputs improve interpretability. Demeaning removes level differences so the measure reflects co‑movement, not shared averages. Standardization converts each series to z‑scores, useful when units differ (rupees vs units, volts vs degrees). Missing values marked as NA can be skipped pairwise, preventing a few gaps from discarding the entire run. To enforce complete cases, choose stop mode and fix the CSV first.
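As a rough sketch of the standardization step, the snippet below converts a series to z-scores while leaving NA entries (written as `None`) in place. The helper name `zscores` is hypothetical, and the use of the sample standard deviation (dividing by n−1) is an assumption; the calculator may divide by n instead.

```python
import math

def zscores(values):
    """Standardize a series to z-scores, keeping None (NA) entries as None."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")
    mean = sum(present) / len(present)
    # Assumption: sample variance (n - 1 in the denominator).
    var = sum((v - mean) ** 2 for v in present) / (len(present) - 1)
    if var == 0:
        raise ValueError("a constant series cannot be standardized")
    sd = math.sqrt(var)
    return [None if v is None else (v - mean) / sd for v in values]

print(zscores([1, 2, None, 3]))
```

Subtracting the mean alone gives the demeaned series; dividing by the standard deviation as well removes the units.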
Choosing lag limits and overlap thresholds
Max lag should match domain reality and sample size. With 120 daily points, testing ±60 lags halves the overlap at the extremes and can inflate noise. A practical rule is to keep overlap N above 30 for stable estimates, or above 50 when the relationship is weak. Use smaller lag windows when processes are fast and larger windows for slow diffusion effects. In either case, interpret edge lags cautiously.
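The overlap arithmetic is easy to check by hand or in code. The sketch below assumes two series with no missing values; `overlap` and `max_lag_for_overlap` are illustrative helper names, not part of the tool.

```python
def overlap(n_a, n_b, k):
    """Count index pairs (t, t+k) with 0 <= t < n_a and 0 <= t+k < n_b."""
    first = max(0, -k)          # smallest valid t
    stop = min(n_a, n_b - k)    # one past the largest valid t
    return max(0, stop - first)

def max_lag_for_overlap(n, min_overlap):
    """Largest |k| that keeps at least min_overlap pairs for two length-n series."""
    return max(0, n - min_overlap)

print(overlap(120, 120, 60))         # overlap at lag +60 for 120 points
print(max_lag_for_overlap(120, 30))  # largest lag keeping 30 pairs
```

For two equal-length series of n points, the overlap at lag k is simply n − |k|, so a minimum overlap translates directly into a maximum lag.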
Reading the table: strength, direction, and confidence
For correlation, values near 0.0 suggest no linear lead‑lag link, while ±0.3 is often a modest association and ±0.7 can be strong in many applied settings. Direction matters: positive r(k) means rises in A align with rises in B at lag k, and negative r(k) means rises in A align with falls in B. Confidence intervals use Fisher’s transform; if the interval excludes zero, the lag is flagged as significant under the selected level. Use the best lag rule to highlight either the largest absolute correlation or, when only positive lifts matter for forecasting, the largest positive one.
Reporting outputs and quality checks
Always sanity‑check the highlighted best lag against context and plots. A very high value at a tiny overlap may be spurious. Compare runs with and without standardization, and test alternative mean handling for non‑stationary series. Export the table to document assumptions, keep an audit trail, and share reproducible lag findings with stakeholders.
FAQs
What does a positive lag mean in this tool?
Lag k compares A[t] with B[t+k]. If k is positive, B occurs later than A, so A leads B by k steps. If k is negative, B leads A.
When should I use cross-covariance instead of correlation?
Use covariance when both series share meaningful units, or when absolute scaling matters. Use correlation when units differ, ranges vary, or you want a standardized comparison across datasets.
Why does the overlap N change as lag changes?
Shifting one series reduces the aligned window. At larger lags, fewer index pairs exist because one series runs out of values at the ends. Smaller overlap increases uncertainty, so interpret edge lags cautiously.
How are the confidence intervals calculated?
For correlation, the tool applies Fisher’s atanh transform with standard error 1/√(N−3), then converts back to r. With small N or autocorrelated data, treat intervals as approximate guidance.
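A minimal sketch of that interval, using only the Python standard library. The function name `fisher_ci` and the default 95% level are assumptions for illustration; the critical value comes from the standard normal distribution.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation r from n pairs."""
    if n <= 3 or abs(r) >= 1:
        return None  # the transform or its standard error is undefined
    z = math.atanh(r)                  # Fisher transform of r
    se = 1 / math.sqrt(n - 3)          # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

print(fisher_ci(0.5, 30))  # interval lies above zero
print(fisher_ci(0.3, 10))  # interval straddles zero
```

With r = 0.5 and N = 30 the interval stays above zero, while r = 0.3 with N = 10 gives an interval that includes zero, which shows how a small overlap widens the uncertainty.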
My best lag is at the maximum lag. What should I do?
Increase the max lag only if a larger delay is plausible, and check plots for trends or seasonality. If overlap becomes small, the peak may be unstable. Try detrending or differencing before re-running.
How should I handle strong seasonality or non-stationary series?
Remove trend and seasonality first using differencing, detrending, or seasonal adjustment. Then run cross correlation on residuals. This reduces spurious peaks caused by shared cycles rather than true lead‑lag effects.
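Differencing is the simplest of these options to sketch. The helper below is a hypothetical illustration, not the calculator's own preprocessing: `step=1` removes a linear trend, and a larger `step` (for example 7 for daily data with a weekly cycle) gives a seasonal difference. NA values, written as `None`, propagate to any difference that touches them.

```python
def difference(values, step=1):
    """Return values[i] - values[i - step]; the result is shorter by `step`."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    out = []
    for i in range(step, len(values)):
        cur, prev = values[i], values[i - step]
        # A difference is missing if either endpoint is missing.
        out.append(None if cur is None or prev is None else cur - prev)
    return out

print(difference([1, 3, 6, 10]))              # first differences
print(difference([5, 9, 6, 10, 7, 11], 2))    # seasonal differences, period 2
```

Apply the same differencing to both series before computing lags, and remember that each difference shortens the series, which reduces the overlap N.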