Transfer entropy from X to Y is estimated as:
$$T_{X \to Y} = \sum p\left(y_{t+1}, y_t^{(k)}, x_t^{(l)}\right)\,\log\frac{p\left(y_{t+1} \mid y_t^{(k)}, x_t^{(l)}\right)}{p\left(y_{t+1} \mid y_t^{(k)}\right)}$$
Here, y_t^(k) and x_t^(l) are the embedded past vectors of the target and source, built with delay tau. This tool discretizes each series into equal-width bins and uses empirical counts to estimate the probabilities.
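For concreteness, here is a minimal sketch of the binned estimator described above, assuming two equal-length, evenly sampled series; the `transfer_entropy` function and its defaults are illustrative, not this calculator's actual internals.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y, k=1, l=1, tau=1, bins=8):
    """Binned estimate of T_{X->Y} in bits (equal-length, aligned series)."""
    def symbolize(s):
        # Equal-width bins over the series range; symbols 0..bins-1.
        s = np.asarray(s, dtype=float)
        edges = np.linspace(s.min(), s.max(), bins + 1)
        return np.digitize(s, edges[1:-1])
    xs, ys = symbolize(x), symbolize(y)

    t0 = (max(k, l) - 1) * tau  # first index with a full past vector
    joint = Counter()   # counts for (y_{t+1}, y_t^(k), x_t^(l))
    ctx = Counter()     # counts for (y_t^(k), x_t^(l))
    nxt = Counter()     # counts for (y_{t+1}, y_t^(k))
    past = Counter()    # counts for y_t^(k)
    for t in range(t0, len(ys) - 1):
        y_next = ys[t + 1]
        y_past = tuple(ys[t - i * tau] for i in range(k))
        x_past = tuple(xs[t - i * tau] for i in range(l))
        joint[(y_next, y_past, x_past)] += 1
        ctx[(y_past, x_past)] += 1
        nxt[(y_next, y_past)] += 1
        past[y_past] += 1

    n = sum(joint.values())  # effective sample count
    te = 0.0
    for (y_next, y_past, x_past), c in joint.items():
        # p(y+|yk,xl) / p(y+|yk), rewritten purely in terms of counts
        ratio = (c * past[y_past]) / (ctx[(y_past, x_past)] * nxt[(y_next, y_past)])
        te += (c / n) * np.log2(ratio)
    return te
```

A call like `transfer_entropy(x, y, k=2, l=2, tau=1, bins=8)` then mirrors the settings described in the steps below.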
- Paste two synchronized time series for X and Y.
- Choose direction (X → Y, Y → X, or both).
- Set embedding lengths k and l, plus delay tau.
- Pick a bin count; start with 6–12 bins for moderate data.
- Click Calculate, then export results as CSV or PDF.
| t | X | Y |
|---|---|---|
| 0 | 0.05 | 0.02 |
| 1 | 0.10 | 0.08 |
| 2 | 0.12 | 0.11 |
| 3 | 0.15 | 0.14 |
| 4 | 0.30 | 0.21 |
| 5 | 0.28 | 0.27 |
Try the default values already loaded in the form, then vary k, l, tau, and the bin count to see how sensitive the estimate is to each setting.
- Embedding choice: Larger k and l capture more history but reduce usable samples.
- Discretization: Too many bins can inflate sparsity and noise; too few bins can hide structure.
- Interpretation: Transfer entropy reflects predictive information gain, not necessarily physical mechanism.
- Validation: Compare against shuffled surrogates to test significance.
Transfer Entropy in Practical Signal Analysis
Transfer entropy (TE) quantifies how much the past of one signal improves prediction of another signal’s future, beyond what the target’s own past explains. In experimental physics, this is useful when interactions are nonlinear or when correlation hides directionality. This calculator estimates TE using discretized probability counts from two synchronized time series.
1) What the output number means
A TE value near 0 indicates minimal predictive gain from the source series under your chosen settings. Positive values imply that including the source past reduces uncertainty in the target’s next sample. Results are reported in bits for base-2 logs and nats for natural logs.
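The two units differ only by a constant factor, so converting between them is a fixed rescaling:

$$T_{\text{bits}} = \frac{T_{\text{nats}}}{\ln 2} \approx 1.4427\,T_{\text{nats}}$$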
2) Sample size and effective samples
The estimator uses an “effective” sample count after embedding. For example, with k=2, l=2, and tau=1, one sample is lost at the start to build the past vectors and one at the end for y(t+1), so the effective count is roughly N−2. Larger k, l, or tau reduce the effective count quickly.
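Under the indexing convention used in the sketch above (an assumption, not necessarily this tool's exact accounting), the effective count works out to:

```python
# Hypothetical helper: samples remaining after embedding, assuming
# past vectors (s_t, s_{t-tau}, ..., s_{t-(k-1)*tau}) plus one future sample.
def effective_samples(n, k, l, tau):
    return n - 1 - (max(k, l) - 1) * tau

effective_samples(1000, k=2, l=2, tau=1)  # 998, i.e. roughly N - 2
```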
3) Choosing embedding lengths (k and l)
Start with k=1–3 and l=1–3 for most lab datasets, then increase gradually. If TE changes drastically with small embedding adjustments, your dataset may be too short or too noisy. When k is too small, the target’s internal dynamics can leak into TE as a false interaction.
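A quick stability check, reusing the hypothetical `transfer_entropy` sketch above on your aligned arrays `x` and `y`:

```python
# Sweep small embedding lengths; large jumps between neighboring
# settings suggest the series is too short or too noisy.
for k in (1, 2, 3):
    for l in (1, 2, 3):
        te = transfer_entropy(x, y, k=k, l=l, tau=1, bins=8)
        print(f"k={k}, l={l}: TE = {te:.4f} bits")
```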
4) Selecting delay (tau)
Tau sets the spacing between samples used in the past vectors. If your system responds after a known latency, set tau to match that delay in samples. For high-rate measurements, try tau values like 1, 2, 5, or 10 and look for a consistent peak direction rather than a single best point.
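The same idea applies to the delay; here is a sketch of a tau scan with illustrative values, again using the hypothetical estimator above:

```python
# Scan candidate delays and look for a consistent peak region
# rather than trusting a single best point.
for tau in (1, 2, 5, 10):
    te = transfer_entropy(x, y, k=2, l=2, tau=tau, bins=8)
    print(f"tau={tau}: TE = {te:.4f} bits")
```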
5) Binning strategy and data density
Discretization controls bias and variance. With limited data, a setting of 6–12 bins often provides a stable compromise. As a rule of thumb, aim for at least 10–20 effective samples per frequently visited joint state. If you choose 30–50 bins on short series, many states become empty and TE becomes unstable.
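One way to apply the rule of thumb, using the same symbolization and embedding as the sketch above (the function name is illustrative):

```python
import numpy as np

def samples_per_state(x, y, k=1, l=1, tau=1, bins=8):
    """Average effective samples per occupied joint state (coarse proxy
    for the per-state rule of thumb); aim for roughly 10-20 or more."""
    def symbolize(s):
        s = np.asarray(s, dtype=float)
        edges = np.linspace(s.min(), s.max(), bins + 1)
        return np.digitize(s, edges[1:-1])
    xs, ys = symbolize(x), symbolize(y)
    t0 = (max(k, l) - 1) * tau
    states = {
        (ys[t + 1],
         tuple(ys[t - i * tau] for i in range(k)),
         tuple(xs[t - i * tau] for i in range(l)))
        for t in range(t0, len(ys) - 1)
    }
    n_eff = len(ys) - 1 - t0
    return n_eff / len(states)
```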
6) Direction checks and symmetry
Compute both directions (X→Y and Y→X) and compare magnitudes. In coupled oscillators, for instance, a driven response often yields TE that is measurably higher in the driving direction. If both directions are similar, the coupling may be bidirectional, common-driven, or dominated by noise.
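With the hypothetical estimator above, the directional check is just two calls with the arguments swapped:

```python
te_xy = transfer_entropy(x, y, k=2, l=2, tau=1, bins=8)  # X -> Y
te_yx = transfer_entropy(y, x, k=2, l=2, tau=1, bins=8)  # Y -> X
print(f"X->Y: {te_xy:.4f} bits | Y->X: {te_yx:.4f} bits")
```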
7) Practical validation with surrogate tests
To assess significance, repeat the calculation after shuffling the source series or applying a circular time shift. If the original TE exceeds the surrogate distribution by a clear margin (for example, above the 95th percentile), you have stronger evidence of directional information transfer rather than chance alignment.
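A sketch of the circular-shift variant, again built on the hypothetical `transfer_entropy` above; the shift preserves the source’s autocorrelation while breaking its alignment with the target:

```python
import numpy as np

def surrogate_test(x, y, n_surrogates=200, **te_kwargs):
    """Compare observed TE against circularly shifted source surrogates."""
    rng = np.random.default_rng(seed=0)
    te_obs = transfer_entropy(x, y, **te_kwargs)
    surrogates = np.empty(n_surrogates)
    for i in range(n_surrogates):
        shift = int(rng.integers(1, len(x)))  # random nonzero rotation
        surrogates[i] = transfer_entropy(np.roll(x, shift), y, **te_kwargs)
    threshold = np.percentile(surrogates, 95)
    return te_obs, threshold, te_obs > threshold
```

If the observed TE clears the 95th-percentile threshold, chance alignment becomes a much less likely explanation.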
1) What is transfer entropy used for?
It is used to detect directed information flow between two time series, especially when interactions are nonlinear and correlation is ambiguous.
2) Why do my results change when I increase bins?
More bins create more joint states, which require more data to populate. With sparse counts, the probability estimates become noisy, making TE fluctuate or inflate.
3) Should X and Y have the same length?
Yes. If the lengths differ, align and trim both series to the same time window so you are not mixing unrelated segments.
4) What do k and l represent?
k is the number of past samples of the target used for prediction, and l is the number of past samples of the source included as additional context.
5) What is a good starting point for tau?
Begin with tau=1 for evenly sampled data. If your system has a known response delay, set tau to match that delay in samples and test nearby values.
6) Is a higher TE always better?
Higher TE suggests stronger directional predictability gain, but it can be biased by poor binning, short data, or common drivers. Always compare with surrogates.
7) Can TE prove physical causality?
No. TE indicates predictive information transfer given your assumptions. Physical causality still requires experimental controls, theory, and checks for confounders.