False Nearest Neighbors Calculator

Calculator Inputs

Paste a scalar time-series. Values can be separated by commas, spaces, or new lines.

Time-Series Data

For large datasets, raise “Max points” to analyze more of the series, or paste fewer values for a faster calculation.

Time Delay τ (samples)

Max Embedding Dimension

Theiler Window W

Exclude neighbors within ±W indices.

Rtol (relative threshold)

Used by criterion: Δ / R_m > Rtol.

Atol (absolute threshold)

Used by criterion: R_{m+1} / σ > Atol.

Max Points (performance)

Caps analysis length to avoid slow runs.

Use relative criterion (Δ / R_m)

Use absolute criterion (R_{m+1} / σ)

Normalize series (z-score)
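The input handling described above (mixed separators, a cap on the number of points, optional z-scoring) can be sketched as follows. This is only an illustration: the function names, the default cap, and the choice to keep the first "Max points" values are assumptions, not the calculator's confirmed behavior.

```python
import re
import statistics

def parse_series(text):
    """Split pasted text on commas and/or whitespace and convert to floats."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]

def prepare(text, max_points=2000, normalize=True):
    """Parse, cap the length, and optionally z-score the series."""
    x = parse_series(text)
    x = x[:max_points]  # assumption: keep the first max_points values
    if normalize:
        mean = statistics.fmean(x)
        std = statistics.pstdev(x)  # population standard deviation
        if std == 0:
            raise ValueError("a constant series cannot be z-scored")
        x = [(v - mean) / std for v in x]
    return x

# Commas, spaces, and new lines may be mixed freely.
series = prepare("0.950063, 0.189772\n0.615035 0.947067", max_points=3)
print(len(series))  # 3
```

After z-scoring, the series has zero mean and unit standard deviation, so σ in the absolute criterion becomes 1.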

Formula Used

Build delay-embedded vectors of dimension m: X_i^(m) = [x_i, x_{i+τ}, …, x_{i+(m−1)τ}].

For each X_i^(m), find the nearest neighbor X_j^(m) (excluding indices within ±W). The m-dimensional distance is: R_m = ||X_i^(m) − X_j^(m)||.

When moving to m+1, compute the added separation Δ = |x_{i+mτ} − x_{j+mτ}| and R_{m+1} = √(R_m² + Δ²). A neighbor is “false” if Δ/R_m > Rtol and/or R_{m+1}/σ > Atol, depending on which criteria are enabled; σ is the standard deviation of the series.
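The three steps above can be sketched in plain Python. This is a minimal brute-force version under stated assumptions, not the calculator's actual code: the name fnn_percent is hypothetical, indices are 0-based, a neighbor counts as false if either enabled criterion fires, pairs with R_m = 0 are skipped, and σ is the population standard deviation. The demo signal is a sine wave chosen for illustration.

```python
import math
import statistics

def fnn_percent(x, m, tau, theiler=0, rtol=10.0, atol=2.0,
                use_rel=True, use_abs=True):
    """Percent of false nearest neighbors when going from m to m+1.

    Returns (percent, valid_pairs). Brute-force O(N^2) neighbor search.
    """
    sigma = statistics.pstdev(x)
    # Only indices whose extra coordinate x[i + m*tau] exists are usable.
    n = len(x) - m * tau
    false_count = 0
    valid = 0
    for i in range(n):
        best_j, best_d2 = -1, math.inf
        for j in range(n):
            if abs(i - j) <= theiler:  # Theiler window; also skips j == i
                continue
            d2 = sum((x[i + k * tau] - x[j + k * tau]) ** 2 for k in range(m))
            if d2 < best_d2:
                best_j, best_d2 = j, d2
        if best_j < 0 or best_d2 == 0.0:
            continue  # assumption: skip if no neighbor or R_m = 0
        r_m = math.sqrt(best_d2)
        delta = abs(x[i + m * tau] - x[best_j + m * tau])
        r_next = math.sqrt(best_d2 + delta ** 2)
        rel = use_rel and delta / r_m > rtol
        ab = use_abs and sigma > 0 and r_next / sigma > atol
        valid += 1
        if rel or ab:  # assumption: either criterion marks a false neighbor
            false_count += 1
    percent = 100.0 * false_count / valid if valid else float("nan")
    return percent, valid

# Illustrative signal: a sine whose period (about 17 samples) is
# incommensurate with the sampling, scanned with tau near a quarter period.
x = [math.sin(0.37 * n) for n in range(500)]
for m in range(1, 4):
    pct, pairs = fnn_percent(x, m, tau=4, theiler=20)
    print(m, round(pct, 2), pairs)
```

For this sine, m = 1 folds the rising and falling halves of the cycle onto each other, so many neighbors are false, while m = 2 unfolds the cycle into a closed loop. For long series, a k-d tree or similar spatial index would replace the double loop.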

How to Use This Calculator

Paste a single-column time-series in the data box.
Choose a delay τ. Start with 1–5 samples.
Set a maximum dimension to scan, such as 8–15.
Optionally set a Theiler window to avoid temporal neighbors.
Pick one or both criteria and press Calculate.
Look for the smallest m where FNN% stays low.
Use CSV or PDF export for reporting and plotting.

Example Data Table

Example series: Logistic map (r=4), 300 points after a short discard. Your results may differ if you change τ, thresholds, or window size.

Index	x
1	0.950063
2	0.189772
3	0.615035
4	0.947067
5	0.200523
6	0.641254
7	0.920189
8	0.293764
9	0.829868
10	0.564750
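A series of this kind can be generated with a few lines of Python. The sketch below also checks that each table row follows from the previous one under x -> 4x(1 − x), up to the 6-decimal rounding of the table. The seed x0 = 0.3 and the discard length of 100 are hypothetical; the page does not state which values produced the table.

```python
def logistic_series(x0, n, r=4.0, discard=100):
    """Iterate x -> r*x*(1-x), drop `discard` transients, return n values."""
    x = x0
    for _ in range(discard):
        x = r * x * (1.0 - x)
    out = []
    for _ in range(n):
        out.append(x)
        x = r * x * (1.0 - x)
    return out

# First ten values from the example table.
table = [0.950063, 0.189772, 0.615035, 0.947067, 0.200523,
         0.641254, 0.920189, 0.293764, 0.829868, 0.564750]

# Largest one-step mismatch between consecutive rows and the map with r = 4.
max_err = max(abs(4.0 * a * (1.0 - a) - b) for a, b in zip(table, table[1:]))
print(max_err < 1e-5)  # True

# Hypothetical seed and discard length; any seed in (0, 1) behaves similarly.
series = logistic_series(x0=0.3, n=300)
```

Because the map is chaotic, a different seed gives a different sequence with the same statistics, so FNN percentages will vary slightly from run to run.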

Sample output (typical settings):

Using τ=2, W=10, Rtol=10, Atol=2 with normalization.

m	FNN (%)
1	26.51
2	12.84
3	12.24
4	11.30
5	14.83
6	19.79

False Nearest Neighbors Guide

1) What False Nearest Neighbors Measures

False nearest neighbors (FNN) estimates the smallest embedding dimension that unfolds a hidden state space from a single measured signal. In too-small dimensions, trajectories overlap and points appear close only because the projection is cramped. FNN counts how often a “nearest” point separates sharply after adding one more coordinate.

2) Delay Embedding for Experimental Signals

The calculator builds delay vectors using a time delay τ and dimension m. Each vector stacks samples from the same record, producing a geometric representation of the dynamics. This approach is widely used for nonlinear oscillators, plasma fluctuations, structural vibrations, EEG, and turbulence proxies, where only one sensor channel is available.
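As a small illustration of the stacking, the sketch below builds delay vectors from a list. The function name delay_embed is hypothetical and indices are 0-based; a series of length N yields N − (m − 1)τ vectors.

```python
def delay_embed(x, m, tau):
    """Return delay vectors [x[i], x[i+tau], ..., x[i+(m-1)*tau]]."""
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    n_vectors = len(x) - (m - 1) * tau
    return [[x[i + k * tau] for k in range(m)]
            for i in range(max(n_vectors, 0))]

vectors = delay_embed(list(range(10)), m=3, tau=2)
print(vectors[0], vectors[-1], len(vectors))  # [0, 2, 4] [5, 7, 9] 6
```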

3) Choosing the Time Delay τ

A practical τ avoids near-duplicate coordinates while keeping dynamical coupling. Common selections are the first minimum of the mutual information or the lag at which the autocorrelation first drops to about 1/e. For many sampled lab signals, starting values of τ = 1 to 10 samples provide a useful scan before finer tuning.
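The 1/e autocorrelation rule can be sketched as below. The function name first_efold_lag and the test signal are illustrative assumptions; the mutual-information rule is not shown.

```python
import math

def first_efold_lag(x, max_lag):
    """Smallest lag k >= 1 where the autocorrelation is <= 1/e, else None."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    if var == 0:
        return None  # a constant series has no defined autocorrelation
    for k in range(1, min(max_lag, n - 1) + 1):
        # Biased estimator: lagged products divided by the lag-0 sum.
        acf = sum(dev[i] * dev[i + k] for i in range(n - k)) / var
        if acf <= 1.0 / math.e:
            return k
    return None

# Hypothetical test signal: a sine with a 40-sample period.
signal = [math.sin(2.0 * math.pi * i / 40.0) for i in range(400)]
print(first_efold_lag(signal, max_lag=50))  # 8
```

For a pure sine the autocorrelation is close to a cosine, so the 1/e crossing falls a little before a quarter period (10 samples here).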

4) Selecting the Theiler Window W

Time-series points close in index are often trivially close in value, which can bias neighbor searches. The Theiler window excludes neighbors within ±W indices. A rule of thumb is W near one dominant period in samples, or a few correlation times, especially for oscillatory measurements.
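A toy example shows the effect of the window. It uses a scalar (m = 1) neighbor search on made-up integer values; the function name nearest_neighbor is hypothetical.

```python
import math

def nearest_neighbor(x, i, theiler):
    """Index j minimizing |x[i] - x[j]| subject to |i - j| > theiler."""
    best_j, best_d = None, math.inf
    for j in range(len(x)):
        if abs(i - j) <= theiler:  # inside the window (includes j == i)
            continue
        d = abs(x[i] - x[j])
        if d < best_d:
            best_j, best_d = j, d
    return best_j

# Made-up values: a slow ramp, an excursion, then a return near the ramp.
x = [0, 10, 21, 30, 40, 500, 900, 50, 36]

print(nearest_neighbor(x, 2, theiler=0))  # 3, the adjacent sample
print(nearest_neighbor(x, 2, theiler=2))  # 8, a later return to similar values
```

With W = 0 the neighbor is simply the next sample on the same ramp. With W = 2 the search is forced to find a genuine recurrence.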

5) Understanding Rtol and Atol

Two common “false” tests are offered. The relative test checks Δ/R_m, so it is sensitive to sudden separation after increasing the dimension. The absolute test checks R_{m+1}/σ, which scales distances by the series standard deviation. Typical starting values are Rtol = 10 and Atol = 2; adjust them if noise or amplitude scaling changes the distances.
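A few worked numbers make the two tests concrete. The helper below is a sketch with a hypothetical name; it assumes that a neighbor is flagged when either enabled test fires and that σ = 1 (a normalized series).

```python
import math

def is_false_neighbor(r_m, delta, sigma, rtol=10.0, atol=2.0,
                      use_rel=True, use_abs=True):
    """Apply the relative and/or absolute test to one neighbor pair."""
    r_next = math.sqrt(r_m ** 2 + delta ** 2)  # R_{m+1}
    rel = use_rel and r_m > 0 and delta / r_m > rtol
    ab = use_abs and r_next / sigma > atol
    return rel or ab

# Relative test fires: 0.8 / 0.05 = 16 > 10.
print(is_false_neighbor(0.05, 0.8, sigma=1.0))  # True
# Neither fires: ratio is 4 and R_{m+1} is about 0.21.
print(is_false_neighbor(0.05, 0.2, sigma=1.0))  # False
# Absolute test fires: ratio is 1, but R_{m+1} is about 2.12 > 2.
print(is_false_neighbor(1.5, 1.5, sigma=1.0))   # True
```

The third case shows why the absolute test is useful: the pair was never close to begin with, so the relative ratio stays small even though the points are far apart on the scale of the data.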

6) Data Length and Noise Considerations

Reliable FNN curves need enough points to populate the reconstructed space. As a baseline, aim for at least 1,000 samples, and more if you scan beyond m = 10. Measurement noise inflates distances and can prevent FNN% from reaching zero; normalization helps, and moderate smoothing or band-limited preprocessing can stabilize neighbor selection.

7) Reading the Output Table

For each m, the table reports the percent of false neighbors among valid neighbor pairs. The recommended embedding dimension is often the smallest m where FNN% drops near zero and remains low as m increases. Use the “Valid pairs” column to confirm sufficient statistics.
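The "drops near zero and remains low" rule can be written as a small helper. The function name and the 1% cutoff are hypothetical choices, not values taken from the calculator. The first input is the sample output table shown earlier on this page; the second is a made-up curve for contrast.

```python
def pick_dimension(fnn_by_m, threshold=1.0):
    """Smallest m whose FNN% and all later values are <= threshold, else None.

    fnn_by_m maps m to FNN percent; threshold is in percent (hypothetical).
    """
    chosen = None
    for m in sorted(fnn_by_m, reverse=True):  # walk down from the largest m
        if fnn_by_m[m] > threshold:
            break
        chosen = m
    return chosen

# Sample output table from this page (tau=2, W=10, Rtol=10, Atol=2).
sample = {1: 26.51, 2: 12.84, 3: 12.24, 4: 11.30, 5: 14.83, 6: 19.79}
print(pick_dimension(sample))  # None

# Made-up curve that settles near zero from m = 3 onward.
illustrative = {1: 41.0, 2: 7.5, 3: 0.6, 4: 0.2, 5: 0.4}
print(pick_dimension(illustrative))  # 3
```

A result of None means no scanned dimension keeps FNN% under the cutoff, which suggests revisiting τ, the thresholds, or the data length before trusting any m.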

8) Typical Workflows and Reporting

Start with normalization enabled, scan m = 1 to 12, and try a small set of delays (for example τ = 2, 4, 6). Compare the FNN curves and select the most stable elbow. Export CSV for plotting FNN% versus m, and export PDF for lab notes or appendix-ready documentation.

FAQs

1) What result indicates a good embedding dimension?

Choose the smallest dimension where false neighbors drop close to zero and stay low for higher dimensions. Stability across nearby τ values is a strong sign.

2) Why do I see a nonzero FNN% even at high dimensions?

Noise, short datasets, or strong nonstationarity can keep neighbors unstable. Increase the data length, use normalization, consider filtering, and avoid an overly large τ or a Theiler window that is too small.

3) Should I enable both criteria?

Using both is common for robust screening. If your signal is very noisy, the absolute test can be stricter. If distances are tiny, the relative test can dominate.

4) How many data points do I need?

More is better. A practical minimum is about 1,000 points for basic scans. For m above 10 or sparse dynamics, several thousand points improve neighbor statistics.

5) What is the Theiler window doing?

It prevents choosing temporally adjacent points as neighbors. Those neighbors are often close only because of sampling continuity, not because the dynamics are genuinely nearby.

6) What does “Avg neighbor distance” help with?

It gives scale context for neighbor spacing at each dimension. Large jumps can hint at noise sensitivity or poor τ choices, especially when valid-pair counts also drop.

7) Can I use this for multivariate data?

This tool assumes a single scalar series. For multivariate measurements, you can analyze one channel, or build custom embeddings that combine channels, then apply similar neighbor tests.