Measure multivariate similarity even when variables correlate. Tune confidence, regularize covariance, and compare groups quickly. Download clean tables and share consistent statistical decisions today.
Paste your dataset (rows = samples, columns = variables). Then paste one or more observations to score. This tool estimates μ and Σ, computes D = √((x−μ)ᵀ Σ⁻¹ (x−μ)), and flags outliers using a chi-square cutoff.
| Var1 | Var2 | Var3 |
|---|---|---|
| 10 | 20 | 30 |
| 12 | 19 | 29 |
| 9 | 21 | 31 |
| 11 | 18 | 28 |
| 13 | 22 | 33 |
Tip: Keep variables in similar measurement units for interpretability.
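The calculation described above can be sketched in a few lines of NumPy, using the example table as the reference dataset. This is a minimal illustration, not the tool's own code: the function name `mahalanobis` and the choice of sample (n−1) scaling are assumptions made for the example.

```python
import numpy as np

# Reference dataset from the example table (rows = samples, columns = Var1..Var3).
X = np.array([
    [10, 20, 30],
    [12, 19, 29],
    [9, 21, 31],
    [11, 18, 28],
    [13, 22, 33],
], dtype=float)

def mahalanobis(x, mu, cov):
    """Return D = sqrt((x - mu)^T cov^{-1} (x - mu)). Hypothetical helper."""
    diff = np.asarray(x, dtype=float) - mu
    # Solve cov @ z = diff rather than forming the explicit inverse.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))

mu = X.mean(axis=0)                     # mean vector
cov = np.cov(X, rowvar=False, ddof=1)   # sample covariance (n - 1 scaling)

# Distance of every reference row from the mean.
d = np.array([mahalanobis(row, mu, cov) for row in X])
print(np.round(d, 3))
```

A quick sanity check for this setup: when the reference rows are scored against their own sample covariance, the squared distances sum to p(n−1), which is 12 for five rows and three variables.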
Mahalanobis distance converts several measurements into one scale that accounts for correlation and variance. If two variables move together, the metric avoids double-counting their shared information. This is useful in statistics, fraud screening, process control, and multivariate matching, where Euclidean distance can mis-rank points when features are correlated. For example, in a three-variable profile, a point that is two standard deviations high on two strongly correlated features may be typical, while the same deviations on independent features can be unusual. Because the distance uses Σ, it naturally rescales units, but you should still avoid mixing raw counts with ratios unless that reflects your domain. When features are highly skewed, consider transforming them before analysis, and document every transformation used.
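To make the effect of correlation concrete, the sketch below compares two points that are equally far from the mean in Euclidean terms. The covariance matrices and point coordinates are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical covariance: two unit-variance features with correlation 0.9.
cov_corr = np.array([[1.0, 0.9],
                     [0.9, 1.0]])
cov_indep = np.eye(2)   # same variances, no correlation
mu = np.zeros(2)

def mahalanobis(x, mu, cov):
    """Hypothetical helper: D = sqrt((x - mu)^T cov^{-1} (x - mu))."""
    diff = np.asarray(x, dtype=float) - mu
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

along = np.array([2.0, 2.0])     # both features high together
against = np.array([2.0, -2.0])  # features deviate in opposite directions

for name, x in [("along", along), ("against", against)]:
    print(name,
          "euclidean=%.3f" % np.linalg.norm(x - mu),
          "mahalanobis(corr)=%.3f" % mahalanobis(x, mu, cov_corr),
          "mahalanobis(indep)=%.3f" % mahalanobis(x, mu, cov_indep))
```

Both points share the same Euclidean distance, yet under the correlated covariance the point that follows the correlation scores far lower than the one that contradicts it. With independent features the two points are indistinguishable.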
Distances are only as trustworthy as the reference dataset. With p variables, the covariance matrix has p(p+1)/2 unique terms, so more rows stabilize the estimates. As a practical rule, collect at least 10–20×p observations when possible. Standardize data cleaning: remove obvious entry errors, treat missing values consistently, and keep each row comparable in time window and measurement method.
The calculator estimates the mean vector μ and covariance Σ using either sample (n−1) or population (n) scaling. When Σ is nearly singular, inversion becomes unstable and distances explode. Shrinkage mitigates this by blending Σ with a scaled identity matrix: Σ′=(1−λ)Σ+λ·(tr(Σ)/p)I. A small λ (for example 0.01–0.10) often improves numerical stability while preserving structure.
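The shrinkage formula translates directly into code. The sketch below is one possible implementation under stated assumptions: the function name `shrink_covariance` and the near-singular example matrix are hypothetical, and the condition number is used only as a rough indicator of how stable the inversion is.

```python
import numpy as np

def shrink_covariance(cov, lam):
    """Blend cov with a scaled identity: (1 - lam) * cov + lam * (tr(cov) / p) * I."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    p = cov.shape[0]
    target = (np.trace(cov) / p) * np.eye(p)
    return (1.0 - lam) * cov + lam * target

# Hypothetical near-singular covariance: two almost perfectly correlated variables.
cov = np.array([[1.0, 0.999],
                [0.999, 1.0]])
cov_shrunk = shrink_covariance(cov, lam=0.05)

print("condition number before:", np.linalg.cond(cov))
print("condition number after: ", np.linalg.cond(cov_shrunk))
```

Because the target matrix has the same trace as Σ, the blend keeps the total variance unchanged while pulling the smallest eigenvalues away from zero.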
The output includes D and D², where D²=(x−μ)ᵀΣ⁻¹(x−μ). Under approximate multivariate normality, D² follows a chi-square distribution with df=p. The tool converts D² to a p-value and flags outliers when D² exceeds χ²(df=p, confidence=1−α). Use α to tune sensitivity: α=0.05 flags about 5% of points in a well-behaved reference set.
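The chi-square step can be sketched with the Python standard library alone. The helper names `chi2_sf` and `chi2_cutoff` and the squared distance `d2` are hypothetical. The upper-tail probability is built from the exact df=1 and df=2 cases plus a standard recurrence, and the cutoff is found by bisection. In practice, `scipy.stats.chi2.sf` and `scipy.stats.chi2.ppf` provide the same quantities.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, q = 2, math.exp(-half)              # exact for df = 2
    else:
        k, q = 1, math.erfc(math.sqrt(half))   # exact for df = 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return min(q, 1.0)

def chi2_cutoff(alpha, df, tol=1e-10):
    """Find c such that chi2_sf(c, df) = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen the bracket until it contains the cutoff
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

p, alpha = 3, 0.05
cutoff = chi2_cutoff(alpha, p)
d2 = 9.2   # hypothetical squared distance for one observation
print("cutoff:", round(cutoff, 3))
print("p-value:", round(chi2_sf(d2, p), 4), "flagged:", d2 > cutoff)
```

For three variables and α=0.05 the cutoff is about 7.81, so an observation is flagged once its D² exceeds that value.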
In monitoring, compute distances for each new record against a rolling baseline (for example the last 30 days), then chart the rate of flagged observations by hour or region. In quality control, spikes can indicate sensor drift or batch changes. Export CSV for downstream review and export PDF to document settings, thresholds, and dataset definitions for reproducible decisions.
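A rolling baseline can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: the function name `rolling_flags`, the window measured in records instead of days, and the synthetic data with one injected anomaly are all hypothetical.

```python
import numpy as np

def rolling_flags(records, window, cutoff):
    """Flag each record whose D^2 against the preceding `window` records exceeds cutoff."""
    records = np.asarray(records, dtype=float)
    flags = np.zeros(len(records), dtype=bool)
    for i in range(window, len(records)):
        base = records[i - window:i]          # baseline excludes record i itself
        mu = base.mean(axis=0)
        cov = np.cov(base, rowvar=False)
        diff = records[i] - mu
        d2 = diff @ np.linalg.solve(cov, diff)
        flags[i] = d2 > cutoff
    return flags

rng = np.random.default_rng(0)                # synthetic data, for illustration only
data = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=300)
data[250] = [4.0, -4.0]                       # injected anomaly

# 9.21 is approximately the chi-square cutoff for df = 2 at 99% confidence.
flags = rolling_flags(data, window=100, cutoff=9.21)
print("flag rate:", flags[100:].mean(), "anomaly flagged:", bool(flags[250]))
```

The first `window` records are never flagged because they have no complete baseline yet. Grouping `flags` by hour or region then gives the flag rate to chart.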
Provide a reference dataset with numeric columns and one or more observation rows to score. The observation must have the same number of values as the dataset columns.
Covariance estimation needs enough observations to produce a full-rank matrix. With too few rows, Σ cannot be inverted reliably, making distances unstable or undefined.
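The rank problem is easy to reproduce. In the sketch below, the helper `covariance_rank`, the synthetic normal data, and the absolute tolerance passed to `numpy.linalg.matrix_rank` are assumptions chosen for illustration.

```python
import numpy as np

def covariance_rank(n, p, seed=1):
    """Rank of the sample covariance estimated from n synthetic rows of p variables."""
    rng = np.random.default_rng(seed)   # synthetic data, for illustration only
    X = rng.normal(size=(n, p))
    cov = np.cov(X, rowvar=False)
    # An absolute tolerance keeps round-off noise from counting as rank.
    return int(np.linalg.matrix_rank(cov, tol=1e-10))

p = 4
print("unique covariance terms:", p * (p + 1) // 2)
for n in (3, 4, 5, 40):
    rank = covariance_rank(n, p)
    print(f"n={n:2d} rank={rank} invertible={rank == p}")
```

Centering the data removes one degree of freedom, so the sample covariance has rank at most n−1. At least p+1 rows are needed before Σ can be inverted at all, and many more before the estimate is stable.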
Increase λ when you see singular-matrix errors, extreme distances, or highly correlated variables. Shrinkage improves invertibility and reduces sensitivity to sampling noise, especially with small datasets.
The p-value is the upper-tail probability of D² under a chi-square model with df equal to the number of variables. Smaller values suggest the observation is less consistent with the reference distribution.
No. A flag indicates statistical unusualness, not intent or error. Investigate context, segment effects, and measurement issues before acting, and consider separate baselines for different populations.
Diagonal mode ignores correlations and uses only per-variable variances. It is simpler and more stable, but can mis-rank points when variables are strongly correlated.
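The difference between the two modes can be sketched as below. The function name `mahalanobis_sq`, its `diagonal` switch, and the example covariance are hypothetical and are not taken from the tool's code.

```python
import numpy as np

def mahalanobis_sq(diff, cov, diagonal=False):
    """Squared distance; diagonal=True keeps only the per-variable variances."""
    if diagonal:
        return float(np.sum(diff ** 2 / np.diag(cov)))
    return float(diff @ np.linalg.solve(cov, diff))

# Hypothetical covariance: standard deviations 2 and 1.5, correlation 0.9.
cov = np.array([[4.0, 2.7],
                [2.7, 2.25]])
with_trend = np.array([2.0, 1.5])       # deviation that follows the correlation
against_trend = np.array([2.0, -1.5])   # deviation that contradicts it

for name, diff in [("with trend", with_trend), ("against trend", against_trend)]:
    print(name,
          "diagonal D2 =", round(mahalanobis_sq(diff, cov, diagonal=True), 3),
          "full D2 =", round(mahalanobis_sq(diff, cov), 3))
```

Diagonal mode gives both deviations the same score because each is one standard deviation per variable. The full covariance separates them sharply, which is the mis-ranking risk described above.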