Anomaly Detection Score Calculator

Calculator

CSV data (paste)

Use numeric columns for best results.

Or upload CSV (optional)

Upload overrides pasted data on submit.

Delimiter

Header row

Missing values

Treats empty, NA, NaN, null as missing.

Mode

Target column (name or 1-based index)

Required for univariate mode.

Scoring method

Distances excel for correlated features.

Standardize features (multivariate)

Improves stability when units differ.

Score scaling

Normalization affects threshold values.

Threshold mode

Contamination (%)

Flags the top percentage as anomalies.

Manual threshold

Example: Z ≈ 3, Modified Z ≈ 3.5.

Max rows to display

Does not affect exports.

Show only anomalies (on load)

Example data table

temperature	vibration	pressure	comment
68.2	0.21	101.3	Typical operating band
68.0	0.20	101.3	Typical operating band
72.9	0.58	103.9	Likely anomaly (spike)
64.2	0.11	99.6	Likely anomaly (drop)

You can paste the full sample dataset into the calculator above.

Formula used

Z-score magnitude (univariate)

z = (x − μ) / σ

score = |z|

Best when data is roughly symmetric and stable.
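
As a rough illustration of the formula above, here is a minimal Python sketch. The function name `z_scores` is hypothetical. Using the population standard deviation and returning 0 for a constant column are assumptions, since the page does not say how the calculator handles either.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return |z| for each value: |x - mean| / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # Assumption: a constant column has no spread, so every score is 0.
        return [0.0 for _ in values]
    return [abs(x - mu) / sigma for x in values]

# Temperature column from the example data table.
temps = [68.2, 68.0, 72.9, 64.2]
scores = z_scores(temps)
```

With only four rows the scores stay small, which is one reason the mean and standard deviation need a reasonable sample before a cutoff such as 3 is meaningful.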

Modified Z using MAD (robust)

mz = 0.6745 · (x − median) / MAD

score = |mz|

Resists outliers and heavy tails.
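
The same idea with median and MAD can be sketched as follows. `modified_z_scores` is a hypothetical name, the readings are made-up values for illustration, and returning 0 when MAD is zero is an assumption rather than documented calculator behavior.

```python
from statistics import median

def modified_z_scores(values):
    """Return |0.6745 * (x - median) / MAD| for each value."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        # Assumption: with zero MAD the score is undefined; report 0 here.
        return [0.0 for _ in values]
    return [abs(0.6745 * (x - med) / mad) for x in values]

# Illustrative readings with one obvious spike at the end.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0]
scores = modified_z_scores(readings)
flags = [s >= 3.5 for s in scores]  # 3.5 is the cutoff suggested on this page
```

Because the median and MAD barely move when the spike is added, the spike cannot hide itself by inflating the scale estimate.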

IQR-based score (robust)

IQR = Q3 − Q1

score = max((x−Q3)/IQR, (Q1−x)/IQR, 0)

Highlights deviations beyond typical quartiles.
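
A minimal sketch of the IQR score, assuming Python 3.8+ for `statistics.quantiles`. The quartile method ("inclusive") is an assumption, as different tools compute Q1 and Q3 slightly differently. `iqr_scores` is a hypothetical name and the data is invented for illustration.

```python
from statistics import quantiles

def iqr_scores(values):
    """Return max((x - Q3) / IQR, (Q1 - x) / IQR, 0) for each value."""
    # Assumption: "inclusive" quartiles; other methods shift Q1/Q3 slightly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Assumption: a zero-width quartile band yields a score of 0.
        return [0.0 for _ in values]
    return [max((x - q3) / iqr, (q1 - x) / iqr, 0.0) for x in values]

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
scores = iqr_scores(data)
strong = [x for x, s in zip(data, scores) if s > 1.5]
```

Values between Q1 and Q3 score exactly 0, so only points outside the quartile band contribute to alerts.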

Mahalanobis distance (multivariate)

d = √((x − μ)ᵀ Σ⁻¹ (x − μ))

score = d

Captures correlations across multiple features.

Percentile thresholding flags the top fraction of scores, so the cutoff moves with the score distribution instead of staying fixed.

How to use this calculator

Paste your CSV (or upload a file). Use numeric columns.
Choose a mode: multivariate uses several columns; univariate targets one.
Pick a scoring method. Use distances for correlated sensor sets.
Choose thresholding: percentile for flexible alert rates; manual for fixed cutoffs.
Press Calculate Score to view anomalies and statistics.
Use CSV/PDF downloads to share scored results.

Data requirements for reliable scoring

Use at least 30 rows to stabilize the mean, variance, and quantiles. Keep columns numeric and in consistent units. If you score sensors, align sampling windows so each row represents the same interval. For missing values, dropping rows keeps only fully observed data at the cost of fewer rows, while mean imputation keeps every row but pulls the imputed cells toward the column center. When more than 20% of cells are missing, review upstream pipelines before trusting scores.
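
The two missing-value strategies can be compared with a small sketch. All function names here are hypothetical, and treating the missing tokens case-insensitively is an assumption. The page only states that empty, NA, NaN, and null count as missing.

```python
MISSING = {"", "na", "nan", "null"}  # assumption: matched case-insensitively

def parse_cell(text):
    """Return a float, or None when the cell holds a missing token."""
    token = text.strip()
    if token.lower() in MISSING:
        return None
    return float(token)

def missing_fraction(rows):
    """Share of cells that are missing, between 0 and 1."""
    cells = [v for row in rows for v in row]
    return sum(v is None for v in cells) / len(cells)

def drop_missing(rows):
    """Keep only rows with no missing cell."""
    return [row for row in rows if None not in row]

def mean_impute(rows):
    """Replace each missing cell with the mean of its column."""
    means = []
    for c in range(len(rows[0])):
        present = [row[c] for row in rows if row[c] is not None]
        means.append(sum(present) / len(present) if present else None)
    return [[means[c] if v is None else v for c, v in enumerate(row)]
            for row in rows]

raw = [["68.2", "0.21"], ["NA", "0.20"], ["72.9", ""], ["64.2", "0.11"]]
rows = [[parse_cell(cell) for cell in row] for row in raw]
if missing_fraction(rows) > 0.20:
    print("More than 20% of cells are missing; review the upstream pipeline.")
```

Here `drop_missing(rows)` keeps two of the four rows, while `mean_impute(rows)` keeps all four with the gaps filled by column means.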

Robust univariate scoring with MAD and IQR

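Z-score works well for near-Gaussian signals, but it is sensitive to spikes because a single extreme value inflates both the mean and the standard deviation. Modified Z uses median and MAD, reducing influence from extreme points. A common flag point is score ≥ 3.5. IQR scoring compares values to Q1 and Q3: the score is 0 inside the quartile band and grows linearly outside it, matching boxplot intuition. A score above 1.5 corresponds to the standard boxplot fence of 1.5 × IQR beyond the quartiles and often indicates a strong outlier.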

Mahalanobis distance for correlated features

When features move together, distance-based scoring captures joint behavior. Mahalanobis computes √((x−μ)ᵀΣ⁻¹(x−μ)), so a pressure–temperature pair whose values are individually normal can still be anomalous if their combination is rare. Because Σ⁻¹ already rescales each feature, the distance itself does not depend on units; standardizing features before building Σ mainly improves numerical stability when units differ widely. Use at least two informative features, and many more rows than features, to avoid unstable covariance estimates.
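
To make the "individually normal, jointly rare" case concrete, here is a minimal two-feature sketch in plain Python. It uses the closed-form inverse of a 2×2 covariance matrix, so it is not a general implementation. `mahalanobis_2d` is a hypothetical name, the sample covariance with n − 1 is an assumption, and the readings are synthetic.

```python
from math import sqrt

def mahalanobis_2d(points):
    """Mahalanobis distance of each (a, b) row from the sample mean."""
    n = len(points)
    mu_a = sum(a for a, _ in points) / n
    mu_b = sum(b for _, b in points) / n
    # Entries of the sample covariance matrix [[s_aa, s_ab], [s_ab, s_bb]].
    s_aa = sum((a - mu_a) ** 2 for a, _ in points) / (n - 1)
    s_bb = sum((b - mu_b) ** 2 for _, b in points) / (n - 1)
    s_ab = sum((a - mu_a) * (b - mu_b) for a, b in points) / (n - 1)
    det = s_aa * s_bb - s_ab ** 2
    if det <= 1e-12 * s_aa * s_bb:
        raise ValueError("covariance matrix is singular or nearly singular")
    distances = []
    for a, b in points:
        da, db = a - mu_a, b - mu_b
        # Quadratic form (da, db) inverse(cov) (da, db)^T, 2x2 closed form.
        d2 = (s_bb * da * da - 2.0 * s_ab * da * db + s_aa * db * db) / det
        distances.append(sqrt(max(d2, 0.0)))
    return distances

# Synthetic pairs that rise together, plus one row (3.0, 7.0) whose values
# each sit inside the observed range but break the joint pattern.
points = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 4.8),
          (6.0, 6.1), (7.0, 6.9), (8.0, 8.0), (3.0, 7.0)]
distances = mahalanobis_2d(points)
most_unusual = points[distances.index(max(distances))]
```

The last row gets the largest distance even though neither of its values is extreme on its own, which a per-column Z-score would not reveal.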

Thresholding using contamination percentiles

Percentile thresholding treats the top k% scores as anomalies, where k is the contamination rate. Start with 1–5% for production systems, then validate with incidents and review false positives. Manual thresholds are useful for regulated processes, where fixed limits are required. Recompute percentile thresholds periodically when seasonality or drift changes the score distribution.
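
One way to turn a contamination rate into flags is sketched below. `percentile_flags` is a hypothetical name, and rounding the count up is an assumption, since the page does not say how the calculator rounds or breaks ties.

```python
import math

def percentile_flags(scores, contamination_pct):
    """Flag roughly the top contamination_pct percent of scores."""
    n = len(scores)
    # Assumption: round the number of anomalies up, and never exceed n.
    k = min(n, math.ceil(n * contamination_pct / 100))
    if k <= 0:
        return [False] * n, None
    threshold = sorted(scores, reverse=True)[k - 1]
    # Ties at the threshold are all flagged, so the count can exceed k.
    return [s >= threshold for s in scores], threshold

scores = [0.2, 0.5, 3.9, 0.1, 0.7, 0.3, 0.4, 0.6, 0.8, 5.2]
flags, threshold = percentile_flags(scores, contamination_pct=20)
```

Returning the threshold alongside the flags makes it possible to log the cutoff and notice when drift moves it between recomputations.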

Interpreting normalized 0–100 scores

Min–max normalization maps the smallest score to 0 and the largest to 100 for easier dashboards. Use it for visualization, not for cross-dataset comparison, because the scale depends on the current data range. For comparisons across time, store raw scores and the threshold value, then normalize within consistent windows. If the maximum is a single extreme score, it compresses every other value toward 0, so consider clipping before scaling.
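
The effect of a single extreme score, and of clipping it, can be seen in a short sketch. `normalize_0_100` and its `clip_max` parameter are hypothetical names, and the raw scores are illustrative.

```python
def normalize_0_100(scores, clip_max=None):
    """Min-max scale scores to the 0-100 range, optionally clipping first."""
    if clip_max is not None:
        scores = [min(s, clip_max) for s in scores]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: identical scores all map to 0.
        return [0.0 for _ in scores]
    return [100.0 * (s - lo) / (hi - lo) for s in scores]

raw = [0.5, 1.0, 1.5, 2.0, 50.0]
plain = normalize_0_100(raw)                  # one extreme dominates the scale
clipped = normalize_0_100(raw, clip_max=5.0)  # ordinary scores spread out more
```

Scaling never changes the order of the scores, so a percentile threshold flags the same rows before and after. Only fixed numeric cutoffs need to be re-expressed on the new scale.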

Operational use in monitoring workflows

Treat anomalies as candidates, not conclusions. Combine the score with context features such as device state, maintenance logs, and recent deployments. Track alert precision by sampling flagged rows weekly and measuring time-to-triage. If anomalies rise suddenly, check data quality first, then model settings. Export CSV and PDF to support audits and post‑incident analysis.

FAQs

1) Which method should I start with?

For one numeric column, start with Modified Z (MAD) for robustness. For multiple correlated numeric features, start with Mahalanobis distance with standardization, then validate flags against known incidents.

2) Why are most scores very similar?

If variance, MAD, or IQR is near zero, the signal is almost constant. Check units, rounding, and whether a column is categorical text. Add more rows or a more informative feature.

3) How do I choose contamination?

Use a small starting value, typically 1–5%, then compare alerts with labels or expert review. If you see too many false alarms, decrease contamination. If you miss incidents, increase it gradually.

4) Does normalization change anomaly decisions?

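Not with percentile thresholding. When normalization is selected, it is applied before thresholding, and the threshold is then compared against the scaled values. Scaling preserves the ranking of scores, so a percentile threshold flags the same rows. A manual cutoff is different: a value such as 3.5 refers to the raw score, so it must be reconsidered after scaling.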

5) Can I score categorical columns?

This calculator focuses on numeric features. Convert categories into numeric indicators, counts, or embeddings first. Alternatively, score derived metrics like frequency, error rate, or time-since-last-seen per category.

6) Why does Mahalanobis sometimes fail?

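If columns are perfectly correlated or you have too few rows, the covariance matrix becomes singular and cannot be inverted. Remove redundant columns or add more data so the covariance can be inverted reliably. Standardization improves numerical conditioning when units differ, but it cannot repair perfectly correlated columns.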
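
A simple pre-check for redundant columns is sketched below, assuming that a pairwise correlation close to ±1 is a good enough signal. The names `pearson` and `redundant_pairs`, the 0.999 limit, and the column values are all illustrative assumptions. Pairwise checks will not catch a column that is a combination of several others.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation, or None if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def redundant_pairs(columns, limit=0.999):
    """List column pairs whose |correlation| is at or above the limit."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if r is not None and abs(r) >= limit:
                pairs.append((a, b))
    return pairs

columns = {
    "temp_c": [20.0, 21.0, 22.5, 19.0, 23.0],
    "temp_f": [68.0, 69.8, 72.5, 66.2, 73.4],  # same signal, other units
    "vibration": [0.21, 0.20, 0.58, 0.11, 0.30],
}
to_review = redundant_pairs(columns)
```

Dropping one column from each reported pair before scoring is usually enough to make the covariance invertible again, provided there are also enough rows.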
