Enter behavior and baseline data
Example data table
| Scenario | Score | Sensitivity | New device | New geo | Threat intel | Current data transfer (MB) | Current failed sign-ins |
|---|---|---|---|---|---|---|---|
| Baseline-like | 7.0 | Medium | No | No | None | 130 | 1 |
| New device + new geo | 41.9 | Medium | Yes | Yes | None | 140 | 1 |
| Data spike | 31.2 | High | No | No | Low | 420 | 1 |
| Brute-force pattern | 93.6 | High | Yes | No | Medium | 110 | 9 |
| Intel hit on critical account | 99.9 | Critical | Yes | Yes | High | 260 | 3 |
Formula used
Each numeric metric is transformed into a capped z-score and normalized to an anomaly value between 0 and 1.
z = min( |x - μ| / max(σ, ε), z_cap )
a = min( z / z_scale, 1 )
numeric_sum = Σ ( w_i * a_i ) // weights are normalized to sum to 1
context_sum = new_device + new_geo + mfa_failed + time_of_day + intel_match
base = (numeric_sum + context_sum) * sensitivity_multiplier
score = 100 / ( 1 + e^( -k * (base - midpoint) ) )
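The formula above can be sketched in Python. The k = 3.2 and midpoint = 1.0 defaults follow the thresholding section below; the z_cap, z_scale, and epsilon defaults are illustrative assumptions, not values stated by the calculator.

```python
import math

def capped_z(x, mu, sigma, eps=1e-6, z_cap=6.0):
    """z = min(|x - mu| / max(sigma, eps), z_cap)."""
    return min(abs(x - mu) / max(sigma, eps), z_cap)

def anomaly(z, z_scale=3.0):
    """a = min(z / z_scale, 1)."""
    return min(z / z_scale, 1.0)

def anomaly_score(metrics, context_sum, sensitivity=1.0, k=3.2, midpoint=1.0):
    """metrics: list of (current, mu, sigma, weight) tuples.
    Weights are normalized to sum to 1 before combining."""
    total_w = sum(w for _, _, _, w in metrics) or 1.0
    numeric_sum = sum((w / total_w) * anomaly(capped_z(x, mu, sigma))
                      for x, mu, sigma, w in metrics)
    base = (numeric_sum + context_sum) * sensitivity
    return 100.0 / (1.0 + math.exp(-k * (base - midpoint)))
```

A near-baseline observation (current 130 MB against a mean of 120) lands low on the curve, while a capped z-score plus context and a high sensitivity multiplier pushes the score toward 100.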
How to use this calculator
- Collect the baseline mean (μ) and standard deviation (σ) for each metric.
- Enter the current observation from your alert or log set.
- Select context signals such as new device or intel match.
- Set account sensitivity to reflect business impact.
- Calculate, then review drivers and suggested actions.
- Export CSV or PDF to attach to a case record.
Baselining telemetry for reliable drift detection
Build baselines from stable periods and consistent log sources. A practical window is 14–30 days, refreshed weekly, with at least 200 events per metric to reduce noise. Use separate baselines for weekdays versus weekends when usage patterns differ. If a standard deviation is near zero, treat the signal as deterministic and raise epsilon slightly. Store μ and σ per actor type to avoid mixing human, service, and host behavior. Consider per-application baselines for VPN, email, and admin portals.
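A minimal baselining helper, following the guidance above: enforce a minimum sample size and raise a near-zero standard deviation to epsilon. The 200-event floor comes from the text; the epsilon default is an assumption.

```python
import statistics

MIN_EVENTS = 200  # minimum sample size suggested above to reduce noise

def baseline(values, eps=1e-6):
    """Return (mu, sigma) for one metric from a clean, stable period.
    Near-zero sigma is raised to eps so deterministic signals do not divide by zero."""
    if len(values) < MIN_EVENTS:
        raise ValueError(f"need at least {MIN_EVENTS} events, got {len(values)}")
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return mu, max(sigma, eps)
```

Keep one (μ, σ) pair per metric, per actor type, and per baseline window (for example, weekday versus weekend) rather than mixing populations.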
Weight tuning to reflect control objectives
Weights are normalized so they always sum to 1, making tuning predictable across teams. Start with balanced weights, then increase data-transfer and privileged-action weights when your primary risk is exfiltration or policy tampering. For identity-focused programs, raise failed sign-ins and login frequency weights. Keep any single weight below 0.40 to prevent one metric dominating. Recalibrate quarterly using confirmed incidents and false-positive reviews. Document each change in a tuning log for audits.
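Normalization can be as simple as dividing each raw weight by the total, as in this sketch (the metric names are illustrative):

```python
def normalize_weights(weights):
    """Scale raw weights so they sum to 1.
    Keep any single normalized weight below 0.40, per the guidance above."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}
```

Because only the ratios matter, teams can tune in whole numbers (4 : 3 : 2 : 1) and let normalization produce the final fractions.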
Context signals that materially raise investigation priority
Binary context flags add direct risk when behavior shifts align with attacker playbooks. New device and new geography each add 0.15, and a failed challenge adds 0.20, reflecting higher compromise likelihood. Time-of-day and intel matches contribute 0.10–0.40 based on confidence. Use “Critical” sensitivity when the actor can access regulated data or production controls, applying a multiplier up to 1.60 to amplify the same anomaly evidence. Validate flags with device and geo inventories.
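The contributions above can be combined as follows. The fixed values (0.15, 0.15, 0.20) come from the text; the linear mapping of confidence onto the 0.10–0.40 range is an assumption, since the calculator only states the range.

```python
def graded(confidence):
    """Map a confidence in (0, 1] to a 0.10-0.40 contribution; 0 means no signal.
    The linear mapping is an assumption, not stated by the calculator."""
    return 0.0 if confidence <= 0 else 0.10 + 0.30 * min(confidence, 1.0)

def context_sum(new_device=False, new_geo=False, failed_challenge=False,
                time_of_day_confidence=0.0, intel_confidence=0.0):
    """Sum the context contributions described above."""
    return ((0.15 if new_device else 0.0)
            + (0.15 if new_geo else 0.0)
            + (0.20 if failed_challenge else 0.0)
            + graded(time_of_day_confidence)
            + graded(intel_confidence))
```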
Thresholding and score bands for triage workflows
The score uses a logistic curve to compress raw evidence into 0–100. With k around 3.2 and midpoint near 1.0, small drift stays low while clustered signals rise quickly. Suggested bands are Low <30, Medium 30–59, High 60–79, and Critical ≥80. Pair bands with playbooks: Medium triggers validation, High triggers containment checks, and Critical starts incident response. Adjust thresholds to meet alert volume targets.
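The suggested bands map directly to a triage function:

```python
def band(score):
    """Map a 0-100 score to the suggested triage bands:
    Low <30, Medium 30-59, High 60-79, Critical >=80."""
    if score >= 80:
        return "Critical"
    if score >= 60:
        return "High"
    if score >= 30:
        return "Medium"
    return "Low"
```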
Export-ready reporting for case management
Operational teams need repeatable evidence, not just a number. The export includes metric means, deviations, current values, z-scores, normalized anomaly values, and weighted impacts, so reviewers can trace why the score changed. Attach the CSV to tickets for trend analysis, and share the PDF for executive updates. Capture notes like alert IDs and log references to support chain-of-custody and faster peer review.
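A minimal sketch of the CSV export, assuming one row per metric with the fields listed above (the exact column names in the real export may differ):

```python
import csv
import io

def export_rows(rows):
    """Write per-metric evidence rows to CSV text for attaching to a ticket.
    Each row is a dict keyed by the field names below (assumed, not official)."""
    fields = ["metric", "mean", "deviation", "current",
              "z_score", "anomaly", "weighted_impact"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```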
FAQs
What does the anomaly score represent?
It summarizes how far current behavior deviates from baseline, plus context signals, then scales it to 0–100. Higher scores indicate stronger evidence and higher potential impact, not confirmed compromise.
How should I choose baseline mean and deviation?
Compute μ and σ from clean periods for the same actor type and workload. Use 14–30 days where possible, exclude incident windows, and split baselines by weekday or region if patterns differ.
What if I have limited historical data?
Start with short baselines and conservative thresholds, then tighten as telemetry grows. You can also borrow cohort baselines, such as team-level or role-level averages, but label them clearly to avoid overconfidence.
Why are the weights normalized automatically?
Normalization keeps weights comparable and prevents totals from inflating scores. You can tune priorities without recalculating sums, and reviewers can interpret each metric’s impact as a percentage of numeric evidence.
How do sensitivity and threat intel affect results?
Sensitivity multiplies combined evidence to reflect business risk, while intel adds a direct confidence-based boost. Together they raise urgency for high-impact accounts and known-bad indicators, even when numeric drift is moderate.
Can I use the exports in case workflows?
Yes. CSV supports analysis and trend reviews, and PDF supports sharing. Include notes like ticket IDs, log queries, and timestamps so responders can reproduce the evidence and document decisions.