Enter behavior and baseline data
Example data table
| Scenario | Score | Sensitivity | New device | New geo | Threat intel | Current data transfer (MB) | Current failed sign-ins |
|---|---|---|---|---|---|---|---|
| Baseline-like | 7.0 | Medium | No | No | None | 130 | 1 |
| New device + new geo | 41.9 | Medium | Yes | Yes | None | 140 | 1 |
| Data spike | 31.2 | High | No | No | Low | 420 | 1 |
| Brute-force pattern | 93.6 | High | Yes | No | Medium | 110 | 9 |
| Intel hit on critical account | 99.9 | Critical | Yes | Yes | High | 260 | 3 |
Formula used
Each numeric metric is transformed into a capped z-score and normalized to an anomaly value between 0 and 1.
z = min( |x - μ| / max(σ, ε), z_cap )
a = min( z / z_scale, 1 )
numeric_sum = Σ ( w_i * a_i ) // weights are normalized to sum to 1
context_sum = new_device + new_geo + mfa_failed + time_of_day + intel_match
base = (numeric_sum + context_sum) * sensitivity_multiplier
score = 100 / ( 1 + e^( -k * (base - midpoint) ) )
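The formula above can be sketched in Python. The k = 3.2 and midpoint = 1.0 defaults follow the thresholding section below; the z_cap, z_scale, and epsilon defaults are illustrative assumptions, not values stated by the calculator.

```python
import math

def capped_z(x, mu, sigma, eps=1e-6, z_cap=6.0):
    """z = min(|x - mu| / max(sigma, eps), z_cap)."""
    return min(abs(x - mu) / max(sigma, eps), z_cap)

def anomaly(z, z_scale=3.0):
    """a = min(z / z_scale, 1)."""
    return min(z / z_scale, 1.0)

def anomaly_score(metrics, context_sum, sensitivity=1.0, k=3.2, midpoint=1.0):
    """metrics: list of (current, mu, sigma, weight) tuples.
    Weights are normalized to sum to 1 before combining."""
    total_w = sum(w for _, _, _, w in metrics) or 1.0
    numeric_sum = sum((w / total_w) * anomaly(capped_z(x, mu, sigma))
                      for x, mu, sigma, w in metrics)
    base = (numeric_sum + context_sum) * sensitivity
    return 100.0 / (1.0 + math.exp(-k * (base - midpoint)))
```

A near-baseline observation (current 130 MB against a mean of 120) lands low on the curve, while a capped z-score plus context and a high sensitivity multiplier pushes the score toward 100.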
How to use this calculator
- Collect the baseline mean (μ) and standard deviation (σ) for each metric.
- Enter the current observation from your alert or log set.
- Select context signals such as new device or intel match.
- Set account sensitivity to reflect business impact.
- Calculate, then review drivers and suggested actions.
- Export CSV or PDF to attach to a case record.
Baselining telemetry for reliable drift detection
Build baselines from stable periods and consistent log sources. A practical window is 14–30 days, refreshed weekly, with at least 200 events per metric to reduce noise. Use separate baselines for weekdays versus weekends when usage patterns differ. If a standard deviation is near zero, treat the signal as deterministic and raise epsilon slightly. Store μ and σ per actor type to avoid mixing human, service, and host behavior. Consider per-application baselines for VPN, email, and admin portals.
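A minimal baselining helper, following the guidance above: enforce a minimum sample size and raise a near-zero standard deviation to epsilon. The 200-event floor comes from the text; the epsilon default is an assumption.

```python
import statistics

MIN_EVENTS = 200  # minimum sample size suggested above to reduce noise

def baseline(values, eps=1e-6):
    """Return (mu, sigma) for one metric from a clean, stable period.
    Near-zero sigma is raised to eps so deterministic signals do not divide by zero."""
    if len(values) < MIN_EVENTS:
        raise ValueError(f"need at least {MIN_EVENTS} events, got {len(values)}")
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return mu, max(sigma, eps)
```

Keep one (μ, σ) pair per metric, per actor type, and per baseline window (for example, weekday versus weekend) rather than mixing populations.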
Weight tuning to reflect control objectives
Weights are normalized so they always sum to 1, making tuning predictable across teams. Start with balanced weights, then increase data-transfer and privileged-action weights when your primary risk is exfiltration or policy tampering. For identity-focused programs, raise failed sign-ins and login frequency weights. Keep any single weight below 0.40 to prevent one metric dominating. Recalibrate quarterly using confirmed incidents and false-positive reviews. Document each change in a tuning log for audits.
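Normalization can be as simple as dividing each raw weight by the total, as in this sketch (the metric names are illustrative):

```python
def normalize_weights(weights):
    """Scale raw weights so they sum to 1.
    Keep any single normalized weight below 0.40, per the guidance above."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}
```

Because only the ratios matter, teams can tune in whole numbers (4 : 3 : 2 : 1) and let normalization produce the final fractions.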
Context signals that materially raise investigation priority
Binary context flags add direct risk when behavior shifts align with attacker playbooks. New device and new geography each add 0.15, and a failed challenge adds 0.20, reflecting higher compromise likelihood. Time-of-day and intel matches contribute 0.10–0.40 based on confidence. Use “Critical” sensitivity when the actor can access regulated data or production controls, applying a multiplier up to 1.60 to amplify the same anomaly evidence. Validate flags with device and geo inventories.
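The contributions above can be combined as follows. The fixed values (0.15, 0.15, 0.20) come from the text; the linear mapping of confidence onto the 0.10–0.40 range is an assumption, since the calculator only states the range.

```python
def graded(confidence):
    """Map a confidence in (0, 1] to a 0.10-0.40 contribution; 0 means no signal.
    The linear mapping is an assumption, not stated by the calculator."""
    return 0.0 if confidence <= 0 else 0.10 + 0.30 * min(confidence, 1.0)

def context_sum(new_device=False, new_geo=False, failed_challenge=False,
                time_of_day_confidence=0.0, intel_confidence=0.0):
    """Sum the context contributions described above."""
    return ((0.15 if new_device else 0.0)
            + (0.15 if new_geo else 0.0)
            + (0.20 if failed_challenge else 0.0)
            + graded(time_of_day_confidence)
            + graded(intel_confidence))
```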
Thresholding and score bands for triage workflows
The score uses a logistic curve to compress raw evidence into 0–100. With k around 3.2 and midpoint near 1.0, small drift stays low while clustered signals rise quickly. Suggested bands are Low <30, Medium 30–59, High 60–79, and Critical ≥80. Pair bands with playbooks: Medium triggers validation, High triggers containment checks, and Critical starts incident response. Adjust thresholds to meet alert volume targets.
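The suggested bands map directly to a triage function:

```python
def band(score):
    """Map a 0-100 score to the suggested triage bands:
    Low <30, Medium 30-59, High 60-79, Critical >=80."""
    if score >= 80:
        return "Critical"
    if score >= 60:
        return "High"
    if score >= 30:
        return "Medium"
    return "Low"
```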
Export-ready reporting for case management
Operational teams need repeatable evidence, not just a number. The export includes metric means, deviations, current values, z-scores, normalized anomaly values, and weighted impacts, so reviewers can trace why the score changed. Attach the CSV to tickets for trend analysis, and share the PDF for executive updates. Capture notes like alert IDs and log references to support chain-of-custody and faster peer review.
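A minimal sketch of the CSV export, assuming one row per metric with the fields listed above (the exact column names in the real export may differ):

```python
import csv
import io

def export_rows(rows):
    """Write per-metric evidence rows to CSV text for attaching to a ticket.
    Each row is a dict keyed by the field names below (assumed, not official)."""
    fields = ["metric", "mean", "deviation", "current",
              "z_score", "anomaly", "weighted_impact"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```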
FAQs
What does the anomaly score represent?
It summarizes how far current behavior deviates from baseline, plus context signals, then scales it to 0–100. Higher scores indicate stronger evidence and higher potential impact, not confirmed compromise.
How should I choose baseline mean and deviation?
Compute μ and σ from clean periods for the same actor type and workload. Use 14–30 days where possible, exclude incident windows, and split baselines by weekday or region if patterns differ.
What if I have limited historical data?
Start with short baselines and conservative thresholds, then tighten as telemetry grows. You can also borrow cohort baselines, such as team-level or role-level averages, but label them clearly to avoid overconfidence.
Why are the weights normalized automatically?
Normalization keeps weights comparable and prevents totals from inflating scores. You can tune priorities without recalculating sums, and reviewers can interpret each metric’s impact as a percentage of numeric evidence.
How do sensitivity and threat intel affect results?
Sensitivity multiplies combined evidence to reflect business risk, while intel adds a direct confidence-based boost. Together they raise urgency for high-impact accounts and known-bad indicators, even when numeric drift is moderate.
Can I use the exports in case workflows?
Yes. CSV supports analysis and trend reviews, and PDF supports sharing. Include notes like ticket IDs, log queries, and timestamps so responders can reproduce the evidence and document decisions.