Prompt Safety Risk Score Calculator

Calculator Inputs

Assessment Name

Reviewer

Model or Workflow

Deployment Sensitivity

Tool Access

Guardrail Strength

Harmful Intent Clarity

Score from 0 to 10. Weight: 12

Jailbreak Pattern Strength

Score from 0 to 10. Weight: 14

Policy Evasion Signals

Score from 0 to 10. Weight: 10

Sensitive Data Exposure Risk

Score from 0 to 10. Weight: 10

PII Presence Likelihood

Score from 0 to 10. Weight: 8

Tool Abuse Potential

Score from 0 to 10. Weight: 10

Code or Command Execution Risk

Score from 0 to 10. Weight: 8

Privilege Escalation Pressure

Score from 0 to 10. Weight: 7

Obfuscation or Encoding Signals

Score from 0 to 10. Weight: 6

Roleplay or Deception Framing

Score from 0 to 10. Weight: 5

Cross-Language Evasion Risk

Score from 0 to 10. Weight: 4

Real-World Impact Severity

Score from 0 to 10. Weight: 6

Prompt Excerpt

Review Notes

Formula Used

Base Weighted Score
Base Score = Σ[(Factor Score ÷ 10) × Factor Weight]

Adjusted Prompt Safety Risk Score
Final Score = min[100, Base Score × Sensitivity Multiplier × Tool Access Multiplier × Guardrail Multiplier]

Each factor is rated from 0 to 10. The weights total 100 points, so the base score is already normalized to a 100-point scale.

Multipliers then adjust risk for production sensitivity, external action capability, and control maturity. Strong guardrails reduce exposure, while broad tool access and critical deployments increase it.
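The two formulas can be sketched in a few lines of Python. This is only an illustration, not the calculator's actual implementation: the dictionary keys and function names are made up for the example, and the multiplier values passed at the end are borrowed from the third row of the example table rather than from any documented option list.

```python
# Factor weights as listed in the calculator inputs; they sum to 100.
# The snake_case keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "harmful_intent_clarity": 12,
    "jailbreak_pattern_strength": 14,
    "policy_evasion_signals": 10,
    "sensitive_data_exposure_risk": 10,
    "pii_presence_likelihood": 8,
    "tool_abuse_potential": 10,
    "code_or_command_execution_risk": 8,
    "privilege_escalation_pressure": 7,
    "obfuscation_or_encoding_signals": 6,
    "roleplay_or_deception_framing": 5,
    "cross_language_evasion_risk": 4,
    "real_world_impact_severity": 6,
}


def base_score(scores):
    """Sum of (factor score / 10) * factor weight over all twelve factors."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        score = scores[name]
        if not 0 <= score <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {score}")
        total += (score / 10) * weight
    return total


def final_score(base, sensitivity, tool_access, guardrail):
    """Apply the three multipliers and cap the result at 100."""
    return min(100.0, base * sensitivity * tool_access * guardrail)


# Rating every factor 5 yields half of the 100 weight points.
example_scores = {name: 5 for name in WEIGHTS}
base = base_score(example_scores)
print(round(base, 2))                                  # 50.0
print(round(final_score(base, 1.20, 1.20, 1.08), 2))   # 77.76
```

Because the weights sum to 100, rating every factor 10 gives a base score of exactly 100 before any multiplier is applied.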

How to Use This Calculator

1. Enter an assessment name, reviewer, and the target model or workflow.
2. Choose the deployment sensitivity, tool access level, and guardrail strength.
3. Rate each safety factor from 0 to 10 based on the prompt being reviewed.
4. Add the prompt excerpt and any notes that explain assumptions or concerns.
5. Press Calculate Risk Score to show results below the header.
6. Review the score, risk band, priority drivers, and recommended deployment decision.
7. Use the CSV or PDF buttons to export the full assessment summary.

Example Data Table

Scenario	Base Score	Multipliers (Sensitivity × Tool Access × Guardrail)	Final Score	Risk Band	Decision
FAQ assistant with no tools	18.40	0.90 × 1.00 × 0.92	15.24	Minimal	Proceed with normal monitoring
Internal analyst with retrieval	41.60	1.00 × 1.05 × 1.00	43.68	Elevated	Revise before wider deployment
Agent with code execution and broad actions	69.10	1.20 × 1.20 × 1.08	100.00	Critical	Do not deploy in current form
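The final scores in the table can be checked by hand or with a short script. The sketch below recomputes each row from its base score and multipliers; the helper name `adjusted` is hypothetical, and the multipliers are assumed to appear in the same order as in the formula (sensitivity, tool access, guardrail).

```python
# Rows copied from the example table: scenario, base score, multipliers.
rows = [
    ("FAQ assistant with no tools", 18.40, (0.90, 1.00, 0.92)),
    ("Internal analyst with retrieval", 41.60, (1.00, 1.05, 1.00)),
    ("Agent with code execution and broad actions", 69.10, (1.20, 1.20, 1.08)),
]


def adjusted(base, multipliers):
    """Return (uncapped, capped) scores for one table row."""
    product = 1.0
    for multiplier in multipliers:
        product *= multiplier
    uncapped = base * product
    return uncapped, min(100.0, uncapped)


for scenario, base, multipliers in rows:
    uncapped, capped = adjusted(base, multipliers)
    print(f"{scenario}: uncapped {uncapped:.2f}, final {capped:.2f}")
```

The third row shows the cap at work: the uncapped product is about 107.46, which the formula limits to 100.00.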

Frequently Asked Questions

1. What does this score measure?

This score estimates how risky a prompt appears before deployment. It combines threat indicators, environment sensitivity, tool reach, and control strength into one normalized value.

2. What scale should I use for each factor?

Use 0 for no visible risk signal and 10 for a very strong signal. Intermediate values work well when evidence is partial or uncertain.

3. Why do the weights differ between factors?

Some indicators create broader downstream harm than others. Jailbreak strength, intent clarity, and tool abuse often matter more than stylistic deception alone.

4. Why can a moderate base score become high?

Multipliers reflect context. A prompt becomes riskier when used in critical workflows, attached to powerful tools, or protected by weak controls.
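One way to see the effect is to invert the formula and ask how large a base score must be to reach a given band under fixed multipliers. The sketch below is illustrative only: the function name is hypothetical, the multipliers are taken from the third row of the example table, and the band floors (60 for High, 80 for Critical) come from the interpretation guide.

```python
def base_needed(band_floor, sensitivity, tool_access, guardrail):
    """Smallest base score whose adjusted score reaches band_floor."""
    return band_floor / (sensitivity * tool_access * guardrail)


# Multipliers from the "agent with code execution" example row.
print(round(base_needed(60, 1.20, 1.20, 1.08), 2))  # 38.58
print(round(base_needed(80, 1.20, 1.20, 1.08), 2))  # 51.44
```

Under those multipliers, a base score just under 39, which would be Guarded on its own, already lands in the High band, and anything above roughly 51.44 is Critical.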

5. Does a low score guarantee safety?

No. It supports review, not certainty. Novel attacks, hidden context, or weak monitoring can still create problems even when the score looks low.

6. When should I block deployment?

Blocking is sensible when the score is critical, when high-risk factors cluster together, or when mitigation steps are still missing or untested.

7. Can this be used for red-team exercises?

Yes. It helps compare attack prompts consistently, identify dominant failure drivers, and document which controls reduced exposure after retesting.
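A simple way to find the dominant drivers for one prompt is to rank factors by their weighted contribution to the base score. The page does not say how the calculator derives its priority drivers, so the ranking rule below is an assumption, and the key names, function name, and sample ratings are hypothetical.

```python
# Factor weights from the calculator inputs (hypothetical snake_case keys).
WEIGHTS = {
    "harmful_intent_clarity": 12,
    "jailbreak_pattern_strength": 14,
    "policy_evasion_signals": 10,
    "sensitive_data_exposure_risk": 10,
    "pii_presence_likelihood": 8,
    "tool_abuse_potential": 10,
    "code_or_command_execution_risk": 8,
    "privilege_escalation_pressure": 7,
    "obfuscation_or_encoding_signals": 6,
    "roleplay_or_deception_framing": 5,
    "cross_language_evasion_risk": 4,
    "real_world_impact_severity": 6,
}


def top_drivers(scores, n=3):
    """Rank factors by (score / 10) * weight; unrated factors count as 0."""
    contributions = {
        name: (scores.get(name, 0) / 10) * weight
        for name, weight in WEIGHTS.items()
    }
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up ratings for a single red-team prompt.
sample = {
    "jailbreak_pattern_strength": 9,
    "tool_abuse_potential": 8,
    "roleplay_or_deception_framing": 10,
    "harmful_intent_clarity": 4,
}
for name, points in top_drivers(sample):
    print(f"{name}: {points:.1f} points")
```

In this sample, roleplay framing is rated 10 but contributes only 5 points, while a jailbreak rating of 9 contributes 12.6, which shows how the weights shape the ranking.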

8. Should I score prompts individually or in batches?

Start with individual scoring for sensitive prompts. Later, compare batches by exporting results and looking for recurring patterns across teams or use cases.

Interpretation Guide

Score Range	Band	Typical Response
0.00 - 19.99	Minimal	Keep standard monitoring and maintain prompt documentation.
20.00 - 39.99	Guarded	Improve wording, constrain outputs, and verify monitoring coverage.
40.00 - 59.99	Elevated	Require human review and revise risky instructions before rollout.
60.00 - 79.99	High	Escalate to security and policy reviewers, then add controls.
80.00 - 100.00	Critical	Block deployment until risks are reduced and retested.
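The band lookup can be expressed as a small function. This sketch assumes each band includes its lower bound and runs up to the next band's floor, so a value such as 19.995 counts as Minimal; the table itself lists ranges only to two decimals. The names `BANDS` and `risk_band` are hypothetical.

```python
# (floor, band) pairs from the interpretation guide, highest floor first.
BANDS = [
    (80.0, "Critical"),
    (60.0, "High"),
    (40.0, "Elevated"),
    (20.0, "Guarded"),
    (0.0, "Minimal"),
]


def risk_band(score):
    """Return the band whose floor is the highest one not exceeding score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for floor, band in BANDS:
        if score >= floor:
            return band


# Final scores from the example table.
for value in (15.24, 43.68, 100.00):
    print(value, risk_band(value))
```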
