Measure jailbreak signals, data leakage exposure, and harmful intent. Rank prompt risk quickly and consistently. Turn red flags into safer AI deployment decisions now.
Each factor is rated from 0 to 10 and scaled by its weight. The weights total 100 points, so a factor rated 10 contributes its full weight and the weighted sum is already on a 100-point scale.
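As a rough illustration of the weighting described above, here is a minimal Python sketch. The factor names and the individual weights are assumptions made for the example; the page only states that ratings run from 0 to 10 and that the weights total 100.

```python
# Hypothetical factor names and weights (assumed for illustration only).
# The only constraints taken from the text: ratings are 0-10, weights sum to 100.
WEIGHTS = {
    "jailbreak_strength": 25,
    "harmful_intent": 25,
    "data_leakage": 20,
    "tool_abuse": 20,
    "stylistic_deception": 10,
}


def base_score(ratings, weights=WEIGHTS):
    """Return the weighted base score on a 0-100 scale."""
    if abs(sum(weights.values()) - 100) > 1e-9:
        raise ValueError("weights must total 100")
    total = 0.0
    for name, weight in weights.items():
        rating = ratings[name]
        if not 0 <= rating <= 10:
            raise ValueError(f"rating for {name} must be between 0 and 10")
        # A rating of 10 contributes the factor's full weight.
        total += weight * rating / 10
    return total


example = {
    "jailbreak_strength": 8,
    "harmful_intent": 6,
    "data_leakage": 4,
    "tool_abuse": 2,
    "stylistic_deception": 0,
}
print(base_score(example))
```

With these assumed weights, a prompt rated 10 on every factor scores 100 and one rated 0 on every factor scores 0, which is what "already normalized" amounts to.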
Multipliers then adjust risk for production sensitivity, external action capability, and control maturity. Strong guardrails reduce exposure, while broad tool access and critical deployments increase it.
| Scenario | Base Score | Multipliers | Final Score | Risk Band | Decision |
|---|---|---|---|---|---|
| FAQ assistant with no tools | 18.40 | 0.90 × 1.00 × 0.92 | 15.24 | Minimal | Proceed with normal monitoring |
| Internal analyst with retrieval | 41.60 | 1.00 × 1.05 × 1.00 | 43.68 | Elevated | Revise before wider deployment |
| Agent with code execution and broad actions | 69.10 | 1.20 × 1.20 × 1.08 | 100.00 (capped) | Critical | Do not deploy in current form |
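The rows above can be checked with a short Python sketch that multiplies the base score by the three multipliers. The cap at 100 is an inference from the table rather than a stated rule: the third row's raw product is about 107.46, yet the table shows 100.00. The function and parameter names are hypothetical.

```python
def final_score(base, sensitivity, tool_reach, control, cap=100.0):
    """Apply the three context multipliers and clamp to the 0-100 scale.

    The clamp is an assumption inferred from the example table.
    """
    raw = base * sensitivity * tool_reach * control
    return round(min(max(raw, 0.0), cap), 2)


faq_assistant = final_score(18.40, 0.90, 1.00, 0.92)
internal_analyst = final_score(41.60, 1.00, 1.05, 1.00)
code_agent = final_score(69.10, 1.20, 1.20, 1.08)
print(faq_assistant, internal_analyst, code_agent)
```

Under that assumption the three results match the Final Score column: 15.24, 43.68, and 100.00.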
This score estimates how risky a prompt appears before deployment. It combines threat indicators, environment sensitivity, tool reach, and control strength into one normalized value.
Use 0 for no visible risk signal and 10 for a very strong signal. Intermediate values work well when evidence is partial or uncertain.
Some indicators create broader downstream harm than others. Jailbreak strength, intent clarity, and tool abuse often matter more than stylistic deception alone.
Multipliers reflect context. A prompt becomes riskier when used in critical workflows, attached to powerful tools, or protected by weak controls.
A low score does not guarantee that a prompt is safe. The score supports review, not certainty. Novel attacks, hidden context, or weak monitoring can still create problems even when the score looks low.
Blocking is sensible when the score is critical, when high-risk factors cluster together, or when mitigation steps are still missing or untested.
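One way to read the three blocking conditions is as a simple "any of these" rule, sketched below in Python. The critical threshold of 80 comes from the band table; the rating that counts as "high-risk" (7) and the number of factors that counts as a cluster (3) are assumptions chosen only for illustration.

```python
def should_block(final_score, ratings, mitigations_tested,
                 critical_threshold=80.0, high_rating=7.0, cluster_size=3):
    """Return True if any of the three blocking conditions holds.

    high_rating and cluster_size are assumed values, not from the source.
    """
    is_critical = final_score >= critical_threshold
    high_factors = [name for name, r in ratings.items() if r >= high_rating]
    clustered = len(high_factors) >= cluster_size
    return is_critical or clustered or not mitigations_tested


print(should_block(30.0, {"jailbreak": 8, "intent": 9, "tools": 7}, True))
```

A team would likely tune both assumed thresholds to its own review policy rather than adopt these defaults.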
The score is also useful when testing attack prompts. It helps compare them consistently, identify dominant failure drivers, and document which controls reduced exposure after retesting.
Start with individual scoring for sensitive prompts. Later, compare batches by exporting results and looking for recurring patterns across teams or use cases.
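For the batch comparison step, a minimal Python sketch might count which factor dominates each exported row, grouped by team. The CSV layout, the `team` column, and the factor column names are all hypothetical; the page does not describe the export format.

```python
import csv
import io
from collections import Counter


def dominant_factors(csv_text, factor_columns):
    """Count (team, highest-rated factor) pairs across exported rows.

    Ties go to the first column listed in factor_columns.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        top = max(factor_columns, key=lambda col: float(row[col]))
        counts[(row["team"], top)] += 1
    return counts


# Assumed export layout, for illustration only.
SAMPLE = """team,jailbreak_strength,data_leakage,tool_abuse
support,7,2,1
support,6,3,0
research,2,8,4
"""

patterns = dominant_factors(
    SAMPLE, ["jailbreak_strength", "data_leakage", "tool_abuse"]
)
print(patterns.most_common())
```

A pair that recurs often, such as one team repeatedly topping out on the same factor, is the kind of pattern the batch review is meant to surface.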
| Score Range | Band | Typical Response |
|---|---|---|
| 0.00 - 19.99 | Minimal | Keep standard monitoring and maintain prompt documentation. |
| 20.00 - 39.99 | Guarded | Improve wording, constrain outputs, and verify monitoring coverage. |
| 40.00 - 59.99 | Elevated | Require human review and revise risky instructions before rollout. |
| 60.00 - 79.99 | High | Escalate to security and policy reviewers, then add controls. |
| 80.00 - 100.00 | Critical | Block deployment until risks are reduced and retested. |
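The band table translates directly into a lookup. The Python sketch below uses the lower bounds from the table; the function name and the decision to reject scores outside 0 to 100 are assumptions.

```python
# Lower bound of each band, highest first, taken from the table above.
BANDS = [
    (80.0, "Critical"),
    (60.0, "High"),
    (40.0, "Elevated"),
    (20.0, "Guarded"),
    (0.0, "Minimal"),
]


def risk_band(score):
    """Map a final score (0-100) to its risk band name."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower, name in BANDS:
        if score >= lower:
            return name


print(risk_band(15.24), risk_band(43.68), risk_band(100.0))
```

Checking from the highest band down means each score matches the first lower bound it meets or exceeds, so boundary values such as 20.00 fall into the upper band, as in the table.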