Formula used
Each criterion is rated from 0 to 5 and scaled by its weight: the final score is the sum over criteria of (rating ÷ 5 × weight), which yields a result from 0 to 100 because the weights sum to 100.
- Ratings: your slider values per criterion.
- Weights: importance multipliers that sum to 100.
- Tier: qualitative label based on the final score.
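The arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming only the stated rules (0–5 ratings, weights summing to 100); the criterion names and weight values are hypothetical, not the calculator's actual configuration:

```python
# Sketch of the completeness score: each 0-5 rating contributes its
# weight's share, so a prompt rated 5 on every criterion scores 100.
# Criterion names and weights below are illustrative only.
def completeness_score(ratings: dict[str, float], weights: dict[str, float]) -> float:
    assert abs(sum(weights.values()) - 100) < 1e-9, "weights must sum to 100"
    return sum(ratings[c] / 5 * weights[c] for c in weights)

ratings = {"objective": 5, "format": 4, "context": 3}
weights = {"objective": 50, "format": 30, "context": 20}
print(round(completeness_score(ratings, weights), 1))  # 86.0
```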
How to use this calculator
- Paste your prompt into the text box.
- Optionally click Auto-fill from prompt to set sliders.
- Review sliders and adjust any criterion you care about.
- Press Calculate Score to see results above the form.
- Download CSV for tracking, or PDF for sharing.
Example data table
| Prompt snippet | What is specified | Expected score range |
|---|---|---|
| “Summarize this article.” | Goal only; missing format, constraints, and audience. | 15–35 |
| “Summarize for executives in five bullets, max 90 words.” | Goal + audience + format + constraint; still lacks evaluation and examples. | 55–75 |
| “Act as an analyst. Use the table below. Output JSON with fields… Include edge cases…” | Role, inputs, structured output, constraints, and exceptions are defined. | 80–95 |
Professional notes
Why prompt completeness matters
Incomplete prompts increase rework because models must guess missing goals, context, or output structure. A completeness score provides a repeatable signal of specification quality, helping teams compare prompts across use cases. Higher completeness typically reduces hallucination risk, improves consistency, and shortens iteration cycles by making expectations explicit.
Core dimensions behind the score
This calculator rates ten criteria from 0–5, then applies weights that sum to 100. Objective clarity and output format carry higher influence because they anchor what to do and how to return it. Context, constraints, and inputs/data ensure the model has the necessary facts and boundaries. Examples, tone, audience, evaluation criteria, and edge cases refine behavior in realistic situations.
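One way to picture the weighting scheme is a simple table. The ten criterion names come from the paragraph above; the weight values are illustrative assumptions that merely follow the stated emphasis (objective clarity and output format weighted highest) and sum to 100:

```python
# Hypothetical weight split across the calculator's ten criteria.
# Values are assumptions, not the tool's real weights; they only
# honor the constraint that all weights sum to 100.
EXAMPLE_WEIGHTS = {
    "objective_clarity": 18,
    "output_format": 15,
    "context": 12,
    "constraints": 11,
    "inputs_data": 11,
    "evaluation_criteria": 7,
    "examples": 8,
    "tone": 6,
    "audience": 6,
    "edge_cases": 6,
}
assert sum(EXAMPLE_WEIGHTS.values()) == 100
```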
Interpreting tiers and tradeoffs
Scores near 85–100 indicate strong specification with minimal ambiguity, while 70–84 suggests a usable prompt missing a few tightening details. Mid‑range scores often show gaps in constraints or evaluation criteria, leading to variable outputs. Very low scores usually lack a clear deliverable or format. Remember that “complete” is not “long”; concise prompts can score well when structured.
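The tier bands described above map naturally to a small lookup. The two upper cutoffs (85 and 70) come from the text; the lower bands and all label strings are illustrative assumptions:

```python
# Map a 0-100 completeness score to a qualitative tier.
# Cutoffs 85 and 70 follow the interpretation above; the 40 cutoff
# and the label wording are assumptions for illustration.
def tier(score: float) -> str:
    if score >= 85:
        return "strong specification"
    if score >= 70:
        return "usable, needs tightening"
    if score >= 40:
        return "notable gaps"
    return "missing deliverable or format"

print(tier(86))  # strong specification
print(tier(72))  # usable, needs tightening
```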
Improvement tactics that raise scores
Start by stating the task, the success definition, and the exact output schema. Add constraints such as length limits, prohibited content, and required citations or sources. Provide the essential inputs, including tables or assumptions, and name any tools the model may use. Include a small example pair when the format is complex. Finally, call out edge cases like missing values or conflicting requirements.
Operationalizing scoring in teams
Use the score as a pre‑review gate before prompts enter production. Track scores over time with the CSV export and attach PDF reports in reviews. Set minimum targets by prompt class, for example 70 for internal drafts and 80 for customer‑facing automation. When scores drop, inspect the “weak areas” list and update the prompt template to prevent regressions. In workshops, score several prompts against the same rubric to calibrate ratings. If auto-detect is enabled, keyword signals can gently boost ratings, but human review should confirm intent. In practice, pair scoring with A/B evaluation metrics such as task success rate, defect counts, and latency.
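A minimum-target gate over the CSV export can be sketched as below. The per-class targets (70 and 80) come from the text; the CSV column names and sample rows are hypothetical, since the export's exact layout is not specified here:

```python
# Flag prompt versions whose score falls below a per-class minimum.
# Targets follow the example in the text; column names are assumptions.
import csv
import io

MIN_TARGET = {"internal_draft": 70, "customer_facing": 80}

rows = csv.DictReader(io.StringIO(
    "prompt_id,class,version,score\n"
    "summarizer,customer_facing,3,78\n"
    "summarizer,customer_facing,4,83\n"
))
flagged = [
    f"{row['prompt_id']} v{row['version']} below target: {row['score']}"
    for row in rows
    if float(row["score"]) < MIN_TARGET[row["class"]]
]
print(flagged)  # ['summarizer v3 below target: 78']
```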
FAQs
1) What does a high completeness score indicate?
It indicates your prompt clearly defines the objective, provides sufficient context and constraints, and specifies the output format. Higher scores usually correlate with more consistent responses and fewer clarification questions from the model.
2) Should every prompt target a perfect score?
No. Some tasks benefit from exploration. Aim for the minimum completeness that produces stable results: clear goal and output, plus constraints that matter. Over-specifying can reduce creativity or add unnecessary maintenance.
3) How should I rate the Examples criterion?
Rate higher when you include at least one representative input and the exact expected output shape. Give extra credit when the example covers formatting details, edge conditions, or common mistakes the model should avoid.
4) My prompt is short. Can it still score well?
Yes. Brevity is fine if you include structure: a clear task statement, required output format, and key constraints. Short prompts often score lower only when they omit context or success criteria entirely.
5) Does auto-detect replace manual review?
No. Auto-detect only looks for keyword signals and provides gentle boosts. You should still verify intent, data requirements, and edge cases, especially for production workflows where small ambiguities can create large failures.
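The "gentle boost" behavior can be pictured with a small sketch. Everything here is an assumption for illustration (the keyword lists, the 0.5 boost size, the criteria shown); only the ideas that signals are keyword-based, boosts are gentle, and ratings stay on the 0–5 scale come from the text:

```python
# Minimal sketch of keyword-based auto-detect: signal words nudge a
# rating upward but stay capped at 5, so human review still decides intent.
# Keyword lists and boost size are assumptions, not the tool's actual rules.
SIGNALS = {
    "output_format": ["json", "bullet", "table", "markdown"],
    "constraints": ["max", "limit", "must not", "exactly"],
}

def boosted_rating(base: float, criterion: str, prompt: str) -> float:
    hits = sum(kw in prompt.lower() for kw in SIGNALS.get(criterion, []))
    return min(5.0, base + 0.5 * hits)  # gentle, capped boost

print(boosted_rating(2.0, "output_format", "Output JSON as a table."))  # 3.0
```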
6) How can I use the CSV and PDF exports effectively?
Use CSV to track scores across versions, owners, and use cases, then spot trends. Use PDF for reviews and approvals, because it packages the score, tier, ratings, and weak-area list into one shareable snapshot.