Calculator Inputs
Use the main cutoff fields for the primary metric. Use the comma-separated curve fields to compare recall across multiple cutoffs.
Example Data Table
This sample shows how recall typically improves as the retriever searches deeper into the ranking.
| Cutoff K | Relevant Found | Total Relevant | Recall % | Interpretation |
|---|---|---|---|---|
| 1 | 18 | 120 | 15.00% | Only the top result is checked. |
| 3 | 39 | 120 | 32.50% | Coverage improves with a slightly deeper ranked list. |
| 5 | 57 | 120 | 47.50% | Top-five retrieval captures nearly half the relevant set. |
| 10 | 83 | 120 | 69.17% | Useful checkpoint for many RAG pipelines. |
| 20 | 102 | 120 | 85.00% | Broader search improves coverage but may add cost. |
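The Recall % column above can be reproduced with a short Python sketch (the counts are taken directly from the sample table):

```python
# Recompute the Recall % column of the sample table.
total_relevant = 120
relevant_found = {1: 18, 3: 39, 5: 57, 10: 83, 20: 102}

recall_pct = {k: round(100 * found / total_relevant, 2)
              for k, found in relevant_found.items()}
print(recall_pct)  # matches the Recall % column row by row
```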
Formula Used
Recall@K
Recall@K = Relevant Retrieved in Top K ÷ Total Relevant Documents
Hit Rate@K
Hit Rate@K = Queries with At Least One Relevant Result ÷ Total Queries
Miss Rate@K
Miss Rate@K = 1 − Hit Rate@K
Precision@K
Precision@K = Relevant Retrieved in Top K ÷ K (Documents Retrieved at the Cutoff)
F1 Score
F1 = 2 × Precision × Recall ÷ (Precision + Recall)
Coverage Gap
Coverage Gap = Unretrieved Relevant Documents ÷ Total Relevant Documents
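The formulas above translate directly into code. A minimal sketch in Python (function names are illustrative, not part of the calculator itself):

```python
def recall_at_k(relevant_retrieved: int, total_relevant: int) -> float:
    # Recall@K = relevant retrieved in top K / total relevant documents
    return relevant_retrieved / total_relevant

def hit_rate_at_k(queries_with_hit: int, total_queries: int) -> float:
    # Hit Rate@K = queries with at least one relevant result / total queries
    return queries_with_hit / total_queries

def miss_rate_at_k(queries_with_hit: int, total_queries: int) -> float:
    # Miss Rate@K = 1 - Hit Rate@K
    return 1 - hit_rate_at_k(queries_with_hit, total_queries)

def precision_at_k(relevant_retrieved: int, k: int) -> float:
    # Precision@K = relevant retrieved in top K / K
    return relevant_retrieved / k

def f1_score(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall; guard the zero case.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def coverage_gap(unretrieved_relevant: int, total_relevant: int) -> float:
    # Coverage Gap = unretrieved relevant documents / total relevant documents
    return unretrieved_relevant / total_relevant
```

For example, with the K=5 row of the sample table, `recall_at_k(57, 120)` gives 0.475, and `coverage_gap(120 - 57, 120)` gives the remaining 0.525 of the relevant set still unretrieved.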
Recall is the main retrieval coverage signal. It is especially useful in AI systems where missing a relevant document harms downstream answer quality.
How to Use This Calculator
- Enter the number of evaluated queries in your offline test set.
- Enter the total number of judged relevant documents for that set.
- Enter the main cutoff K and the number of relevant documents found within that cutoff.
- Add how many queries returned at least one relevant hit to compute hit rate.
- Set a recall target if you want a benchmark comparison.
- Optionally enter multiple cutoffs and relevant-found values to draw a recall curve.
- Press Calculate Recall to show the result above the form.
- Use the CSV or PDF buttons to export summary metrics and cutoff-level performance.
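The optional curve step above can be sketched as follows, assuming the comma-separated fields hold matching cutoffs and relevant-found counts (the field values here are illustrative, taken from the sample table):

```python
# Parse comma-separated curve fields into (cutoff, recall) pairs.
cutoffs_input = "1, 3, 5, 10, 20"
found_input = "18, 39, 57, 83, 102"
total_relevant = 120

cutoffs = [int(x) for x in cutoffs_input.split(",")]
found = [int(x) for x in found_input.split(",")]

curve = [(k, f / total_relevant) for k, f in zip(cutoffs, found)]
for k, r in curve:
    print(f"Recall@{k}: {r:.2%}")
```

Each pair feeds one point on the recall curve, so recall can be read off at every cutoff at once.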
Frequently Asked Questions
1. What does retriever recall measure?
Retriever recall measures how much of the known relevant set your system actually surfaces within a chosen cutoff. It focuses on coverage, not just ranking quality.
2. Why can recall@20 be higher than recall@5?
A larger cutoff lets the retriever inspect more ranked documents. That usually uncovers more relevant items, so recall often rises as K increases.
3. Is recall enough to judge retriever quality?
No. Recall is essential, but it should be paired with precision, hit rate, latency, and downstream answer quality. High recall alone can still produce noisy retrieval.
4. What is a good recall target for RAG?
A good target depends on domain risk, ranking quality, and context budget. Many teams start near 70% to 90%, then tune around answer quality and cost.
5. What is the difference between hit rate and recall?
Hit rate checks whether at least one relevant result appears for a query. Recall measures how many of all relevant documents were surfaced overall.
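A small hypothetical example makes the distinction concrete: two queries, ten judged relevant documents in total, and each query surfacing exactly one relevant result within the cutoff.

```python
# Hypothetical test set: 2 queries, 10 judged relevant docs overall.
# Each query surfaces exactly 1 relevant doc within the cutoff.
relevant_per_query = [1, 1]
total_relevant = 10
total_queries = 2

hit_rate = sum(1 for n in relevant_per_query if n >= 1) / total_queries
recall = sum(relevant_per_query) / total_relevant
print(hit_rate)  # 1.0 -- every query got at least one hit
print(recall)    # 0.2 -- but only 20% of the relevant set was surfaced
```

A perfect hit rate can therefore coexist with low recall, which is why the calculator reports both.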
6. Why does this calculator include precision and F1?
They add balance. Precision shows how much retrieved content is actually relevant, while F1 helps summarize precision and recall in one number.
7. Can I compare several cutoffs at once?
Yes. Enter comma-separated cutoffs and matching relevant-found counts. The chart and table will show how recall changes as the ranked depth increases.
8. When is low recall especially dangerous?
Low recall is dangerous when missing a relevant document breaks the downstream task, such as legal search, medical retrieval, policy lookup, or enterprise RAG.
Why This Metric Matters in AI & Machine Learning
Retriever recall is a core offline evaluation signal for semantic search, vector databases, hybrid search, and retrieval-augmented generation. If your retriever misses relevant documents, the reranker and language model cannot recover that missing evidence later. Improving recall often lifts answer grounding, reduces hallucinations, and makes evaluation more stable.