Retriever Recall Calculator

Evaluate retrieval quality with recall and hit rates, test multiple cutoffs using ranked relevant document counts, visualize performance trends, and export clear summaries.

This calculator helps evaluate retrieval coverage for search, RAG, semantic indexing, and offline relevance testing. Enter aggregate counts, compare multiple cutoffs, inspect missed relevant documents, and export a clean performance summary.

Calculator Inputs

Use the main cutoff fields for the primary metric. Use the comma-separated curve fields to compare recall across multiple cutoffs.

  - Number of evaluated queries in the test set.
  - Known relevant documents across the judged evaluation set.
  - Retrieved documents per query; usually equal to K for one document list per query.
  - Relevant items surfaced within the main cutoff.
  - Queries with at least one relevant hit, used to compute hit rate and miss rate.
  - Main cutoff used for the headline recall result.
  - Benchmark used to assess target attainment.
  - Comma-separated cutoff list such as 1,3,5,10,20.
  - Comma-separated relevant-found counts matching the cutoff list order.

Example Data Table

This sample shows how recall typically improves as the retriever searches deeper into the ranking.

| Cutoff K | Relevant Found | Total Relevant | Recall % | Interpretation |
|---|---|---|---|---|
| 1 | 18 | 120 | 15.00% | Only the top result is checked. |
| 3 | 39 | 120 | 32.50% | Coverage improves with a slightly deeper ranked list. |
| 5 | 57 | 120 | 47.50% | Top-five retrieval captures nearly half the relevant set. |
| 10 | 83 | 120 | 69.17% | Useful checkpoint for many RAG pipelines. |
| 20 | 102 | 120 | 85.00% | Broader search improves coverage but may add cost. |
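The Recall % column above can be reproduced with a few lines of code. This is a minimal sketch using the sample table's numbers, not output from the calculator itself:

```python
# Recompute the sample table's Recall % column from its raw counts.
total_relevant = 120
found_at_k = {1: 18, 3: 39, 5: 57, 10: 83, 20: 102}

for k, found in found_at_k.items():
    recall = found / total_relevant
    print(f"Recall@{k}: {recall:.2%}")
# Recall@10 prints 69.17%, matching the table.
```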

Formula Used

Recall@K

Recall@K = Relevant Retrieved in Top K ÷ Total Relevant Documents

Hit Rate@K

Hit Rate@K = Queries with At Least One Relevant Result ÷ Total Queries

Miss Rate@K

Miss Rate@K = 1 − Hit Rate@K

Precision@K

Precision@K = Relevant Retrieved in Top K ÷ Retrieved Documents at K

F1 Score

F1 = 2 × Precision × Recall ÷ (Precision + Recall)

Coverage Gap

Coverage Gap = Unretrieved Relevant Documents ÷ Total Relevant Documents
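All six formulas above can be expressed directly as arithmetic. The sketch below uses illustrative numbers (83 relevant found at K, 120 judged relevant, 200 documents retrieved at K, 17 of 20 queries with a hit); none of these values are tied to a real run:

```python
# Illustrative inputs for the formulas above (assumed values, not real data).
relevant_in_top_k = 83    # relevant documents retrieved within the cutoff K
total_relevant = 120      # judged relevant documents in the test set
retrieved_at_k = 200      # total documents returned at K across all queries
queries_with_hit = 17     # queries with at least one relevant result
total_queries = 20

recall = relevant_in_top_k / total_relevant                       # Recall@K
precision = relevant_in_top_k / retrieved_at_k                    # Precision@K
hit_rate = queries_with_hit / total_queries                       # Hit Rate@K
miss_rate = 1 - hit_rate                                          # Miss Rate@K
f1 = 2 * precision * recall / (precision + recall)                # F1 Score
coverage_gap = (total_relevant - relevant_in_top_k) / total_relevant

print(f"Recall@K: {recall:.2%}, Precision@K: {precision:.2%}, F1: {f1:.2%}")
print(f"Hit rate: {hit_rate:.2%}, Miss rate: {miss_rate:.2%}, Gap: {coverage_gap:.2%}")
```

Note that the coverage gap is simply 1 − recall, so the two always sum to 100%.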

Recall is the main retrieval coverage signal. It is especially useful in AI systems where missing a relevant document harms downstream answer quality.

How to Use This Calculator

  1. Enter the number of evaluated queries in your offline test set.
  2. Enter the total number of judged relevant documents for that set.
  3. Enter the main cutoff K and the number of relevant documents found within that cutoff.
  4. Add how many queries returned at least one relevant hit to compute hit rate.
  5. Set a recall target if you want a benchmark comparison.
  6. Optionally enter multiple cutoffs and relevant-found values to draw a recall curve.
  7. Press Calculate Recall to show the result above the form.
  8. Use the CSV or PDF buttons to export summary metrics and cutoff-level performance.
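The CSV export in the last step can be approximated offline. The sketch below is an assumption about a reasonable summary layout; the column names are illustrative, not the calculator's actual export format:

```python
# Hypothetical cutoff-level CSV summary (column names are assumptions).
import csv
import io

rows = [
    {"cutoff_k": 5, "relevant_found": 57, "total_relevant": 120},
    {"cutoff_k": 10, "relevant_found": 83, "total_relevant": 120},
]
for r in rows:
    r["recall_pct"] = round(100 * r["relevant_found"] / r["total_relevant"], 2)

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["cutoff_k", "relevant_found", "total_relevant", "recall_pct"]
)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```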

Frequently Asked Questions

1. What does retriever recall measure?

Retriever recall measures how much of the known relevant set your system actually surfaces within a chosen cutoff. It focuses on coverage, not just ranking quality.

2. Why can recall@20 be higher than recall@5?

A larger cutoff lets the retriever inspect more ranked documents. That usually uncovers more relevant items, so recall often rises as K increases.

3. Is recall enough to judge retriever quality?

No. Recall is essential, but it should be paired with precision, hit rate, latency, and downstream answer quality. High recall alone can still produce noisy retrieval.

4. What is a good recall target for RAG?

A good target depends on domain risk, ranking quality, and context budget. Many teams start near 70% to 90%, then tune around answer quality and cost.

5. What is the difference between hit rate and recall?

Hit rate checks whether at least one relevant result appears for a query. Recall measures how many of all relevant documents were surfaced overall.
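The distinction is easiest to see with per-query counts. In this made-up three-query example, two queries get at least one hit (high hit rate) while only two of twelve relevant documents surface overall (low recall):

```python
# Hypothetical per-query judgments: (relevant docs found, total relevant docs).
queries = [(1, 5), (1, 4), (0, 3)]

hit_rate = sum(1 for found, _ in queries if found > 0) / len(queries)
recall = sum(found for found, _ in queries) / sum(total for _, total in queries)

print(f"Hit rate: {hit_rate:.2%}")  # 2 of 3 queries returned a relevant result
print(f"Recall:   {recall:.2%}")    # only 2 of 12 relevant docs were surfaced
```

A retriever can therefore look healthy on hit rate while still missing most of the evidence each query needs.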

6. Why does this calculator include precision and F1?

They add balance. Precision shows how much retrieved content is actually relevant, while F1 helps summarize precision and recall in one number.

7. Can I compare several cutoffs at once?

Yes. Enter comma-separated cutoffs and matching relevant-found counts. The chart and table will show how recall changes as the ranked depth increases.

8. When is low recall especially dangerous?

Low recall is dangerous when missing a relevant document breaks the downstream task, such as legal search, medical retrieval, policy lookup, or enterprise RAG.

Why This Metric Matters in AI & Machine Learning

Retriever recall is a core offline evaluation signal for semantic search, vector databases, hybrid search, and retrieval-augmented generation. If your retriever misses relevant documents, the reranker and language model cannot recover that missing evidence later. Improving recall often lifts answer grounding, reduces hallucinations, and makes evaluation more stable.

Related Calculators

  - Context Recall
  - Mean Average Precision
  - Mean Reciprocal Rank
  - Retrieval Latency
  - Zero Results Rate

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.