Input Your Retrieval Outcomes
Example Data Table
Use this sample format for a small RAG evaluation batch.
| Query | R (relevant total) | k (top-k cutoff) | r (relevant retrieved) | Comment |
|---|---|---|---|---|
| Billing address change | 6 | 10 | 5 | One relevant chunk missed. |
| Refund timeframe | 4 | 8 | 3 | An irrelevant policy chunk outranked a relevant one. |
| Rate limit headers | 3 | 5 | 3 | Full coverage within top-k. |
Formula Used
- Per-query Recall: Recall = r / R, where R is the total number of ground-truth relevant contexts for the query and r is the number of relevant contexts among the top-k retrieved.
- Micro Recall (dataset-level): Σr / ΣR. Weights queries with larger R more heavily, so intents with more relevant contexts dominate the score.
- Macro Recall (balanced): average of per-query recall values where R > 0.
- Optional helpers: Precision = r / k and F1 = 2 · Precision · Recall / (Precision + Recall), to show the tradeoff when raising k.
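The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's implementation: the row tuples mirror the example table, and names like `micro_recall` and `macro_recall` are assumed for clarity.

```python
# Each row: (query, R = total relevant, k = cutoff, r = relevant retrieved).
# Values copied from the example table above.
rows = [
    ("Billing address change", 6, 10, 5),
    ("Refund timeframe",       4,  8, 3),
    ("Rate limit headers",     3,  5, 3),
]

def micro_recall(rows):
    """Dataset-level recall: sum of r over sum of R."""
    return sum(r for _, R, k, r in rows) / sum(R for _, R, k, r in rows)

def macro_recall(rows):
    """Unweighted mean of per-query recall, skipping rows where R == 0."""
    vals = [r / R for _, R, k, r in rows if R > 0]
    return sum(vals) / len(vals)

print(round(micro_recall(rows), 3))  # 11/13 ≈ 0.846
print(round(macro_recall(rows), 3))  # (5/6 + 3/4 + 1) / 3 ≈ 0.861
```

Note how micro recall (0.846) sits below macro recall (0.861) here: the query with the largest R also has the lowest per-query recall, and micro weighting lets it pull the average down.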
How to Use This Calculator
- Decide your retrieval cutoff k per query (top-k contexts).
- For each query, count R ground-truth relevant contexts.
- Run retrieval, label how many returned contexts are relevant (r).
- Click Calculate Context Recall to see micro and macro recall.
- Export CSV for tracking, or PDF for sharing with stakeholders.
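If you prefer to script the export step rather than use the button, the CSV layout can be reproduced with the standard library. The column names and the `export_csv` helper below are assumptions for illustration, not the calculator's actual output schema.

```python
import csv
import io

# Hypothetical rows in the same (query, R, k, r) shape as the example table.
rows = [
    ("Billing address change", 6, 10, 5),
    ("Refund timeframe",       4,  8, 3),
    ("Rate limit headers",     3,  5, 3),
]

def export_csv(rows):
    """Return a CSV string with per-query recall and precision columns."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["query", "R", "k", "r", "recall", "precision"])
    for query, R, k, r in rows:
        recall = r / R if R > 0 else ""      # recall is undefined when R == 0
        precision = r / k if k > 0 else ""
        writer.writerow([query, R, k, r, recall, precision])
    return buf.getvalue()

print(export_csv(rows))
```

Writing to an in-memory buffer keeps the sketch self-contained; swap `io.StringIO()` for `open("recall.csv", "w", newline="")` to produce a file for tracking.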
FAQs
1) What does context recall measure in retrieval-augmented systems?
It estimates how much of the needed ground-truth context your retriever returns within top-k. Higher recall usually improves answer grounding when the generator uses retrieved content.
2) Should I use micro or macro recall for reporting?
Use micro recall for overall user-weighted performance, and macro recall for fairness across intents. If rare queries matter, macro recall prevents them from being drowned out.
3) What if my ground truth has zero relevant contexts?
If R equals zero, recall is undefined for that row and excluded from macro averaging. Consider revising the evaluation set to include only queries requiring retrieval.
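The exclusion rule can be made explicit in code. A minimal sketch, assuming rows are (R, r) pairs and a hypothetical `macro_recall` helper:

```python
def macro_recall(rows):
    """Average per-query recall over rows where R > 0; R == 0 rows are skipped."""
    defined = [r / R for R, r in rows if R > 0]
    if not defined:
        raise ValueError("no rows with R > 0; macro recall is undefined")
    return sum(defined) / len(defined)

# The (0, 0) row contributes nothing to the average.
print(macro_recall([(4, 3), (0, 0), (2, 2)]))  # (0.75 + 1.0) / 2 = 0.875
```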
4) Can recall be high while answers are still wrong?
Yes. High recall only means relevant context was retrieved. The generator may ignore it, hallucinate, or misinterpret it. Track answer faithfulness and citation accuracy too.
5) How does changing k affect recall?
Increasing k often improves recall but may reduce precision by adding noise. Use the precision and F1 columns to observe the tradeoff while tuning k and reranking.
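The tradeoff is easy to see on a single ranked run. In this sketch, `ranked` is an invented list marking which retrieved chunks are relevant (True) in rank order, with R = 4 relevant chunks in total:

```python
# Hypothetical ranked retrieval run: True marks a relevant chunk.
ranked = [True, False, True, False, False, True, False, True, False, False]
R = 4  # total ground-truth relevant contexts for this query

def recall_precision_at_k(ranked, R, k):
    """Recall and precision using only the top-k ranked results."""
    r = sum(ranked[:k])
    return r / R, r / k

for k in (3, 5, 10):
    rec, prec = recall_precision_at_k(ranked, R, k)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    print(f"k={k:2d}  recall={rec:.2f}  precision={prec:.2f}  f1={f1:.2f}")
```

On this run, raising k from 3 to 10 lifts recall from 0.50 to 1.00 while precision falls from 0.67 to 0.40, which is exactly the noise-versus-coverage tradeoff to watch while tuning k and reranking.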
6) How should I count “relevant contexts” in practice?
Define a relevance rubric: exact policy clause, supporting paragraph, or canonical chunk. Keep chunking consistent across runs, and label with two reviewers when possible.
7) What are common causes of low context recall?
Poor chunking, weak embeddings, domain mismatch, missing metadata filters, or overly strict reranking. Also check query rewriting and synonym handling for specialized terminology.