Text Summarization Accuracy Calculator Using TensorFlow

Calculator

Inputs: Source Text (optional; if left empty, coverage uses the reference summary), Reference Summary, Candidate Summary, N-Gram Size, F-Beta Weight, and Semantic Weight in Final Score, plus three text cleaning options: Use Case Sensitive Analysis, Remove Punctuation, and Remove Stopwords.

Formula Used

This tool combines overlap scoring with tensor-based similarity.

Precision = Overlap Units ÷ Candidate Units
Recall = Overlap Units ÷ Reference Units
F-Beta = ((1 + β²) × Precision × Recall) ÷ ((β² × Precision) + Recall)
Jaccard Similarity = Intersection of Unique Tokens ÷ Union of Unique Tokens
Cosine Similarity = (A · B) ÷ (||A|| × ||B||)
Coverage Ratio = Candidate Tokens Found in Source ÷ Unique Candidate Tokens
Compression Ratio = Candidate Token Count ÷ Source Token Count
Final Accuracy Score = Weighted blend of n-gram F-Beta, token F1, Jaccard, tensor cosine, and coverage
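The overlap formulas above can be sketched as plain TypeScript functions. This is a minimal illustration, not the calculator's own code: all function names are hypothetical, and the coverage function assumes the numerator counts unique candidate tokens found in the source, so the ratio stays between 0 and 1.

```typescript
// Hypothetical helpers mirroring the Formula Used list.
// Inputs are plain counts, so they work for token-level or n-gram-level overlap.

function precision(overlap: number, candidateUnits: number): number {
  return candidateUnits === 0 ? 0 : overlap / candidateUnits;
}

function recall(overlap: number, referenceUnits: number): number {
  return referenceUnits === 0 ? 0 : overlap / referenceUnits;
}

// F-Beta = ((1 + β²) × P × R) ÷ ((β² × P) + R); beta = 1 gives F1.
function fBeta(p: number, r: number, beta: number): number {
  const b2 = beta * beta;
  const denom = b2 * p + r;
  return denom === 0 ? 0 : ((1 + b2) * p * r) / denom;
}

// Jaccard over unique tokens: |A ∩ B| ÷ |A ∪ B|.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  const unionSize = new Set(a.concat(b)).size;
  if (unionSize === 0) return 0;
  const inter = Array.from(setA).filter((t) => setB.has(t)).length;
  return inter / unionSize;
}

// Coverage (assumption): unique candidate tokens found in the source ÷ unique candidate tokens.
function coverage(candidate: string[], source: string[]): number {
  const uniqueCand = Array.from(new Set(candidate));
  if (uniqueCand.length === 0) return 0;
  const src = new Set(source);
  return uniqueCand.filter((t) => src.has(t)).length / uniqueCand.length;
}

// Compression: candidate token count ÷ source token count.
function compression(candidate: string[], source: string[]): number {
  return source.length === 0 ? 0 : candidate.length / source.length;
}
```

For example, with precision 0.8 and recall 0.5, fBeta(0.8, 0.5, 2) leans toward the lower recall value, while fBeta(0.8, 0.5, 0.5) leans toward precision.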

The tensor cosine score is computed with TensorFlow.js from term-frequency vectors built over the combined reference and candidate vocabulary.

How to Use This Calculator

Paste the original source text if you want compression and source-grounded coverage metrics. If you leave it empty, coverage is measured against the reference summary.
Paste the trusted human summary in the reference field.
Paste the model output in the candidate field.
Choose the n-gram size, the F-Beta weight, and the semantic weight used in the final score.
Pick text cleaning options that match your evaluation policy.
Press Calculate Accuracy to show the result above the form.
Review the metrics table and the interpretation block.
Export the report as CSV or PDF for documentation.
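The text cleaning options in the steps above map naturally onto a tokenizer. The sketch below is an assumption about how such options could be applied, not the page's implementation: the CleaningOptions field names, the ASCII-only punctuation rule, and the short STOPWORDS list are all hypothetical.

```typescript
// Hypothetical option names mirroring the three cleaning checkboxes.
interface CleaningOptions {
  caseSensitive: boolean;
  removePunctuation: boolean;
  removeStopwords: boolean;
}

// Small illustrative stopword list; a real evaluation policy would define its own.
const STOPWORDS = new Set([
  "a", "an", "the", "and", "or", "of", "to", "in", "on", "by", "is", "was", "were", "for",
]);

function tokenize(text: string, opts: CleaningOptions): string[] {
  let t = opts.caseSensitive ? text : text.toLowerCase();
  if (opts.removePunctuation) {
    // ASCII-only for simplicity: replace anything that is not a letter, digit, or whitespace.
    t = t.replace(/[^A-Za-z0-9\s]/g, " ");
  }
  // Split on runs of whitespace and drop empty strings.
  let tokens = t.split(/\s+/).filter((w) => w.length > 0);
  if (opts.removeStopwords) {
    // Compare in lowercase so stopwords are dropped even in case-sensitive mode.
    tokens = tokens.filter((w) => !STOPWORDS.has(w.toLowerCase()));
  }
  return tokens;
}
```

Applying the same options to the source, reference, and candidate keeps every metric comparable; mixing policies between fields would distort the overlap counts.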

Example Data Table

Case	Reference Summary	Candidate Summary	Expected Reading
Case A	The article says the city cut traffic by adding bus lanes.	The city reduced traffic after new bus lanes were installed.	High overlap and high semantic similarity.
Case B	The report shows sales rose after a new pricing plan.	Sales stayed flat during the old discount period.	Low overlap and weak final score.
Case C	Students improved because practice tests were repeated weekly.	Weekly practice tests helped student performance improve.	Moderate to high score with strong cosine value.

About This Calculator

Why this tool matters

Text summarization needs careful measurement. A short answer can sound good and still miss key facts. This calculator helps you test summary quality with structured math. It compares a generated summary against a trusted reference. It also checks how much of the candidate is grounded in the source. That makes the output more useful for audits, model reviews, and classroom demonstrations.

What the calculator measures

The tool does not rely on one score alone. It computes token precision, recall, and F-Beta. These values show how much of the candidate matches the reference and how much of the reference is missing. It also builds n-grams and checks phrase-level overlap. This matters because summaries are judged on wording as well as meaning. Jaccard similarity adds a set-based comparison of unique tokens. Coverage ratio shows whether the candidate summary stays connected to the source text. Compression ratio shows how strongly the text was shortened.
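Phrase-level overlap is commonly computed with clipped n-gram counts, in the style of ROUGE-N: each candidate n-gram is credited at most as many times as it appears in the reference. The sketch assumes the calculator's "overlap units" work this way, which the page does not state; the function names ngrams, counts, and ngramOverlap are hypothetical.

```typescript
// Build contiguous n-grams from a token list, joined with a space.
function ngrams(tokens: string[], n: number): string[] {
  const out: string[] = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n).join(" "));
  }
  return out;
}

// Count occurrences of each item.
function counts(items: string[]): Map<string, number> {
  const m = new Map<string, number>();
  for (const it of items) m.set(it, (m.get(it) ?? 0) + 1);
  return m;
}

// Clipped overlap: a candidate n-gram counts at most as often as it appears in the reference.
function ngramOverlap(reference: string[], candidate: string[], n: number) {
  const refCounts = counts(ngrams(reference, n));
  const candCounts = counts(ngrams(candidate, n));
  let overlap = 0;
  candCounts.forEach((c, gram) => {
    overlap += Math.min(c, refCounts.get(gram) ?? 0);
  });
  const candTotal = Math.max(candidate.length - n + 1, 0);
  const refTotal = Math.max(reference.length - n + 1, 0);
  return {
    overlap,
    precision: candTotal === 0 ? 0 : overlap / candTotal,
    recall: refTotal === 0 ? 0 : overlap / refTotal,
  };
}
```

Clipping prevents a candidate from inflating its score by repeating a matching word or phrase more often than the reference uses it.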

How TensorFlow is used

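The page uses TensorFlow.js for vector-based similarity. Reference and candidate terms are converted into frequency vectors over their combined vocabulary. Those vectors become tensors, and cosine similarity is computed from tensor operations. This adds a signal beyond direct phrase overlap. Because frequency vectors ignore word order, two summaries that arrange the same terms differently can remain close in vector space even when their n-gram overlap is low. The vectors only match identical terms, though, so synonyms such as "cut" and "reduced" are not counted as shared; the score measures vocabulary agreement rather than deeper meaning. It complements overlap metrics instead of replacing them.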
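A minimal TypeScript sketch of the frequency-vector cosine step follows. The page performs the dot product and norms as TensorFlow.js tensor operations; this version uses plain arrays so it runs without any dependency, and the function names frequencyVectors and cosine are hypothetical. The arithmetic is the same Cosine Similarity formula listed under Formula Used.

```typescript
// Build term-frequency vectors for two token lists over their combined vocabulary.
function frequencyVectors(a: string[], b: string[]): [number[], number[]] {
  const vocab = Array.from(new Set(a.concat(b)));
  const index = new Map<string, number>();
  vocab.forEach((term, i) => index.set(term, i));
  const va = new Array<number>(vocab.length).fill(0);
  const vb = new Array<number>(vocab.length).fill(0);
  for (const t of a) va[index.get(t)!] += 1;
  for (const t of b) vb[index.get(t)!] += 1;
  return [va, vb];
}

// Cosine = (A · B) ÷ (||A|| × ||B||); returns 0 when either vector is all zeros.
function cosine(va: number[], vb: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < va.length; i++) {
    dot += va[i] * vb[i];
    na += va[i] * va[i];
    nb += vb[i] * vb[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

With this construction, ["the", "cat", "sat"] and ["sat", "the", "cat"] score 1 despite the different order, while ["city", "cut", "traffic"] and ["city", "reduced", "traffic"] score about 0.67, because "cut" and "reduced" occupy different vector positions.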

How to read the final score

The final accuracy score is a weighted blend. N-gram F-Beta is weighted heavily, and token F1 also matters. Jaccard, tensor cosine, and coverage add balance; the cosine share is set by the Semantic Weight in Final Score input. A higher score means the candidate is closer to the reference and better anchored to the source. You should still read the summary manually. Numbers support decision-making, but human review remains important, especially for factual risk, tone, and missing context.
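A weighted blend of this kind can be sketched as below. The page does not publish its weights, so the OVERLAP_SHARES values are purely hypothetical, chosen only so that n-gram F-Beta carries the largest share. The semanticWeight parameter stands in for the Semantic Weight in Final Score input and assumes that input sets the cosine share directly.

```typescript
// All metric inputs are assumed to lie in [0, 1].
interface MetricScores {
  ngramFBeta: number;
  tokenF1: number;
  jaccard: number;
  cosine: number;
  coverage: number;
}

// HYPOTHETICAL split of the non-semantic share (sums to 1); not the calculator's real weights.
const OVERLAP_SHARES = { ngramFBeta: 0.4, tokenF1: 0.25, jaccard: 0.15, coverage: 0.2 };

// Returns a percentage: (1 - w) × overlap blend + w × cosine, scaled to 0..100.
function finalScore(m: MetricScores, semanticWeight: number): number {
  const w = Math.min(Math.max(semanticWeight, 0), 1); // clamp the semantic weight to [0, 1]
  const overlapPart =
    OVERLAP_SHARES.ngramFBeta * m.ngramFBeta +
    OVERLAP_SHARES.tokenF1 * m.tokenF1 +
    OVERLAP_SHARES.jaccard * m.jaccard +
    OVERLAP_SHARES.coverage * m.coverage;
  return 100 * ((1 - w) * overlapPart + w * m.cosine);
}
```

Keeping the overlap shares normalized means the semantic weight alone decides how much the cosine score can pull the final result up or down.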

FAQs

1. What does this calculator evaluate?

It evaluates how close a generated summary is to a trusted summary. It also measures overlap with the source text, semantic similarity, and compression behavior.

2. Why use F-Beta instead of only F1?

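F-Beta lets you shift importance between precision and recall. Use beta above one when missing important content is more costly than adding extra words, and beta below one when extra or unsupported words are the bigger concern. A beta of one gives the standard F1.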

3. What does n-gram overlap tell me?

It checks phrase level agreement. Unigrams test word overlap. Bigrams and trigrams test whether important local phrasing and structure were preserved.

4. Why include TensorFlow based cosine similarity?

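Cosine similarity adds a vector view. Because it compares term frequencies without regard to word order, it can show closeness when two summaries share vocabulary but arrange it differently. It does not match synonyms, since only identical terms count as shared.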

5. Should I remove stopwords?

That depends on your policy. Removing stopwords can reduce noise. Keeping them can preserve strict comparison for formal benchmark testing.

6. What is a good final score?

Scores above 85% usually indicate strong alignment. Scores between 65% and 85% are often usable. Lower scores need closer review.

7. Can this replace manual review?

No. It is a decision aid. Human review is still needed for factual correctness, nuance, tone, and hidden omissions.

8. Why does source text remain optional?

Some evaluations compare only a candidate summary with a reference summary. Source text becomes useful when you also want grounding and compression metrics; without it, coverage is measured against the reference summary instead.