Word Error Rate Calculator

Measure transcript quality with detailed error analysis and charts. Compare predictions, inspect alignments, and export results for speech model evaluation tasks.

Calculator Input

WER compares tokenized word sequences after normalization. Use the same preprocessing rules during benchmarking for consistent evaluation.

Example Data Table

| Case | Reference | Hypothesis | Expected S | Expected D | Expected I | Expected WER |
|------|-----------|------------|------------|------------|------------|--------------|
| Sample A | the quick brown fox jumps | the quick fox jumps | 0 | 1 | 0 | 20.00% |
| Sample B | speech recognition is improving fast | speech recognition improves fast | 1 | 1 | 0 | 40.00% |
| Sample C | machine learning models need clean data | machine learning model need very clean data | 1 | 0 | 1 | 33.33% |

Formula Used

Word Error Rate measures how many word-level edits (substitutions, deletions, and insertions) are needed to transform the reference transcript into the predicted transcript.

WER = (S + D + I) / N × 100

where S is the number of substitutions, D the number of deletions, I the number of insertions, and N the number of words in the reference transcript.

The calculator uses dynamic programming to compute the minimum edit distance and then backtracks through the alignment path to classify each error.
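The approach described above can be sketched in Python: a dynamic-programming edit-distance table over word tokens, followed by a backtrack that classifies each edit. This is an illustrative implementation, not the calculator's actual code; the function name `wer_counts` is an assumption.

```python
def wer_counts(reference: str, hypothesis: str):
    """Return (substitutions, deletions, insertions, WER%) for two transcripts."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                          # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                          # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1],  # substitution
                                   dp[i - 1][j],      # deletion
                                   dp[i][j - 1])      # insertion

    # Backtrack through the alignment path to classify each error.
    s = d = ins = 0
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]:
            i, j = i - 1, j - 1                       # exact match, no error
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            s += 1; i, j = i - 1, j - 1               # substitution
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            d += 1; i -= 1                            # deletion
        else:
            ins += 1; j -= 1                          # insertion
    wer = 100.0 * (s + d + ins) / len(ref)
    return s, d, ins, wer
```

Running this on Sample A from the table above yields `(0, 1, 0, 20.0)`, matching the expected values.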

How to Use This Calculator

  1. Paste the correct transcript into the reference transcript box.
  2. Paste the model output into the hypothesis transcript box.
  3. Choose normalization options such as lowercase conversion or punctuation removal.
  4. Click Calculate WER to compute metrics.
  5. Review the summary, chart, and token-level alignment table.
  6. Use the CSV button to export numeric results.
  7. Use the PDF button to save the visible report.

Frequently Asked Questions

1. What does word error rate measure?

WER measures transcript quality by counting substitutions, deletions, and insertions needed to match a reference transcript. Lower values mean better recognition performance.

2. Why can WER exceed 100%?

WER can exceed 100% when the hypothesis contains many more words than the reference. Insertions are counted in the numerator while the denominator is only the reference length, so heavy over-generation can push the error rate above 100%.
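A quick worked example, assuming a two-word reference against an over-generated hypothesis:

```python
# Reference has N = 2 words; the hypothesis inserts three extra words,
# so S + D + I = 3 exceeds N and WER goes above 100%.
ref = "hello world".split()
hyp = "hello there big wide world".split()
# Alignment: "hello" and "world" match; "there", "big", "wide" are insertions.
S, D, I, N = 0, 0, 3, len(ref)
wer = (S + D + I) / N * 100
print(wer)  # 150.0
```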

3. Should punctuation be removed before scoring?

That depends on your evaluation protocol. Remove punctuation when your benchmark ignores it. Keep punctuation when punctuation accuracy matters for the final application.
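A minimal normalization pipeline might look like the sketch below. This is one possible preprocessing scheme, not the calculator's exact implementation; the `normalize` helper and its flags are assumptions.

```python
import string

def normalize(text: str, lowercase: bool = True, strip_punct: bool = True) -> list[str]:
    """Apply optional lowercasing and punctuation removal, then tokenize on whitespace."""
    if lowercase:
        text = text.lower()
    if strip_punct:
        # Remove ASCII punctuation characters entirely.
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()
```

With both options enabled, `normalize("Hello, world!")` returns `["hello", "world"]`, so casing and punctuation no longer count as errors.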

4. Is WER suitable for short utterances?

Yes, but short utterances can produce unstable percentages because one mistake has a large effect. Review both WER and raw counts for context.

5. What is the difference between WER and accuracy?

WER focuses on edit operations relative to the reference length. Accuracy usually expresses the proportion of correctly recognized tokens or samples.
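The contrast can be made concrete with one common definition of word accuracy (the fraction of reference words recognized correctly); definitions vary between toolkits, so treat this as an illustration:

```python
# Suppose the hypothesis matched every reference word but added two extras.
S, D, I, N = 0, 0, 2, 4
wer = (S + D + I) / N * 100        # 50.0 -- insertions raise the error rate
accuracy = (N - S - D) / N * 100   # 100.0 -- every reference word was recognized
```

Here accuracy is 100% while WER is 50%, because accuracy under this definition ignores insertions.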

6. Does casing affect the result?

Yes. “Hello” and “hello” are different tokens without normalization. Enable lowercase conversion when your evaluation treats them as equivalent.

7. Can I use this for model comparison?

Yes. Run the same transcripts and preprocessing rules across models. Comparable normalization is essential for fair benchmarking and ranking.

8. Why is alignment useful beyond the final score?

Alignment shows exactly where predictions fail. It reveals repeated substitutions, missing words, and extra tokens, helping diagnose language, acoustic, or decoding issues.
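An alignment view like the one described can be sketched with Python's standard-library `difflib`. Note that `SequenceMatcher` is a heuristic matcher and is not guaranteed to produce the minimum-edit alignment, so this is a readable approximation, not the calculator's method:

```python
from difflib import SequenceMatcher

def show_alignment(reference: str, hypothesis: str) -> list[tuple]:
    """Label token-level differences as OK / SUB / DEL / INS rows."""
    ref, hyp = reference.split(), hypothesis.split()
    rows = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, ref, hyp).get_opcodes():
        if op == "equal":
            rows += [("OK", r, h) for r, h in zip(ref[i1:i2], hyp[j1:j2])]
        elif op == "replace":
            rows.append(("SUB", " ".join(ref[i1:i2]), " ".join(hyp[j1:j2])))
        elif op == "delete":
            rows.append(("DEL", " ".join(ref[i1:i2]), ""))
        else:  # "insert"
            rows.append(("INS", "", " ".join(hyp[j1:j2])))
    return rows
```

For example, `show_alignment("the quick brown fox", "the quick fox")` marks "brown" as a `DEL` row, making the missing word immediately visible.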

Related Calculators

Real Time Factor · Speech Recognition Accuracy · Character Error Rate · Frame Length Calculator · Voice Activity Detection

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.