Phoneme Error Rate Calculator

Analyze predicted and reference phoneme sequences with confidence. Track edit operations, accuracy, and distance easily. Visualize errors and export polished results for model reviews.

Calculator Input

Reference phoneme sequence

Enter the ground-truth sequence, such as AH0 N D.

Predicted phoneme sequence

Enter the model output sequence using the same delimiter style.

Delimiter mode

Custom delimiter

Ignore case

Remove brackets and quotes

Example Data Table

Case	Reference Sequence	Predicted Sequence	Expected Observation
1	HH AH0 L OW1	HH AH0 L AO1	One substitution; PER equals 25.00%.
2	K AE1 T	K AE1 T S	One insertion; PER equals 33.33%.
3	B R IH1 JH	B IH1 JH	One deletion; PER equals 25.00%.

Formula Used

Phoneme Error Rate (PER) is the minimum number of edit operations needed to turn the reference phoneme sequence into the predicted sequence, divided by the number of phonemes in the reference.

PER = (S + D + I) / N

S = substitutions
D = deletions
I = insertions
N = total phonemes in the reference sequence

The calculator uses dynamic programming alignment, similar to Levenshtein distance, to identify the minimum edit path and count each operation type.
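The alignment step described above can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual source: the function names `align_counts` and `per` are hypothetical, and ties between equally short edit paths are broken in favor of the diagonal move, which is an assumption.

```python
from typing import Dict, List


def align_counts(ref: List[str], hyp: List[str]) -> Dict[str, int]:
    """Count substitutions (S), deletions (D), insertions (I), and matches (C)."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every reference phoneme
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every predicted phoneme
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Walk back from the bottom-right corner to label each operation.
    counts = {"S": 0, "D": 0, "I": 0, "C": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                counts["C" if cost == 0 else "S"] += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["D"] += 1
            i -= 1
        else:
            counts["I"] += 1
            j -= 1
    return counts


def per(ref: List[str], hyp: List[str]) -> float:
    """PER = (S + D + I) / N, where N is the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    c = align_counts(ref, hyp)
    return (c["S"] + c["D"] + c["I"]) / len(ref)


# The three rows of the example table
print(f"{per('HH AH0 L OW1'.split(), 'HH AH0 L AO1'.split()):.2%}")
print(f"{per('K AE1 T'.split(), 'K AE1 T S'.split()):.2%}")
print(f"{per('B R IH1 JH'.split(), 'B IH1 JH'.split()):.2%}")
```

When several alignments share the same minimum cost, the split between S, D, and I can differ by tie-breaking rule, but the total edit count and therefore PER stay the same.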

How to Use This Calculator

1. Paste the reference phoneme sequence into the first field.
2. Paste the predicted phoneme sequence into the second field.
3. Select the correct delimiter mode for your token format.
4. Enable cleanup options if your phonemes contain quotes or brackets.
5. Click Calculate PER to see summary metrics, alignment, and the chart.
6. Use the CSV and PDF buttons to export the current analysis.

Frequently Asked Questions

1. What does PER measure?

PER measures how many phoneme edits separate a prediction from the reference, relative to the reference length. Lower values indicate stronger speech or pronunciation model performance.

2. Why can PER exceed 100%?

PER can exceed 100% when the prediction contains many extra phonemes. Each insertion counts as an error but does not add to the reference length N, so the total number of edits can become larger than N.
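A short Python sketch makes this concrete. The helper name `edit_distance` and the over-long prediction are made up for illustration; the function only returns the total edit count, which is all PER needs.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions (two-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j - 1] + (r != h),  # match (cost 0) or substitution (cost 1)
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
            ))
        prev = curr
    return prev[-1]


ref = "K AE1 T".split()               # N = 3
hyp = "K AE1 T S AH0 N D".split()     # four extra phonemes (illustrative input)
edits = edit_distance(ref, hyp)
print(edits, f"{edits / len(ref):.2%}")  # 4 edits over 3 reference phonemes
```

Here the four insertions outnumber the three reference phonemes, so PER is 4 / 3, or about 133.33%.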

3. What is counted as a substitution?

A substitution happens when one reference phoneme aligns with a different predicted phoneme. It means the model produced a phoneme at that position but chose the wrong one, rather than skipping a phoneme or adding an extra one.

4. What is the difference between PER and accuracy?

PER focuses on total edit cost relative to the reference. Accuracy highlights how many reference phonemes were matched correctly after the optimal alignment is found.
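The difference shows up most clearly when insertions are present. The sketch below assumes accuracy is computed as correctly matched reference phonemes divided by N, that is (N - S - D) / N; this definition is an assumption based on the description above, not a confirmed detail of the calculator, and the function name `per_and_accuracy` is hypothetical.

```python
def per_and_accuracy(s: int, d: int, i: int, n: int):
    """Return (PER, accuracy) from operation counts and reference length n."""
    if n <= 0:
        raise ValueError("reference length must be positive")
    per = (s + d + i) / n
    correct = n - s - d          # reference phonemes that were matched exactly
    accuracy = correct / n       # assumed definition: matches / reference length
    return per, accuracy


# One substitution in four phonemes: PER and accuracy mirror each other.
print(per_and_accuracy(s=1, d=0, i=0, n=4))

# Two insertions and nothing else: every reference phoneme is matched,
# yet PER is still nonzero because insertions count as errors.
print(per_and_accuracy(s=0, d=0, i=2, n=4))
```

Under this definition the two metrics sum to 100% only when there are no insertions.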

5. Should I use spaces or commas?

Use whichever delimiter matches your dataset. The important part is keeping the same tokenization style for both the reference and predicted sequences.
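As a rough illustration of consistent tokenization, the following Python sketch applies the same delimiter and cleanup settings to both sequences. The function name `tokenize`, its parameters, and the exact set of characters stripped are assumptions made for this example; they mirror the input options listed on this page but are not taken from the calculator's code.

```python
def tokenize(text, delimiter=None, ignore_case=False, strip_wrappers=False):
    """Split a phoneme string into tokens.

    delimiter=None splits on any whitespace; otherwise on the given string.
    """
    if strip_wrappers:
        # Assumed cleanup set: square, round, and curly brackets plus quotes.
        text = text.translate(str.maketrans("", "", "[](){}\"'"))
    if ignore_case:
        text = text.lower()
    parts = text.split() if delimiter is None else text.split(delimiter)
    # Trim padding around each token and drop empty pieces.
    return [p.strip() for p in parts if p.strip()]


# Same settings for reference and prediction
settings = dict(delimiter=",", ignore_case=True, strip_wrappers=True)
ref = tokenize('["HH", "AH0", "L", "OW1"]', **settings)
hyp = tokenize("hh, ah0, l, ao1", **settings)
print(ref)
print(hyp)
```

Mixing settings, for example splitting one sequence on spaces and the other on commas, would produce tokens that never match and inflate PER.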

6. Can this handle ARPAbet or IPA tokens?

Yes. The calculator works with any tokenized phoneme set, including ARPAbet, IPA, or custom symbols, as long as each phoneme is separated consistently.

7. Why is alignment useful?

Alignment shows exactly where substitutions, deletions, and insertions occur. This makes debugging easier when you want to inspect pronunciation or recognition failures.

8. When should I compare substitution patterns?

Review substitution patterns when diagnosing systematic confusions, such as vowels being swapped repeatedly. Those patterns often reveal token mapping or acoustic modeling weaknesses.
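One way to surface such patterns is to tally substitution pairs across many aligned utterances. The Python sketch below is illustrative only: the function name `substitution_pairs` and the three utterance pairs are made up for the example, and the alignment prefers the diagonal move when several minimum-cost paths exist.

```python
from collections import Counter


def substitution_pairs(ref, hyp):
    """Return (reference, predicted) phoneme pairs for each substitution."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,
                             dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1)

    # Backtrace, keeping only the substituted pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                if cost:
                    pairs.append((ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            i -= 1  # deletion
        else:
            j -= 1  # insertion
    return pairs[::-1]  # restore left-to-right order


# Hypothetical (reference, predicted) utterances
utterances = [
    ("HH AH0 L OW1", "HH AH0 L AO1"),
    ("G OW1", "G AO1"),
    ("K AE1 T", "K EH1 T"),
]
confusions = Counter()
for ref_text, hyp_text in utterances:
    confusions.update(substitution_pairs(ref_text.split(), hyp_text.split()))

for (ref_ph, hyp_ph), count in confusions.most_common():
    print(f"{ref_ph} -> {hyp_ph}: {count}")
```

A pair that appears repeatedly, such as OW1 being predicted as AO1 in this toy data, is a candidate for closer inspection.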