Phoneme Error Rate Calculator

Analyze predicted and reference phoneme sequences with confidence. Track edit operations, accuracy, and distance easily. Visualize errors and export polished results for model reviews.

Calculator Input

Enter the ground truth sequence, such as AH0 N D.
Enter the model output sequence using the same delimiter style.

Example Data Table

Case | Reference Sequence | Predicted Sequence | Expected Observation
1    | HH AH0 L OW1       | HH AH0 L AO1       | One substitution; PER equals 25.00%.
2    | K AE1 T            | K AE1 T S          | One insertion; PER equals 33.33%.
3    | B R IH1 JH         | B IH1 JH           | One deletion; PER equals 25.00%.

Formula Used

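Phoneme Error Rate (PER) is the minimum number of edit operations needed to turn the reference phoneme sequence into the predicted sequence, divided by the number of phonemes in the reference.

PER = (S + D + I) / N

Here S, D, and I are the numbers of substitutions, deletions, and insertions, and N is the number of phonemes in the reference sequence.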

The calculator uses dynamic programming alignment, similar to Levenshtein distance, to identify the minimum edit path and count each operation type.
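
As a rough illustration of the alignment step, the following Python sketch fills a Levenshtein-style table and carries the operation counts along with the cost. The function name per_counts and its return shape are assumptions made for this example; the calculator's own implementation is not shown on this page and may differ, for instance in how it breaks ties between equally cheap paths.

```python
def per_counts(reference, predicted):
    """Return (S, D, I, PER) for two lists of phoneme tokens.

    Hypothetical helper for illustration only.
    """
    n, m = len(reference), len(predicted)
    # table[i][j] = (total_edits, S, D, I) for reference[:i] vs predicted[:j]
    table = [[None] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        table[i][0] = (i, 0, i, 0)  # empty prediction: deletions only
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, j)  # empty reference: insertions only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, s, d, ins = table[i - 1][j - 1]
            if reference[i - 1] == predicted[j - 1]:
                diagonal = (cost, s, d, ins)          # match, no edit
            else:
                diagonal = (cost + 1, s + 1, d, ins)  # substitution
            cost, s, d, ins = table[i - 1][j]
            deletion = (cost + 1, s, d + 1, ins)
            cost, s, d, ins = table[i][j - 1]
            insertion = (cost + 1, s, d, ins + 1)
            # Keep the cheapest path; ties prefer the diagonal move.
            table[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])
    total, s, d, ins = table[n][m]
    per = total / n if n else float("nan")  # PER is undefined for an empty reference
    return s, d, ins, per


# Case 1 from the example table: one substitution out of four phonemes.
print(per_counts("HH AH0 L OW1".split(), "HH AH0 L AO1".split()))
```

Run on the three rows of the example table, this sketch yields one substitution, one insertion, and one deletion respectively, matching the expected observations listed there.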

How to Use This Calculator

  1. Paste the reference phoneme sequence into the first field.
  2. Paste the predicted phoneme sequence into the second field.
  3. Select the correct delimiter mode for your token format.
  4. Enable cleanup options if your phonemes contain quotes or brackets.
  5. Click Calculate PER to see summary metrics, alignment, and the chart.
  6. Use the CSV and PDF buttons to export the current analysis.
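
Steps 3 and 4 amount to turning each pasted string into a clean token list before scoring. The sketch below shows one way that could look in Python. The function name tokenize, the "space" and "comma" mode names, and the exact set of characters removed by cleanup are assumptions for illustration, not the calculator's actual options.

```python
def tokenize(text, delimiter="space", cleanup=True):
    """Split a phoneme string into a list of tokens.

    delimiter: "space" splits on any whitespace, "comma" splits on commas.
    cleanup:   remove ASCII quotes and brackets before splitting
               (assumed behaviour for this sketch).
    """
    if cleanup:
        # Drop characters that often surround tokens in exported lists.
        text = text.translate(str.maketrans("", "", "[](){}\"'"))
    if delimiter == "comma":
        parts = text.split(",")
    else:
        parts = text.split()
    # Trim stray whitespace and discard empty tokens.
    return [p.strip() for p in parts if p.strip()]


print(tokenize('["HH", "AH0", "L", "OW1"]', delimiter="comma"))
print(tokenize("HH AH0 L OW1"))
```

Both calls produce the same four tokens, which is the property that matters: the reference and the prediction should tokenize the same way before they are compared.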

Frequently Asked Questions

1. What does PER measure?

PER measures how many phoneme edits are needed to match a prediction to the reference, relative to the reference length. Lower values indicate stronger speech or pronunciation model performance.

2. Why can PER exceed 100%?

PER can exceed 100% when the prediction contains many insertions. Insertions add to the edit count but not to the reference length N, so the total number of edits can become larger than N.
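
A small worked case makes this concrete. The Python sketch below applies PER = (S + D + I) / N directly to operation counts. The helper name per and the example sequences in the comment are made up for illustration.

```python
def per(substitutions, deletions, insertions, reference_length):
    """Apply PER = (S + D + I) / N to already-counted operations."""
    return (substitutions + deletions + insertions) / reference_length


# Illustrative case: reference "K AE1" (N = 2), prediction "K AE1 T S IH0".
# Both reference phonemes match, and the prediction adds three extra ones.
value = per(substitutions=0, deletions=0, insertions=3, reference_length=2)
print(f"{value:.2%}")  # 150.00%
```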

3. What is counted as a substitution?

A substitution occurs when a reference phoneme aligns with a different predicted phoneme. The model produced a phoneme at that position but chose the wrong one, rather than skipping a phoneme or adding an extra one.

4. What is the difference between PER and accuracy?

PER focuses on total edit cost relative to the reference. Accuracy highlights how many reference phonemes were matched correctly after the optimal alignment is found.
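
The sketch below contrasts the two metrics in Python. It assumes accuracy is the share of reference phonemes that were matched, (N - S - D) / N. That formula is an assumption for this example, since the page does not state the calculator's exact accuracy definition, and the helper name per_and_accuracy is hypothetical.

```python
def per_and_accuracy(s, d, i, n):
    """Return (PER, accuracy) from operation counts.

    Accuracy here is (N - S - D) / N, an assumed definition:
    the share of reference phonemes that were matched.
    """
    per = (s + d + i) / n
    accuracy = (n - s - d) / n
    return per, accuracy


# Four reference phonemes, one substituted, plus two inserted phonemes.
per_value, accuracy = per_and_accuracy(s=1, d=0, i=2, n=4)
print(per_value, accuracy)  # 0.75 0.75
```

Under this assumed definition, insertions raise PER without lowering accuracy, so the two values need not sum to 100%.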

5. Should I use spaces or commas?

Use whichever delimiter matches your dataset. The important part is keeping the same tokenization style for both the reference and predicted sequences.

6. Can this handle ARPAbet or IPA tokens?

Yes. The calculator works with any tokenized phoneme set, including ARPAbet, IPA, or custom symbols, as long as each phoneme is separated consistently.

7. Why is alignment useful?

Alignment shows exactly where substitutions, deletions, and insertions occur. This makes debugging easier when you want to inspect pronunciation or recognition failures.
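
To show how an alignment can be recovered, the Python sketch below fills a standard edit-distance table and then walks back from the final cell, labelling each step. The function name align and the (operation, reference token, predicted token) row format are assumptions for illustration. When several minimum-cost paths exist, this sketch prefers the diagonal move, which may differ from the calculator's choice.

```python
def align(reference, predicted):
    """Return a list of (operation, reference_token, predicted_token) rows."""
    n, m = len(reference), len(predicted)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i  # delete every reference token
    for j in range(m + 1):
        cost[0][j] = j  # insert every predicted token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = cost[i - 1][j - 1] + (reference[i - 1] != predicted[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right cell to recover one optimal path.
    rows = []
    i, j = n, m
    while i > 0 or j > 0:
        differs = i > 0 and j > 0 and reference[i - 1] != predicted[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + differs:
            op = "substitution" if differs else "match"
            rows.append((op, reference[i - 1], predicted[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            rows.append(("deletion", reference[i - 1], None))
            i -= 1
        else:
            rows.append(("insertion", None, predicted[j - 1]))
            j -= 1
    rows.reverse()
    return rows


# Case 3 from the example table: the reference "R" has no predicted partner.
for row in align("B R IH1 JH".split(), "B IH1 JH".split()):
    print(row)
```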

8. When should I compare substitution patterns?

Review substitution patterns when diagnosing systematic confusions, such as vowels being swapped repeatedly. Those patterns often reveal token mapping or acoustic modeling weaknesses.
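
One simple way to surface such patterns is to tally the substituted pairs across many aligned utterances. The Python sketch below does this with collections.Counter. The function name substitution_confusions, the pair format with None marking a gap, and the sample pairs are all made up for illustration.

```python
from collections import Counter


def substitution_confusions(aligned_pairs):
    """Count (reference, predicted) pairs where both exist but differ.

    Pairs with None on either side are deletions or insertions and are skipped.
    """
    return Counter(
        (ref, pred)
        for ref, pred in aligned_pairs
        if ref is not None and pred is not None and ref != pred
    )


# Made-up aligned pairs, as if collected from several utterances.
pairs = [
    ("OW1", "AO1"),
    ("AH0", "AH0"),
    ("OW1", "AO1"),
    ("IH1", "IY1"),
    ("R", None),
]
for (ref, pred), count in substitution_confusions(pairs).most_common():
    print(f"{ref} -> {pred}: {count}")
```

In this sample, OW1 being replaced by AO1 appears twice, which is the kind of repeated confusion worth inspecting first.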

Related Calculators

  - Real time factor
  - Speech recognition accuracy
  - Character error rate
  - Frame length calculator
  - Word error rate
  - Voice activity detection

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.