Analyze token likelihood patterns with perplexity calculation modes. View entropy, cross-entropy, and log loss instantly. Clean inputs, inspect steps, and download records for reporting.
The page stays single-column, while the calculator fields use responsive columns.
This example uses probabilities with optional weights to illustrate weighted sequence perplexity.
| Token | Probability | Weight | -ln(p) |
|---|---|---|---|
| A | 0.60 | 2 | 0.510826 |
| B | 0.30 | 1 | 1.203973 |
| C | 0.10 | 1 | 2.302585 |
Weighted average NLL = (2 × 0.510826 + 1.203973 + 2.302585) / 4 = 1.132052 nats, so perplexity = e^1.132052 ≈ 3.102016. That means the model's uncertainty is equivalent to choosing among about 3.10 equally likely options.
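The worked example above can be checked with a short script. This is a minimal sketch; the function name `weighted_perplexity` is illustrative and not part of the calculator.

```python
import math

def weighted_perplexity(probs, weights):
    # Weighted average negative log-likelihood (NLL) in nats,
    # then exponentiate to get perplexity.
    total_w = sum(weights)
    avg_nll = -sum(w * math.log(p) for p, w in zip(probs, weights)) / total_w
    return math.exp(avg_nll)

# Tokens A, B, C with probabilities 0.60, 0.30, 0.10 and weights 2, 1, 1
pp = weighted_perplexity([0.60, 0.30, 0.10], [2, 1, 1])
print(round(pp, 6))  # 3.102016
```

Larger weights pull the average NLL toward their tokens, which is why token A (weight 2) dominates here.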
Weighted perplexity from probabilities
PP = exp[ - ( Σ wᵢ ln pᵢ ) / ( Σ wᵢ ) ]
When cross-entropy is already in bits, PP = 2^H.
pᵢ is the observed token probability, wᵢ is the optional weight, and H is cross-entropy in bits per token. Lower perplexity means lower uncertainty. A value near 1 suggests highly confident, concentrated predictions.
If you enter natural log probabilities, the calculator first converts them with pᵢ = e^(log pᵢ). If you enter base-2 log probabilities, it converts them with pᵢ = 2^(log₂ pᵢ).
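The two log-input conversions described above amount to one exponentiation each. A minimal sketch, assuming a helper `probs_from_logs` (not part of the calculator) that accepts either natural or base-2 log probabilities:

```python
import math

def probs_from_logs(log_values, base="e"):
    # Convert log probabilities back to probabilities.
    # base="e": p = e^(ln p); base="2": p = 2^(log2 p)
    if base == "e":
        return [math.exp(lp) for lp in log_values]
    return [2.0 ** lp for lp in log_values]

print(probs_from_logs([math.log(0.5)]))    # natural-log mode
print(probs_from_logs([-1.0], base="2"))   # base-2 mode: 2^(-1) = 0.5
```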
Perplexity measures how uncertain a probability model is about observed outcomes. It acts like an effective number of equally likely choices. Lower values indicate more concentrated and confident predictions.
Usually yes for the same task, dataset, and tokenization scheme. However, comparing values across different datasets, preprocessing pipelines, or token definitions can be misleading because the scale changes.
Yes. Choose natural log or base-2 log mode, then paste the values directly. The calculator converts them internally before computing average loss and final perplexity.
Weights let you emphasize repeated observations, frequency counts, or importance scores without duplicating entries. The calculator uses weighted averages, so larger weights influence the final perplexity more strongly.
No for valid probability-based inputs. The theoretical minimum is 1, reached when every observed outcome has probability 1. Values below 1 usually indicate invalid inputs or incorrect transformations.
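The minimum of 1 is easy to verify: when every observed outcome has probability 1, the average NLL is 0 and e^0 = 1. A quick illustrative check:

```python
import math

# Every outcome predicted with probability 1 -> each -ln(p) is 0,
# so the average NLL is 0 and perplexity is exactly e^0 = 1.
probs = [1.0, 1.0, 1.0]
avg_nll = -sum(math.log(p) for p in probs) / len(probs)
print(math.exp(avg_nll))  # 1.0
```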
Use the cross-entropy mode and enter the value in bits per token. The calculator applies PP = 2^H, which directly converts entropy on the bit scale into perplexity.
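The bits-per-token conversion is a single power of two. A minimal sketch with an illustrative helper name (`perplexity_from_bits` is an assumption, not the calculator's API):

```python
def perplexity_from_bits(h_bits):
    # Cross-entropy H in bits per token -> perplexity via PP = 2^H
    return 2.0 ** h_bits

print(perplexity_from_bits(3.0))  # 8.0, i.e. ~8 equally likely choices
```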
Yes. Different tokenization schemes change the number and distribution of prediction events. That means perplexity values from word, character, and subword tokenizers should not be compared casually.
Use it when evaluating language models, sequence predictors, probabilistic classifiers, or uncertainty summaries. It is also helpful for teaching entropy concepts and checking manual calculations quickly.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.