Measure model uncertainty across prediction tasks. View losses, compare outputs, and interpret each sample with clear exports and graphs.
| Sample | Task Type | Observed Value | Model Output | Interpretation |
|---|---|---|---|---|
| 1 | Binary | 1 | 0.92 | High confidence and correct prediction. |
| 2 | Binary | 0 | 0.15 | Low risk estimate, matching the true class. |
| 3 | Multiclass | 2 | 0.10, 0.20, 0.70 | Class 2 receives the highest probability. |
| 4 | Gaussian | 5.0 | 4.8 | Small residual means lower Gaussian NLL. |
For binary targets y ∈ {0,1} and predicted probability p for class 1:
NLL = -Σ [ y log(p) + (1-y) log(1-p) ]
This punishes confident mistakes sharply. A prediction of 0.99 for the wrong class creates a much larger loss than a cautious 0.60 error.
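As a minimal sketch of the binary formula above (the helper name `binary_nll` is our own, not part of any calculator):

```python
import math

def binary_nll(targets, probs, reduce="sum"):
    """NLL for binary targets y in {0,1} and predicted P(class=1).

    Implements: -sum( y*log(p) + (1-y)*log(1-p) ).
    """
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for y, p in zip(targets, probs)]
    total = sum(losses)
    return total / len(losses) if reduce == "mean" else total

# Samples 1 and 2 from the table: correct, confident predictions
print(round(binary_nll([1, 0], [0.92, 0.15]), 4))  # → 0.2459

# A confident mistake (p=0.01 for a true 1) costs far more
# than a cautious one (p=0.40 for a true 1): ~4.61 vs ~0.92
```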
For each sample, let p_true be the predicted probability assigned to the true class:
NLL = -Σ log(p_true)
This is the standard categorical cross-entropy form. Lower loss means the model places stronger probability mass on the correct class.
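The categorical form can be sketched the same way (again, `categorical_nll` is an illustrative name of ours):

```python
import math

def categorical_nll(true_classes, prob_rows):
    """Sum of -log(p_true) over samples.

    Each row of prob_rows holds one sample's probabilities across all classes;
    true_classes gives the index of the correct class per sample.
    """
    return sum(-math.log(row[c]) for c, row in zip(true_classes, prob_rows))

# Sample 3 from the table: true class 2, probabilities [0.10, 0.20, 0.70]
print(round(categorical_nll([2], [[0.10, 0.20, 0.70]]), 4))  # -ln(0.70) ≈ 0.3567
```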
For observed value y, predicted mean μ, and standard deviation σ:
NLL = 0.5 log(2πσ²) + (y-μ)² / (2σ²)
When the constant term is disabled, the calculator keeps only the squared-error likelihood component. This is useful for comparative model scoring.
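A sketch of the Gaussian formula with the constant term optional, matching the behavior described above (function name and the example σ = 1.0 are assumptions for illustration):

```python
import math

def gaussian_nll(y, mu, sigma, include_const=True):
    """Gaussian NLL: 0.5*log(2*pi*sigma^2) + (y-mu)^2 / (2*sigma^2)."""
    nll = (y - mu) ** 2 / (2 * sigma ** 2)
    if include_const:
        nll += 0.5 * math.log(2 * math.pi * sigma ** 2)
    return nll

# Sample 4 from the table, with an assumed sigma of 1.0
print(round(gaussian_nll(5.0, 4.8, 1.0), 4))                       # → 0.9389
print(round(gaussian_nll(5.0, 4.8, 1.0, include_const=False), 4))  # → 0.02
```

With the constant disabled, only the squared-error term (y-μ)²/(2σ²) remains, which is why rankings between models are unchanged as long as σ is held fixed.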
It measures how well predicted probabilities explain observed outcomes. Lower values mean the model assigned higher probability to what actually happened.
Lower NLL means the true outcomes were less surprising under the model. Strong predictions on correct events reduce the loss significantly.
Total NLL sums loss across all samples. Mean NLL divides by sample count, making comparison easier across datasets of different sizes.
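The two reductions differ only by a division (the per-sample losses here are hypothetical numbers, not outputs of any specific model):

```python
per_sample_nll = [0.08, 0.16, 0.36]  # hypothetical per-sample losses

total_nll = sum(per_sample_nll)            # grows with dataset size
mean_nll = total_nll / len(per_sample_nll)  # comparable across dataset sizes

print(round(total_nll, 2), round(mean_nll, 2))  # → 0.6 0.2
```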
Clamping prevents taking the logarithm of zero. Without it, extreme probabilities can cause undefined values or unstable numeric results.
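A typical clamping step looks like this (the epsilon value 1e-12 is an illustrative choice, not necessarily what any given calculator uses):

```python
import math

def clamp(p, eps=1e-12):
    """Bound p to [eps, 1 - eps] so log(p) and log(1 - p) stay finite."""
    return min(max(p, eps), 1 - eps)

print(math.log(clamp(0.0)))  # a large negative number instead of a domain error
print(math.log(clamp(1.0)))  # very close to 0 rather than exactly 0
```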
The multiclass mode is designed for softmax-style probabilities, so softmax outputs can be used directly. Each row should represent one sample, with probabilities across all classes summing to 1.
Perplexity is the exponential of mean NLL. It is often used in language modeling to describe average uncertainty per sample or token.
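The relationship is a one-liner (the per-token losses below are made-up numbers for illustration):

```python
import math

token_nll = [2.1, 1.7, 2.4]  # hypothetical per-token negative log-likelihoods

mean_nll = sum(token_nll) / len(token_nll)
perplexity = math.exp(mean_nll)  # "effective branching factor" per token

print(round(perplexity, 2))
```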
The constant term gives the full Gaussian likelihood expression. Removing it is useful when comparing models under the same fixed variance.
Very large NLL values often appear when a model assigns extreme confidence to wrong predictions, revealing calibration or fit problems.