Analyze forecast probabilities and outcomes with scoring tools. Compare skill scores, calibration bins, and weighted averages across predictions, and use the error breakdown to build clearer forecast judgments.
Enter probabilities as decimals or percentages. Outcomes must be 0 or 1.
This sample dataset matches the default example loaded into the form.
| Case | Predicted Probability | Observed Outcome | Weight |
|---|---|---|---|
| 1 | 0.05 | 0 | 1 |
| 2 | 0.15 | 0 | 1 |
| 3 | 0.25 | 1 | 1 |
| 4 | 0.40 | 0 | 1 |
| 5 | 0.55 | 1 | 1 |
| 6 | 0.65 | 1 | 1 |
| 7 | 0.72 | 0 | 1 |
| 8 | 0.81 | 1 | 1 |
| 9 | 0.90 | 1 | 1 |
| 10 | 0.96 | 1 | 1 |
Primary Brier Score Formula
BS = (1 / N) × Σ (pᵢ - oᵢ)²
Here, pᵢ is the predicted probability and oᵢ is the observed binary outcome.
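As a minimal sketch of this formula in plain Python (the function name `brier_score` is our own, not part of any library), applied to the sample dataset above:

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Sample dataset from the table above
probs = [0.05, 0.15, 0.25, 0.40, 0.55, 0.65, 0.72, 0.81, 0.90, 0.96]
outcomes = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

print(round(brier_score(probs, outcomes), 4))  # → 0.1639
```

The largest single contribution comes from case 7 (0.72 predicted, event did not occur), which alone adds 0.5184 to the sum before averaging.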
Weighted Version
BS = Σ[wᵢ × (pᵢ - oᵢ)²] / Σwᵢ
Weights help emphasize more important observations or larger segments.
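The weighted version can be sketched the same way (again a hypothetical helper, not a library call); with all weights equal to 1 it reduces to the plain score:

```python
def weighted_brier_score(probs, outcomes, weights):
    """Weighted Brier score: sum(w * (p - o)^2) / sum(w)."""
    num = sum(w * (p - o) ** 2 for p, o, w in zip(probs, outcomes, weights))
    return num / sum(weights)

probs = [0.05, 0.15, 0.25, 0.40, 0.55, 0.65, 0.72, 0.81, 0.90, 0.96]
outcomes = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

# Unit weights reproduce the unweighted score; doubling the weight of
# case 7 (the worst miss) pulls the score upward.
uniform = [1] * 10
emphasized = [1, 1, 1, 1, 1, 1, 2, 1, 1, 1]
print(round(weighted_brier_score(probs, outcomes, uniform), 4))     # → 0.1639
print(round(weighted_brier_score(probs, outcomes, emphasized), 4))  # → 0.1961
```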
Brier Skill Score
BSS = 1 - (BS / BSref)
The reference score usually comes from climatology or another benchmark probability.
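A climatology reference simply forecasts the base rate (the observed event frequency) for every case. A sketch of the skill score under that default, using our own illustrative function name:

```python
def brier_skill_score(probs, outcomes, ref_prob=None):
    """BSS = 1 - BS / BSref; the default reference is climatology (the base rate)."""
    n = len(probs)
    bs = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / n
    if ref_prob is None:
        ref_prob = sum(outcomes) / n  # climatological base rate
    bs_ref = sum((ref_prob - o) ** 2 for o in outcomes) / n
    return 1 - bs / bs_ref

probs = [0.05, 0.15, 0.25, 0.40, 0.55, 0.65, 0.72, 0.81, 0.90, 0.96]
outcomes = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

# Base rate is 0.6, so BSref = 0.24; the sample forecasts beat climatology.
print(round(brier_skill_score(probs, outcomes), 4))  # → 0.3173
```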
Murphy Decomposition
BS = Reliability - Resolution + Uncertainty
This splits total error into calibration quality (reliability), discrimination power (resolution), and inherent event uncertainty. The identity holds exactly when forecasts are grouped into discrete probability bins.
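One way to sketch the decomposition is to bin forecasts by their unique probability values, under which the identity BS = Reliability - Resolution + Uncertainty holds exactly (the function name is ours, for illustration):

```python
from collections import defaultdict

def murphy_decomposition(probs, outcomes):
    """Return (reliability, resolution, uncertainty), binning by unique forecast value."""
    n = len(probs)
    bins = defaultdict(list)
    for p, o in zip(probs, outcomes):
        bins[p].append(o)
    base_rate = sum(outcomes) / n
    # Reliability: squared gap between each bin's forecast and its observed frequency
    reliability = sum(
        len(os) * (p - sum(os) / len(os)) ** 2 for p, os in bins.items()
    ) / n
    # Resolution: how far each bin's observed frequency sits from the base rate
    resolution = sum(
        len(os) * (sum(os) / len(os) - base_rate) ** 2 for os in bins.values()
    ) / n
    # Uncertainty: variance of the binary outcome itself
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty
```

On the sample data every forecast value is unique, so each bin holds one case; reliability minus resolution plus uncertainty still reproduces the Brier score exactly.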
The Brier score measures the average squared difference between predicted probabilities and actual binary outcomes. Lower values indicate better forecasting accuracy, because smaller gaps mean predictions better matched what really happened.
A good score depends on the problem and event frequency. Zero is perfect. Scores closer to zero are better, while higher values show larger prediction error. Comparing against a benchmark or historical model is usually more informative than using a single cutoff.
Brier skill score shows whether your forecast beats a reference model. Positive values mean improvement over the benchmark. A value near zero means similar performance, while negative values indicate your forecast performed worse than the reference.
Yes. The calculator accepts either decimals like 0.72 or percentages like 72. When values are greater than 1 and no more than 100, the tool converts them automatically into decimal probabilities.
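The calculator's exact input-handling code isn't published; a sketch of the conversion rule as described above might look like this:

```python
def normalize_probability(value):
    """Treat values in (1, 100] as percentages; values in [0, 1] pass through."""
    if 0 <= value <= 1:
        return value
    if 1 < value <= 100:
        return value / 100
    raise ValueError("probability must be in [0, 1] or (1, 100]")

print(normalize_probability(72))    # → 0.72
print(normalize_probability(0.72))  # → 0.72
```

Note one ambiguity in any rule like this: an input of exactly 1 is read as the decimal probability 1.0, not as 1%.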
The standard Brier score is designed for binary events. An outcome of 1 means the event occurred, and 0 means it did not. This keeps the interpretation clear and consistent for probabilistic event forecasting.
Reliability measures calibration quality, showing whether stated probabilities align with observed frequencies. Resolution measures how well forecasts separate cases with different observed outcome frequencies. Better forecasts usually have low reliability error and strong resolution.
Use weights when some forecasts represent more cases, bigger impacts, or stronger importance. Weighting lets the score reflect business value, sample size, or operational priority instead of treating every row as equally influential.
The graph compares forecast probabilities, actual outcomes, a benchmark reference line, and calibration points. This helps you see sharpness, consistency, and areas where your probabilities may be overconfident or underconfident.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.