Inputs
Example data table
| ID | Engagement | Tenure | Price Sens. | Promo Viewed | Outcome |
|---|---|---|---|---|---|
| U-1001 | 81 | 12 | 0.40 | 1 | 1 |
| U-1002 | 55 | 4 | 0.75 | 0 | 0 |
| U-1003 | 68 | 8 | 0.62 | 1 | 1 |
| U-1004 | 42 | 2 | 0.90 | 0 | 0 |
| U-1005 | 76 | 10 | 0.48 | 1 | 1 |
| U-1006 | 60 | 6 | 0.58 | 0 | 0 |
Outcome is a binary label (1 = event occurred). Replace with your domain outcome.
Formula used
p = 1 / (1 + e^(-(score / temperature)))
odds = prior_odds × Π(OR_i^x_i) × e^(w·A·B)
p = odds / (1 + odds)
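A minimal Python sketch of the two formulas above, assuming standard sigmoid and odds-update definitions (function and parameter names are illustrative, not the calculator's actual API):

```python
import math

def logistic_probability(score, temperature=1.0):
    """Logistic method: sigmoid of a (temperature-scaled) log-odds score."""
    return 1.0 / (1.0 + math.exp(-score / temperature))

def odds_update(prior_p, odds_ratios, x_values, w=0.0, a=0.0, b=0.0):
    """Odds method: multiply prior odds by OR_i^x_i for each feature,
    apply an optional interaction term e^(w*a*b), then convert back
    to a probability."""
    odds = prior_p / (1.0 - prior_p)
    for or_i, x_i in zip(odds_ratios, x_values):
        odds *= or_i ** x_i
    odds *= math.exp(w * a * b)
    return odds / (1.0 + odds)
```

For example, a score of 1.20 gives `logistic_probability(1.2)` ≈ 0.77, and with `temperature=1.5` the same score softens to ≈ 0.69, matching the calibration section below.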
How to use this calculator
- Enter a base probability that matches your historical rate.
- Choose a method: logistic scoring, odds update, or hybrid.
- Add features with values and either coefficients or odds ratios.
- Optionally standardize using the same training mean and std.
- Set a decision threshold, plus costs/benefits for expected value.
- Run simulation if inputs are uncertain, then export results.
Base rate and prior probability
A good outcome-chance estimate starts with the base rate: your historical event frequency. If the last 10,000 cases produced 1,800 events, the prior probability is 18%. Raising the prior from 10% to 20% more than doubles the prior odds, from 0.111 to 0.250, which can materially shift downstream decisions.
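The probability-to-odds conversion behind those numbers is a one-liner; a small sketch (the function name is illustrative):

```python
def prior_odds(p):
    """Convert a prior probability to prior odds: p / (1 - p)."""
    return p / (1.0 - p)
```

Here `prior_odds(0.10)` ≈ 0.111 and `prior_odds(0.20)` = 0.250, the values quoted above; note the odds grow faster than the probability as p rises.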
Feature effects on log-odds
In logistic scoring, each feature contributes b_i·x_i to the log-odds. A coefficient of 0.70 roughly doubles the odds for each one-unit increase in x_i, because exp(0.70) ≈ 2.01. Standardizing inputs with the training mean and standard deviation makes coefficients comparable across scales and reduces instability when features are measured in different units. With odds ratios, update the odds by OR_i^x_i without refitting.
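A sketch of the log-odds accumulation with optional standardization, assuming per-feature training means and standard deviations are available (names are hypothetical):

```python
def standardize(value, mean, std):
    """Rescale a raw input to training units: (value - mean) / std."""
    return (value - mean) / std

def logit_contributions(features, coefs, means=None, stds=None):
    """Sum b_i * x_i over features, standardizing each x_i first
    when training means/stds are supplied."""
    total = 0.0
    for i, (x, b) in enumerate(zip(features, coefs)):
        if means is not None and stds is not None:
            x = standardize(x, means[i], stds[i])
        total += b * x
    return total
```

For instance, an Engagement of 81 with a (hypothetical) training mean of 60 and std of 15 standardizes to 1.4, so a coefficient of 0.70 contributes 0.98 to the log-odds rather than a scale-dependent 56.7.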
Calibration and temperature scaling
Raw model probabilities often drift. Calibration bias shifts log-odds up or down, while temperature scaling smooths confidence. For example, a score of 1.20 yields p=0.77; with temperature 1.50 the same score becomes p=0.69. Track Brier score and reliability curves quarterly to verify that predicted bins, such as 0.60–0.70, match observed rates. Keep discrimination metrics like AUC separate from calibration, since a high AUC can still be miscalibrated.
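The bias shift, temperature division, and Brier tracking described above can be sketched as follows (function names are illustrative; the Brier score here is the standard mean squared error between predictions and 0/1 labels):

```python
import math

def calibrated_probability(score, bias=0.0, temperature=1.0):
    """Shift the log-odds score by a calibration bias, divide by
    temperature, then apply the sigmoid."""
    return 1.0 / (1.0 + math.exp(-(score + bias) / temperature))

def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and binary
    outcomes; lower is better, 0 is perfect."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
```

This reproduces the worked example: `calibrated_probability(1.2)` ≈ 0.77, while `calibrated_probability(1.2, temperature=1.5)` ≈ 0.69.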
Decision threshold and expected value
Thresholds should follow economics, not intuition. If a false positive costs 2 units and a false negative costs 8 units, you can justify a lower threshold to avoid missed events. Expected value compares “Act” versus “Hold” using p, benefits for true outcomes, and costs for mistakes, producing a recommendation that adapts as p moves. In a triage workflow, a 30% threshold may maximize value, while a 60% threshold may prioritize precision.
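Under the simplifying assumption that acting on a non-event costs the false-positive amount and holding on an event costs the false-negative amount, the break-even threshold follows the classic cost ratio C_FP / (C_FP + C_FN). A sketch:

```python
def expected_cost_act(p, cost_fp):
    """Expected cost of acting: pay the FP cost when the event does not occur."""
    return (1.0 - p) * cost_fp

def expected_cost_hold(p, cost_fn):
    """Expected cost of holding: pay the FN cost when the event does occur."""
    return p * cost_fn

def break_even_threshold(cost_fp, cost_fn):
    """Probability at which acting and holding have equal expected cost;
    act when p exceeds this value."""
    return cost_fp / (cost_fp + cost_fn)
```

With the costs from the paragraph above (FP = 2, FN = 8), the break-even threshold is 2 / (2 + 8) = 0.20, which is why a costly false negative justifies acting at lower probabilities.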
Uncertainty range and monitoring
Inputs and coefficients can be uncertain, so simulation provides a practical range. A 5th–95th percentile spread of 22%–48% signals that collecting better measurements may outperform model tuning. Report the median (P50) alongside the range and store both with the exported result for audit and governance. Monitor feature distributions, calibration bias, and decision outcomes; when drift appears, refresh coefficients, update odds ratios, and re-check your chosen threshold.
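One way to produce such a percentile range is a simple Monte Carlo pass: sample the log-odds score from a normal distribution built from your mean and SD fields, push each draw through the sigmoid, and read off P05/P50/P95. A sketch under that assumption (parameter names are hypothetical, and your calculator may sample per-feature instead of per-score):

```python
import math
import random

def simulate_probability(score_mean, score_sd, temperature=1.0, n=10000, seed=0):
    """Monte Carlo range for the outcome probability: sample the score,
    apply the sigmoid, and report the 5th, 50th, and 95th percentiles."""
    rng = random.Random(seed)
    probs = sorted(
        1.0 / (1.0 + math.exp(-rng.gauss(score_mean, score_sd) / temperature))
        for _ in range(n)
    )
    return {
        "p05": probs[int(0.05 * n)],
        "p50": probs[int(0.50 * n)],
        "p95": probs[int(0.95 * n)],
    }
```

A wide P05–P95 spread from this kind of run is the signal, mentioned above, that better measurements may matter more than further model tuning.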
FAQs
What does “Outcome Chance” represent?
It is a probability estimate for an event, given your base rate and feature effects. Use it for prioritization and scenario testing, not as a guarantee, and validate it against recent holdout outcomes.
Which method should I choose: logistic, odds, or hybrid?
Use logistic when you have coefficients from a fitted model, odds when you have interpretable odds ratios, and hybrid when you want a weighted blend. Compare calibration and value at your operating threshold.
Why would I enable standardization?
Standardization applies (value−mean)/std so features share a comparable scale. This helps when your coefficients were trained on standardized data and prevents large-magnitude inputs, like revenue, from dominating smaller-scale signals.
What does temperature scaling do?
Temperature divides the score before the sigmoid. Values above 1 soften extreme probabilities; values below 1 sharpen them. Use it to improve calibration after deployment, ideally tuned on a recent validation set.
How do I set the decision threshold?
Pick the threshold that maximizes expected value given your costs and benefits. Higher false-negative cost usually lowers the threshold. Review the choice with stakeholders and re-check it when base rates or costs change.
How should I interpret the simulation range?
The P05–P95 range reflects input and coefficient uncertainty based on the SD fields. A wide range means decisions are sensitive, so collect better data, narrow assumptions, or use a more conservative threshold.