Model Setup
Use a simple logistic model to test input sensitivity.
Example Data Table
A compact sample of inputs you can paste into your own workflow.
| Feature | Baseline | Min | Max | Weight |
|---|---|---|---|---|
| Latency (ms) | 120 | 80 | 250 | -0.012 |
| Error rate | 0.03 | 0.00 | 0.10 | -8.000 |
| Data freshness (days) | 4 | 0 | 14 | -0.180 |
| Feature quality score | 0.78 | 0.50 | 0.95 | 4.200 |
| Training set size (k) | 80 | 20 | 200 | 0.010 |
Formula Used
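This tool maps a weighted sum of the inputs to a probability with the logistic (sigmoid) function, a common mapping in classification scoring: z = b + Σ wᵢ·xᵢ and p = 1 / (1 + e^(−z)), where b is the intercept, xᵢ is the value of feature i, and wᵢ is its weight.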
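The following is a minimal Python sketch of that mapping. It assumes the standard logistic form, in which the score is the intercept plus the weighted sum of raw feature values. Weights and baselines come from the Example Data Table. The feature keys and the intercept of 0.0 are placeholders, not values taken from the tool.

```python
import math

# Weights and baseline values copied from the Example Data Table.
WEIGHTS = {
    "latency_ms": -0.012,
    "error_rate": -8.0,
    "freshness_days": -0.18,
    "quality_score": 4.2,
    "train_size_k": 0.010,
}
BASELINE = {
    "latency_ms": 120,
    "error_rate": 0.03,
    "freshness_days": 4,
    "quality_score": 0.78,
    "train_size_k": 80,
}
INTERCEPT = 0.0  # assumption: replace with your model's bias term b


def linear_score(x, weights, intercept):
    """z = b + sum_i w_i * x_i"""
    return intercept + sum(weights[name] * x[name] for name in weights)


def predict_proba(x, weights, intercept):
    """p = 1 / (1 + exp(-z)), the logistic (sigmoid) mapping."""
    z = linear_score(x, weights, intercept)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    z = linear_score(BASELINE, WEIGHTS, INTERCEPT)
    p = predict_proba(BASELINE, WEIGHTS, INTERCEPT)
    print(f"z = {z:.3f}, p = {p:.3f}")
```

With the assumed intercept of 0, the baseline row gives z = 1.676 and p ≈ 0.842.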
How to Use
- Enter an intercept if your model has a bias term.
- Add features with baseline, min, max, steps, and weight.
- Run the analysis to get a ranked influence table.
- Download CSV for spreadsheets or reporting pipelines.
- Download PDF for sharing results in reviews.
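The steps above can be sketched in Python as a one-at-a-time sweep. This is an illustration built on assumptions, not the tool's documented export format. Hypothetical choices include:

- the `FeatureSpec` fields;
- the 25 steps per feature;
- the zero intercept;
- the CSV layout, with one row per feature and grid point.

PDF export is not shown.

```python
import csv
import io
import math
from dataclasses import dataclass


@dataclass
class FeatureSpec:
    """One row of inputs (field names are hypothetical)."""
    name: str
    baseline: float
    lo: float      # "Min"
    hi: float      # "Max"
    steps: int     # grid points from lo to hi, both ends included
    weight: float

    def validate(self):
        if not self.lo <= self.baseline <= self.hi:
            raise ValueError(f"{self.name}: baseline must lie within [min, max]")
        if self.steps < 2:
            raise ValueError(f"{self.name}: steps must be at least 2")

    def grid(self):
        # Interpolate so the first and last points equal lo and hi exactly.
        n = self.steps - 1
        return [(1 - k / n) * self.lo + (k / n) * self.hi for k in range(self.steps)]


def proba(specs, intercept, override):
    """Logistic p with each feature at baseline unless present in `override`."""
    z = intercept + sum(s.weight * override.get(s.name, s.baseline) for s in specs)
    return 1.0 / (1.0 + math.exp(-z))


def sweep_csv(specs, intercept=0.0):
    """One-at-a-time sweep of every feature; returns CSV text (feature, x, p)."""
    for s in specs:
        s.validate()
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["feature", "x", "p"])
    for s in specs:
        for x in s.grid():
            p = proba(specs, intercept, {s.name: x})
            writer.writerow([s.name, f"{x:.6g}", f"{p:.6f}"])
    return buf.getvalue()


SPECS = [
    FeatureSpec("latency_ms", 120, 80, 250, 25, -0.012),
    FeatureSpec("error_rate", 0.03, 0.00, 0.10, 25, -8.0),
    FeatureSpec("freshness_days", 4, 0, 14, 25, -0.18),
    FeatureSpec("quality_score", 0.78, 0.50, 0.95, 25, 4.2),
    FeatureSpec("train_size_k", 80, 20, 200, 25, 0.010),
]

if __name__ == "__main__":
    print(sweep_csv(SPECS, intercept=0.0)[:120])
```

To build the ranked influence table, sort features on the absolute change in p between the first and last grid points of each sweep.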
Why Sensitivity Matters
In applied machine learning, small upstream shifts can create large downstream errors. Sensitivity analysis quantifies which inputs move a prediction the most, helping you focus monitoring and data quality budgets. In many deployments, output volatility is concentrated in a small share of features. Teams often discover that a single operational metric, like latency or error rate, dominates outcome drift during peak traffic.
Choosing Realistic Ranges
Ranges should reflect observed production variation, not theoretical extremes. For example, if a feature quality score usually stays between 0.60 and 0.90, sweeping 0.00 to 1.00 exaggerates risk. Use percentiles from logs, A/B experiments, or recent batch statistics to set min and max. A practical rule is to start with P5 to P95, then tighten to SLA bands for governance reporting.
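The next sketch applies the P5 to P95 rule and then tightens the result to a band, using Python's `statistics.quantiles` (Python 3.8+). The latency samples are a synthetic stand-in for production logs. The 0 to 200 ms SLA band is hypothetical.

```python
import statistics


def percentile_range(samples, lower=5, upper=95):
    """(P_lower, P_upper) with linear interpolation between order statistics."""
    # n=100 returns the 99 cut points P1..P99; P_k sits at index k - 1.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[lower - 1], cuts[upper - 1]


def tighten_to_band(observed, band):
    """Intersect an observed (lo, hi) range with a governance band such as an SLA."""
    lo, hi = max(observed[0], band[0]), min(observed[1], band[1])
    if lo > hi:
        raise ValueError("observed range and band do not overlap")
    return lo, hi


# Synthetic stand-in for latency logs (ms); replace with real production samples.
latency_samples = list(range(80, 251))
observed = percentile_range(latency_samples)  # (88.5, 241.5)
sla_band = (0, 200)  # hypothetical latency SLA in ms
print(observed, tighten_to_band(observed, sla_band))
```

The tightened range (88.5, 200) would then become the Min and Max for latency in the sweep.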
Reading Local Gradients
Local sensitivity uses the derivative ∂p/∂x at the baseline. For the logistic mapping, ∂p/∂x = w·p·(1−p), where w is the feature's weight. The factor p·(1−p) peaks at p = 0.5 and shrinks toward 0 or 1, so the same weight can behave differently across segments. Each gradient is expressed per unit of its own input, so multiply it by a realistic perturbation size before comparing features. Examples of such perturbations are a 1% rise in error rate or a 10 ms latency bump.
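Below is a sketch of local gradients at the baseline from the Example Data Table, assuming an intercept of 0. The per-feature perturbation sizes are illustrative: 10 ms, 0.01 error rate, 1 day, 0.05 quality, and 10 in the table's thousands unit for training size. The code compares each first-order estimate with an exact recomputation to show how well the linear approximation holds.

```python
import math

# Weights and baseline values from the Example Data Table.
WEIGHTS = {"latency_ms": -0.012, "error_rate": -8.0, "freshness_days": -0.18,
           "quality_score": 4.2, "train_size_k": 0.010}
BASELINE = {"latency_ms": 120, "error_rate": 0.03, "freshness_days": 4,
            "quality_score": 0.78, "train_size_k": 80}
INTERCEPT = 0.0  # assumption: substitute your model's bias term

# Illustrative perturbation sizes, one per feature, in each feature's own units.
PERTURB = {"latency_ms": 10, "error_rate": 0.01, "freshness_days": 1,
           "quality_score": 0.05, "train_size_k": 10}


def proba(x):
    z = INTERCEPT + sum(w * x[k] for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def local_gradients(x):
    """dp/dx_i = w_i * p * (1 - p) at x; units are probability per unit of x_i."""
    p = proba(x)
    return {k: w * p * (1.0 - p) for k, w in WEIGHTS.items()}


grads = local_gradients(BASELINE)
for k in WEIGHTS:
    linear = grads[k] * PERTURB[k]  # first-order estimate of the change in p
    exact = proba({**BASELINE, k: BASELINE[k] + PERTURB[k]}) - proba(BASELINE)
    print(f"{k:16s} linear {linear:+.4f}  exact {exact:+.4f}")
```

For a 10 ms latency bump, the linear estimate is about −0.016 and the exact change is about −0.017. The gradient is therefore a reasonable guide for steps of this size.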
Interpreting Range Impact
Range impact measures p(max)−p(min): the probability with a feature at its max minus the probability with it at its min, holding the other features at baseline. Stakeholders find it intuitive because it converts input uncertainty into probability movement. Ranking by absolute range delta highlights high-leverage variables for guardrails, feature clipping, and fallback logic. If a single feature can move p by 0.15 across its range, it deserves stricter validation than a feature that moves p by 0.01.
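Here is a sketch of range-impact ranking for the Example Data Table, assuming an intercept of 0 and the other features held at baseline. Under the logistic mapping, p is monotone in each input, so evaluating the two endpoints is enough. The sign of each delta shows the direction of movement, and the sort uses the absolute value.

```python
import math

# (name, baseline, min, max, weight) rows from the Example Data Table.
TABLE = [
    ("latency_ms", 120, 80, 250, -0.012),
    ("error_rate", 0.03, 0.00, 0.10, -8.0),
    ("freshness_days", 4, 0, 14, -0.18),
    ("quality_score", 0.78, 0.50, 0.95, 4.2),
    ("train_size_k", 80, 20, 200, 0.010),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def range_impact(table, intercept=0.0):
    """p(max) - p(min) per feature with the others at baseline, sorted by |delta|."""
    z_base = intercept + sum(w * base for _, base, _, _, w in table)
    rows = []
    for name, base, lo, hi, w in table:
        # Shift only this feature's contribution from its baseline to min / max.
        p_lo = sigmoid(z_base + w * (lo - base))
        p_hi = sigmoid(z_base + w * (hi - base))
        rows.append((name, p_hi - p_lo))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


for name, delta in range_impact(TABLE):
    print(f"{name:16s} {delta:+.3f}")
```

Under these assumptions, data freshness and latency rank first and second, with range deltas of roughly −0.45 and −0.37.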
Using Results in Model Governance
Document the top drivers, the chosen ranges, and the expected probability swing. Pair the results with drift thresholds: if a high-impact input shifts by more than its historical band, trigger investigation. Combine sensitivity ranking with feature importance from training to separate causal-like operational levers from spurious correlations. When models affect approvals or pricing, include sensitivity outputs in change reviews and audits.
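The sketch below shows one way to encode the drift trigger. It compares each top driver's recent median with its historical band and flags the driver when the median falls outside. The bands, recent observations, and driver list are hypothetical inputs. The median is one reasonable statistic; a mean, or the share of points outside the band, would also work.

```python
import statistics


def drift_alerts(recent, bands, top_drivers):
    """Return top drivers whose recent median falls outside the historical band."""
    alerts = []
    for name in top_drivers:
        lo, hi = bands[name]
        med = statistics.median(recent[name])
        if not lo <= med <= hi:
            alerts.append((name, med, (lo, hi)))
    return alerts


# Hypothetical historical bands (e.g. P5 to P95) and a recent window of observations.
BANDS = {"freshness_days": (1, 9), "latency_ms": (88.5, 200)}
RECENT = {
    "freshness_days": [10, 11, 12, 12, 13],
    "latency_ms": [110, 125, 130, 140, 118],
}
TOP_DRIVERS = ["freshness_days", "latency_ms"]  # e.g. from a range-impact ranking

for name, med, band in drift_alerts(RECENT, BANDS, TOP_DRIVERS):
    print(f"investigate {name}: recent median {med} outside {band}")
```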
Operational Checklist
Update baselines monthly, refresh ranges after releases, and re-run sensitivity for segments. Track the top three drivers in dashboards, add alerts for missingness and outliers, and validate that mitigations reduce the ranked deltas without hurting accuracy. Store CSV exports as evidence and compare runs to quantify stability over time.
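The following sketch compares two stored exports to quantify ranking stability. It assumes each CSV has `feature` and `delta` columns holding range deltas; that layout and both example runs are hypothetical. The comparison reports three things:

- how many of the top three drivers persist;
- the largest rank shift;
- each feature's change in delta.

```python
import csv
import io


def load_run(csv_text):
    """Parse an export with assumed columns `feature,delta` into {feature: delta}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["feature"]: float(row["delta"]) for row in reader}


def ranks(run):
    """Rank 1 = largest absolute range delta."""
    ordered = sorted(run, key=lambda f: abs(run[f]), reverse=True)
    return {f: i + 1 for i, f in enumerate(ordered)}


def compare_runs(old, new, top_k=3):
    r_old, r_new = ranks(old), ranks(new)
    shared = sorted(set(old) & set(new))
    top_old = {f for f in shared if r_old[f] <= top_k}
    top_new = {f for f in shared if r_new[f] <= top_k}
    return {
        "top_k_overlap": len(top_old & top_new) / top_k,
        "max_rank_shift": max(abs(r_old[f] - r_new[f]) for f in shared),
        "delta_change": {f: round(new[f] - old[f], 4) for f in shared},
    }


# Hypothetical exports from two runs.
OLD_RUN = """feature,delta
freshness_days,-0.45
latency_ms,-0.37
quality_score,0.29
train_size_k,0.20
error_rate,-0.12
"""
NEW_RUN = """feature,delta
latency_ms,-0.41
freshness_days,-0.30
error_rate,-0.25
quality_score,0.22
train_size_k,0.15
"""

report = compare_runs(load_run(OLD_RUN), load_run(NEW_RUN))
print(report["top_k_overlap"], report["max_rank_shift"])
```

A falling top-k overlap or a growing maximum rank shift across runs is a simple signal that the driver profile is changing.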
FAQs
1) What does “local dY/dX” tell me?
It estimates how much the probability changes for a tiny change in one input at the baseline, with all other inputs held constant. It’s most useful for near-term perturbations and stability checks.
2) Why do I need both local sensitivity and range impact?
Local sensitivity captures small, immediate changes around today’s baseline. Range impact captures worst-to-best movement within your chosen bounds. Together they balance operational realism and stress testing.
3) How should I choose the number of steps?
Use 25–50 steps for smooth curves and stable extrema detection. Increase steps when your range is wide or stakeholders want finer granularity. Very high step counts add computation time without much extra insight.
4) What if my model is not logistic?
You can still use the range sweep concept by replacing the probability function with your model’s scoring rule. The local derivative formula will differ, but the ranking by range delta remains informative.
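Here is a sketch of a model-agnostic version in Python. Any function that maps a feature dict to a score can be plugged in, with a central finite difference standing in for the closed-form derivative. `custom_score` is a hypothetical non-logistic rule. The relative step size of 1e-5 is an assumption worth tuning for your inputs.

```python
import math


def central_difference(score, x, name, rel_step=1e-5):
    """Numerical d(score)/d(x[name]) at x, with the other inputs held fixed."""
    h = rel_step * max(1.0, abs(x[name]))
    up = {**x, name: x[name] + h}
    down = {**x, name: x[name] - h}
    return (score(up) - score(down)) / (2.0 * h)


def range_delta(score, x, name, lo, hi):
    """Score with x[name] = hi minus score with x[name] = lo, others fixed."""
    return score({**x, name: hi}) - score({**x, name: lo})


def custom_score(x):
    # Hypothetical non-logistic scoring rule, for illustration only.
    return x["quality_score"] * math.tanh(0.01 * (200.0 - x["latency_ms"]))


baseline = {"latency_ms": 120.0, "quality_score": 0.78}
print("local d/dx:", central_difference(custom_score, baseline, "latency_ms"))
print("range delta:", range_delta(custom_score, baseline, "latency_ms", 80.0, 250.0))
```

If the scoring rule is not monotone in an input, the endpoints may not bracket its extremes. In that case, sweep a grid with enough steps and take the max and min over the grid instead.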
5) Can this replace feature importance from training?
No. Feature importance reflects how a model learned from data. Sensitivity reflects how predictions respond to controlled input changes. Use both to separate training signals from operational levers.
6) How do I use outputs for monitoring?
Track the top-ranked inputs, validate their distributions, and alert on shifts beyond historical bands. When drift occurs, rerun the tool with updated baselines to confirm whether the risk profile changed.