Calculator Inputs
Use this tool for drift checks, score stability, fairness studies, calibration review, and validation comparisons.
Example Data Table
This example shows two sets of model scores that can be tested for drift between reference and production traffic.
| Index | Reference Score | Production Score |
|---|---|---|
| 1 | 0.05 | 0.03 |
| 2 | 0.08 | 0.06 |
| 3 | 0.11 | 0.10 |
| 4 | 0.14 | 0.12 |
| 5 | 0.19 | 0.17 |
| 6 | 0.23 | 0.21 |
| 7 | 0.27 | 0.25 |
| 8 | 0.31 | 0.29 |
| 9 | 0.36 | 0.34 |
| 10 | 0.40 | 0.41 |
| 11 | 0.45 | 0.48 |
| 12 | 0.52 | 0.58 |
Formula Used
The Kolmogorov–Smirnov statistic measures the largest vertical difference between two cumulative distribution functions. In two-sample mode, the calculator compares two empirical cumulative distribution functions (ECDFs) directly.
| Quantity | Formula |
|---|---|
| Two-Sample Statistic | D = sup_x \|F_n(x) − G_m(x)\| |
| One-Sample Statistic | D = sup_x \|F_n(x) − F(x)\| |
| Effective Sample Size | n_eff = (n × m) / (n + m) for two-sample mode, and n for one-sample mode |
| Asymptotic Critical Value | D_crit = c(α) / √n_eff, where c(α) = √(−0.5 ln(α/2)) for the two-sided test |
| Decision Rule | Reject the null hypothesis when D > D_crit |
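The two-sample formulas above can be sketched in a few lines of Python. This is an illustrative implementation, not the calculator's internal code; the function name and the default α = 0.05 are assumptions.

```python
import math

def ks_two_sample(ref, prod, alpha=0.05):
    """Two-sample KS test following the formulas above.

    D = sup_x |F_n(x) - G_m(x)|, evaluated over the pooled sample,
    with the asymptotic critical value D_crit = c(alpha) / sqrt(n_eff).
    """
    n, m = len(ref), len(prod)
    points = sorted(set(ref) | set(prod))
    ecdf = lambda data, x: sum(v <= x for v in data) / len(data)
    d = max(abs(ecdf(ref, x) - ecdf(prod, x)) for x in points)
    n_eff = n * m / (n + m)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    d_crit = c_alpha / math.sqrt(n_eff)
    return d, d_crit, d > d_crit

# Example data table from above
reference = [0.05, 0.08, 0.11, 0.14, 0.19, 0.23, 0.27, 0.31, 0.36, 0.40, 0.45, 0.52]
production = [0.03, 0.06, 0.10, 0.12, 0.17, 0.21, 0.25, 0.29, 0.34, 0.41, 0.48, 0.58]
d, d_crit, reject = ks_two_sample(reference, production)
# Here D = 1/12 ≈ 0.0833, well below D_crit ≈ 0.5544, so H0 is not rejected.
print(round(d, 4), round(d_crit, 4), reject)
```

On this example the two ECDFs never diverge by more than one step of 1/12, so the test finds no significant drift.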
In AI and machine learning, a higher D statistic can indicate prediction drift, score instability, fairness imbalance, or a mismatch between expected and observed distributions.
How to Use This Calculator
- Choose Two Sample to compare two observed datasets, or One Sample to compare one dataset against a theoretical distribution.
- Select the alternative hypothesis and significance level that match your testing goal.
- Paste numeric values into the text boxes. Separate values with commas, spaces, or line breaks.
- For one-sample mode, choose a reference distribution and either estimate its parameters from the sample or enter custom values.
- Click Calculate KS Statistic to display the result above the form, including the ECDF plot and decision summary.
- Use the CSV and PDF buttons to export the result summary and pointwise comparison table.
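The flexible input format in the third step (commas, spaces, or line breaks) can be reproduced with a small tokenizer. This is a sketch of the idea, not the calculator's actual parser; `parse_values` is a hypothetical helper name.

```python
import re

def parse_values(text):
    """Split pasted input on commas, whitespace, or line breaks and convert to floats."""
    tokens = re.split(r"[,\s]+", text.strip())
    return [float(t) for t in tokens if t]

print(parse_values("0.05, 0.08\n0.11 0.14"))  # [0.05, 0.08, 0.11, 0.14]
```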
Frequently Asked Questions
1. What does the KS statistic measure?
It measures the largest vertical distance between two cumulative distribution functions. A larger value means the distributions differ more strongly at some observed point.
2. When should I use the two-sample version?
Use it when comparing two observed datasets, such as training versus production scores, accepted versus rejected applications, or outputs from two model versions.
3. When should I use the one-sample version?
Use it when checking whether one observed dataset follows a reference distribution such as the normal, uniform, or exponential distribution.
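In one-sample mode the statistic compares the empirical CDF against a theoretical CDF. The sketch below checks the production scores from the example table against a uniform(0, 1) reference; it illustrates the formula and is not the calculator's own code.

```python
def ks_one_sample(data, cdf):
    """One-sample KS statistic D = sup_x |F_n(x) - F(x)|.

    For sorted data the supremum occurs at a sample point, so it is
    enough to check i/n - F(x_i) and F(x_i) - (i-1)/n at each x_i.
    """
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

production = [0.03, 0.06, 0.10, 0.12, 0.17, 0.21, 0.25, 0.29, 0.34, 0.41, 0.48, 0.58]
d = ks_one_sample(production, lambda x: x)  # uniform(0, 1) CDF is F(x) = x
print(round(d, 4))  # 0.4367
```

The large D here reflects that scores concentrated below 0.6 are a poor match for a uniform reference over [0, 1].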
4. What does “Reject H₀” mean here?
It means the statistic is larger than the critical threshold, so the compared distributions are significantly different at the chosen significance level.
5. Why is this useful for AI and machine learning?
It helps detect data drift, score drift, sampling mismatch, fairness shifts, and model behavior changes between reference and live environments.
6. Does the calculator require sorted data?
No. The calculator sorts values internally before building the cumulative distributions and computing the maximum distance.
7. Can I compare probability scores from classification models?
Yes. Comparing probability scores is a common use case for calibration checks, score stability review, and production drift monitoring.
8. What are the limits of this implementation?
The p-value uses the standard asymptotic approximation. For very small samples or specialized testing setups, an exact or library-based method may be preferred.
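The standard asymptotic approximation mentioned in the last answer is the Kolmogorov limiting distribution, p ≈ 2 Σ_{k≥1} (−1)^(k−1) exp(−2k²λ²) with λ = √n_eff · D. A minimal sketch, not necessarily the calculator's exact series or truncation:

```python
import math

def ks_asymptotic_pvalue(d, n_eff, terms=100):
    """Two-sided p-value from the Kolmogorov limiting distribution.

    Accurate for moderate-to-large n_eff; for tiny samples an exact
    method from a statistics library is preferred.
    """
    lam = math.sqrt(n_eff) * d
    if lam == 0:
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2 * s))

# For the example table: D = 1/12 with n_eff = 6 gives a p-value near 1,
# consistent with failing to reject H0.
print(round(ks_asymptotic_pvalue(1 / 12, 6), 3))
```

The clamping to [0, 1] matters because for small λ the truncated alternating series can overshoot slightly, which is exactly the small-sample regime where the asymptotic formula is least reliable.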