Upload two datasets and compare every shared feature. See PSI, p-values, missingness, and severity badges. Download reports for teams to review today.
Upload baseline and current datasets with shared headers. The detector scans every shared column and reports drift metrics.
| age | income | device | region | clicked |
|---|---|---|---|---|
| 34 | 62000 | mobile | north | 1 |
| 41 | 81000 | desktop | south | 0 |
| 28 | 54000 | mobile | east | 1 |
| | 47000 | tablet | west | 0 |
Your baseline and current files should share the same header row. Extra columns are ignored; shared columns are analyzed.
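A minimal sketch of the column-matching step described above, assuming pandas DataFrames; the function name `shared_columns` is illustrative, not the tool's actual API:

```python
import pandas as pd

def shared_columns(baseline: pd.DataFrame, current: pd.DataFrame) -> list:
    """Columns present in both frames, in baseline order; extras are ignored."""
    return [c for c in baseline.columns if c in current.columns]

baseline = pd.DataFrame({"age": [34], "income": [62000], "legacy_id": [7]})
current = pd.DataFrame({"age": [28], "income": [54000], "new_flag": [True]})
print(shared_columns(baseline, current))  # → ['age', 'income']
```

Only the intersection of headers is analyzed, so a renamed column silently drops out of the comparison; worth checking before reading the report.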
Population Stability Index (PSI) compares baseline and current proportions across bins or categories.
PSI = Σ (A_i − E_i) × ln(A_i / E_i)

where E_i is the baseline share in bin/category i, A_i is the current share in bin/category i, and a small epsilon is added when a share is zero so the logarithm stays defined.
For numeric features, bins are built from baseline quantiles. For categorical features, rare values are grouped into “(other)” to stabilize comparisons.
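The PSI formula above can be sketched directly from the per-bin shares; the epsilon guard and renormalization are one common way to handle empty bins (a sketch, not the calculator's exact implementation):

```python
import numpy as np

def psi(expected, actual, eps: float = 1e-6) -> float:
    """PSI = Σ (A_i − E_i) · ln(A_i / E_i) over bins/categories.

    expected: baseline shares E_i; actual: current shares A_i.
    eps replaces zero shares so ln() stays defined; shares are then
    renormalized to sum to 1.
    """
    e = np.clip(np.asarray(expected, dtype=float), eps, None)
    a = np.clip(np.asarray(actual, dtype=float), eps, None)
    e, a = e / e.sum(), a / a.sum()
    return float(np.sum((a - e) * np.log(a / e)))

# Identical distributions → PSI of exactly zero
print(psi([0.25, 0.25, 0.5], [0.25, 0.25, 0.5]))  # → 0.0
```

Shifting mass between bins makes each term positive, so PSI grows with the size of the shift.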
Modern models degrade when incoming data changes. A detector like this compares a stable baseline window against a recent window so teams can spot shifts early. PSI summarizes distribution movement, while hypothesis tests add a statistical signal when sample sizes are adequate. Monitoring both helps separate noise from real change.
PSI near 0.00 means the baseline and current distributions are similar. Many teams treat PSI < 0.10 as low drift, 0.10–0.25 as moderate, and ≥ 0.25 as high drift. These thresholds are practical defaults for prioritization, not universal truths. Always interpret PSI with business impact.
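The default thresholds above map naturally to severity badges; a minimal sketch (cutoffs are the common defaults, not fixed rules):

```python
def severity(psi_value: float) -> str:
    """Map a PSI value to a badge using common default thresholds."""
    if psi_value < 0.10:
        return "low"       # distributions are similar
    if psi_value < 0.25:
        return "moderate"  # worth a look
    return "high"          # prioritize for review

print(severity(0.31))  # → high
```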
For numeric columns, the calculator builds bins using baseline quantiles. This keeps bins balanced and stable even with skewed variables like income or latency. The current data is mapped into the same bin edges, producing comparable proportions. If many repeated values exist, duplicate quantile edges may reduce the effective bin count.
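Quantile binning as described can be sketched with NumPy; note how `np.unique` collapses duplicate edges when a value repeats heavily, which is the effective-bin-count reduction mentioned above (illustrative helper names, assumed defaults):

```python
import numpy as np

def baseline_bins(values, n_bins: int = 10) -> np.ndarray:
    """Bin edges from baseline quantiles; duplicate edges are collapsed."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)
    return np.unique(np.quantile(np.asarray(values, dtype=float), qs))

def bin_shares(values, edges) -> np.ndarray:
    """Map values into the baseline edges and return per-bin proportions."""
    # Clip so current values outside the baseline range land in the end bins
    v = np.clip(np.asarray(values, dtype=float), edges[0], edges[-1])
    counts, _ = np.histogram(v, bins=edges)
    return counts / counts.sum()

edges = baseline_bins(range(100), n_bins=10)
print(len(edges))                    # → 11 (distinct values, no collapsing)
print(bin_shares(range(100), edges).sum())
```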
Categorical columns can have long tails. To prevent one-off values from dominating the comparison, the detector keeps the top categories and groups the rest into “(other)”. This stabilizes PSI and makes reports readable. If your domain has critical rare categories, increase the “Top categories kept” setting.
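The top-k grouping can be sketched with a frequency count; `group_rare` and its parameters are illustrative stand-ins for the "Top categories kept" setting:

```python
from collections import Counter

def group_rare(values, top_k: int = 10, other: str = "(other)") -> list:
    """Keep the top_k most frequent labels; fold the long tail into '(other)'."""
    keep = {label for label, _ in Counter(values).most_common(top_k)}
    return [v if v in keep else other for v in values]

print(group_rare(["a", "a", "b", "b", "c"], top_k=2))
# → ['a', 'a', 'b', 'b', '(other)']
```

With the tail folded into a single bucket, both windows share the same category set, so the PSI terms line up one-to-one.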
A feature can “drift” simply because pipelines change. Missing share is reported for baseline and current windows, and the Plotly chart highlights shifts. Large missingness jumps often indicate upstream extraction issues, schema changes, or new traffic sources. Resolve missingness drift before retraining, or the model may learn artifacts.
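Missing share per column is a one-liner in pandas; a minimal sketch of the metric reported per window:

```python
import pandas as pd

def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing (NaN/None) values in each column."""
    return df.isna().mean()

frame = pd.DataFrame({
    "age": [34, None, 28],
    "device": ["mobile", "tablet", None],
})
print(missing_share(frame))  # age and device each ≈ 0.33
```

Computing this for baseline and current separately, then comparing, is what surfaces the missingness jumps described above.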
Use small, consistent time windows, such as daily or weekly snapshots, and ensure sampling is comparable across segments. Start with PSI thresholds to route review work, then look at p-values for supporting evidence. Treat high-drift columns as candidates for feature engineering, calibration, or model refresh. Export CSV/PDF reports for audit trails.
Baseline is the reference window used for training or validation. Current is a recent production window you want to compare for distribution changes.
More is better. As a rule of thumb, aim for at least a few hundred rows per window for stable proportions and more meaningful p-values.
Grouping reduces noise from rare labels and makes PSI more stable. Increase the “Top categories kept” value if rare classes matter.
Not automatically. First check whether drift affects key performance metrics or represents a pipeline issue. Retrain when drift reflects real population change.
They come from KS tests for numeric columns and chi-square tests for categorical columns. Low p-values suggest the distributions differ under the test assumptions.
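Both tests are available in SciPy; a sketch with synthetic data (the mean shift and the category counts are made-up illustrations, not tool output):

```python
import numpy as np
from scipy.stats import ks_2samp, chi2_contingency

rng = np.random.default_rng(0)

# Numeric column: two-sample Kolmogorov–Smirnov test
base_num = rng.normal(loc=0.0, scale=1.0, size=500)
curr_num = rng.normal(loc=0.5, scale=1.0, size=500)  # shifted mean
ks_stat, ks_p = ks_2samp(base_num, curr_num)

# Categorical column: chi-square on a counts table
# rows = windows (baseline, current), columns = categories
table = np.array([[120, 60, 20],
                  [80, 90, 30]])
chi2, chi_p, dof, _expected = chi2_contingency(table)

print(ks_p < 0.05, chi_p < 0.05)  # → True True
```

A low p-value flags a difference the test can detect at this sample size; with very large windows even tiny, harmless shifts become "significant", which is why PSI is used for prioritization.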
This tool targets data drift in inputs. Concept drift is a change in the relationship between inputs and outcomes, typically measured with labels and model performance.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.