Data Drift Detector Calculator

Upload two datasets and compare every feature. See PSI, p-values, missingness, and severity badges. Download reports for your team to review.

Calculator

Upload baseline and current datasets with shared headers. The detector scans every shared column and reports drift metrics.

Tip
Use similar sampling windows for fair comparisons.

  • Baseline: the reference window used during training or validation.
  • Current: the recent production window you want to compare.
  • Quantile bins are built from baseline values.
  • Extra categories are grouped as “(other)”.
  • Significance tests: KS for numeric columns, χ² for categorical.
  • An epsilon prevents division by zero in PSI terms.
  • A row cap limits memory usage for large datasets.
  • Auto type detection can be controlled per column.

Example data table

age   income   device    region   clicked
34    62000    mobile    north    1
41    81000    desktop   south    0
28    54000    mobile    east     1
      47000    tablet    west     0

Your baseline and current files should share the same header row. Extra columns are ignored; shared columns are analyzed.

What this detects

  • Feature drift: changes in input distributions.
  • Missingness drift: missing values changing over time.
  • Severity badges: PSI thresholds for quick triage.
  • Tests: KS for numeric, χ² for categorical.

Formula used

Population Stability Index (PSI) compares baseline and current proportions across bins or categories.

PSI = Σ (A_i − E_i) × ln(A_i / E_i)

E_i = baseline (expected) share in bin/category i
A_i = current (actual) share in bin/category i
A small epsilon replaces zero shares so the ratio and logarithm stay defined.

For numeric features, bins are built from baseline quantiles. For categorical features, rare values are grouped into “(other)” to stabilize comparisons.
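The PSI formula above can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation; the `psi` helper name and the `eps` default are assumptions:

```python
import numpy as np

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between baseline (expected) and
    current (actual) bin counts. eps guards against zero shares so
    the ratio and logarithm stay defined."""
    e = np.asarray(expected_counts, dtype=float)
    a = np.asarray(actual_counts, dtype=float)
    e_share = np.clip(e / e.sum(), eps, None)
    a_share = np.clip(a / a.sum(), eps, None)
    return float(np.sum((a_share - e_share) * np.log(a_share / e_share)))
```

With identical bin counts the terms cancel and PSI is 0; reversing a skew, e.g. `psi([50, 30, 20], [20, 30, 50])`, yields a value well above the common 0.25 "high drift" threshold.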

How to use this calculator

  1. Export a baseline CSV from the training or validation window.
  2. Export a current CSV from a recent production window.
  3. Upload both files and tune bins or top categories if needed.
  4. Click Detect drift to compute PSI and test p-values.
  5. Download CSV or PDF reports for monitoring and audits.

Why drift monitoring matters in production

Modern models degrade when incoming data changes. A detector like this compares a stable baseline window against a recent window so teams can spot shifts early. PSI summarizes distribution movement, while hypothesis tests add a statistical signal when sample sizes are adequate. Monitoring both helps separate noise from real change.

What PSI values typically indicate

PSI near 0.00 means the baseline and current distributions are similar. Many teams treat PSI < 0.10 as low drift, 0.10–0.25 as moderate, and ≥ 0.25 as high drift. These thresholds are practical defaults for prioritization, not universal truths. Always interpret PSI with business impact.
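Those default thresholds are straightforward to encode. A hypothetical `drift_severity` helper using the same cutoffs might look like:

```python
def drift_severity(psi_value):
    """Map a PSI value to a triage badge. The 0.10 and 0.25
    thresholds are common defaults, not universal rules."""
    if psi_value < 0.10:
        return "low"
    if psi_value < 0.25:
        return "moderate"
    return "high"
```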

Numeric features and quantile binning

For numeric columns, the calculator builds bins using baseline quantiles. This keeps bins balanced and stable even with skewed variables like income or latency. The current data is mapped into the same bin edges, producing comparable proportions. If many repeated values exist, duplicate quantile edges may reduce the effective bin count.
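One way to implement baseline-quantile binning is sketched below. The function name and the choice to use only interior cut points (so out-of-range current values land in the outer bins) are assumptions, not the tool's code:

```python
import numpy as np

def shared_bin_counts(baseline, current, n_bins=10):
    """Build bin edges from baseline quantiles and count both windows
    in the same bins. Duplicate quantile edges are dropped, so heavy
    ties reduce the effective bin count, as noted above."""
    qs = np.linspace(0, 1, n_bins + 1)
    edges = np.unique(np.quantile(baseline, qs))
    inner = edges[1:-1]                       # interior cut points only
    n = len(inner) + 1                        # effective bin count
    b = np.bincount(np.digitize(baseline, inner), minlength=n)
    c = np.bincount(np.digitize(current, inner), minlength=n)
    return b, c
```

Because edges come from baseline quantiles, the baseline counts are roughly equal per bin even for skewed variables.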

Categorical features and rare-value handling

Categorical columns can have long tails. To prevent one-off values from dominating the comparison, the detector keeps the top categories and groups the rest into “(other)”. This stabilizes PSI and makes reports readable. If your domain has critical rare categories, increase the “Top categories kept” setting.
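The top-k grouping described above can be sketched with the standard library; `top_k_shares` and its defaults are illustrative only:

```python
from collections import Counter

def top_k_shares(values, top_k=10, other_label="(other)"):
    """Keep the top_k most frequent categories and fold the long
    tail into one "(other)" bucket, returning a share per category."""
    counts = Counter(values)
    keep = {cat for cat, _ in counts.most_common(top_k)}
    folded = Counter()
    for cat, n in counts.items():
        folded[cat if cat in keep else other_label] += n
    total = sum(folded.values())
    return {cat: n / total for cat, n in folded.items()}
```

Applying this to both windows with the same `top_k` yields aligned category shares that can feed directly into a PSI calculation.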

Missingness as a first-class signal

A feature can “drift” simply because pipelines change. Missing share is reported for baseline and current windows, and the Plotly chart highlights shifts. Large missingness jumps often indicate upstream extraction issues, schema changes, or new traffic sources. Resolve missingness drift before retraining, or the model may learn artifacts.
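Tracking missingness per window is simple in principle. A sketch, assuming blank CSV cells arrive as empty strings or `None`:

```python
def missing_share(values):
    """Share of missing entries; None and the empty string count as
    missing, matching how blank CSV cells typically arrive."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or v == "")
    return missing / len(values)

def missingness_jump(baseline_col, current_col):
    """Change in missing share from the baseline to the current window."""
    return missing_share(current_col) - missing_share(baseline_col)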

Operational guidance for alerts and reviews

Use small, consistent time windows, such as daily or weekly snapshots, and ensure sampling is comparable across segments. Start with PSI thresholds to route review work, then look at p-values for supporting evidence. Treat high-drift columns as candidates for feature engineering, calibration, or model refresh. Export CSV/PDF reports for audit trails.

FAQs

1) What is “baseline” versus “current” data?

Baseline is the reference window used for training or validation. Current is a recent production window you want to compare for distribution changes.

2) How many rows do I need for reliable drift signals?

More is better. As a rule of thumb, aim for at least a few hundred rows per window for stable proportions and more meaningful p-values.

3) Why does the tool group categories into “(other)”?

Grouping reduces noise from rare labels and makes PSI more stable. Increase the “Top categories kept” value if rare classes matter.

4) Should I retrain whenever PSI is high?

Not automatically. First check whether drift affects key performance metrics or represents a pipeline issue. Retrain when drift reflects real population change.

5) What do the p-values mean here?

They come from KS tests for numeric columns and chi-square tests for categorical columns. Low p-values suggest the distributions differ under the test assumptions.
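Using SciPy, the two tests might be wired up as in this sketch; `drift_p_values` is an illustrative helper, not the tool's implementation:

```python
import numpy as np
from scipy import stats

def drift_p_values(base_num, cur_num, base_cat, cur_cat):
    """Two-sample KS test for a numeric column and a chi-square test
    on a baseline-vs-current category contingency table."""
    ks_p = stats.ks_2samp(base_num, cur_num).pvalue
    cats = sorted(set(base_cat) | set(cur_cat))
    table = np.array([
        [base_cat.count(c) for c in cats],
        [cur_cat.count(c) for c in cats],
    ])
    chi2_p = stats.chi2_contingency(table)[1]  # second element is the p-value
    return ks_p, chi2_p
```

Identical windows produce p-values near 1; small samples can leave real drift statistically undetected, which is why PSI is reported alongside the tests.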

6) Can I use this for concept drift?

This tool targets data drift in inputs. Concept drift is a change in the relationship between inputs and outcomes, typically measured with labels and model performance.

Related Calculators

data quality score · whitespace cleaner · data sanitization tool · data profiling tool · unique value counter · anomaly detection score · missing value imputer · format standardizer

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.