Reveal missingness links before they bias your analysis. Estimate overlap, completeness, and indicator correlation instantly. Make preprocessing choices with clearer evidence across every variable.
Enter two aligned numeric series. Use commas or new lines. Leave blanks, or use listed missing tokens, wherever values are unavailable. Avoid thousands separators inside numbers.
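The input rules above can be sketched as a small parser. This is an illustration only, not the calculator's actual code: the function name `parse_series` and the contents of `MISSING_TOKENS` are assumptions, since the form's exact list of missing tokens is not reproduced here.

```python
import re

# Hypothetical token list; the form's own "listed missing tokens" may differ.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none", "-", "—"}

def parse_series(text):
    """Split on commas or newlines; return floats, with None for missing entries."""
    values = []
    for raw in re.split(r"[,\n]", text.strip("\n")):
        token = raw.strip()
        if token.lower() in MISSING_TOKENS:
            values.append(None)          # blank or missing token
        else:
            values.append(float(token))  # raises ValueError on non-numeric text
    return values

feature_a = parse_series("12, 15, NA, 18, 20, 22, —, 25, 27, 29, 31, 34")
print(feature_a)
```

Because the comma is a separator, a value written as `1,200` would be read as two entries (`1` and `200`) and shift every later observation out of alignment, which is why thousands separators should be avoided.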
This sample table mirrors the default demo values already loaded in the form.
| Observation | Feature_A | Feature_B | Status |
|---|---|---|---|
| Obs 1 | 12 | 5 | Complete |
| Obs 2 | 15 | 7 | Complete |
| Obs 3 | NA | 8 | Feature_A missing |
| Obs 4 | 18 | NA | Feature_B missing |
| Obs 5 | 20 | 11 | Complete |
| Obs 6 | 22 | 12 | Complete |
| Obs 7 | — | 13 | Feature_A missing |
| Obs 8 | 25 | — | Feature_B missing |
| Obs 9 | 27 | 16 | Complete |
| Obs 10 | 29 | 18 | Complete |
| Obs 11 | 31 | 19 | Complete |
| Obs 12 | 34 | 21 | Complete |
For each observation, define an indicator: M = 1 if a value is missing, otherwise M = 0.
The calculator measures missingness correlation using the phi coefficient from the 2×2 table of observed and missing states.
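phi = (a*d - b*c) / sqrt((a+b)(c+d)(a+c)(b+d)), where a and d count the rows in which both values are observed and both are missing, and b and c count the rows in which exactly one value is missing.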
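As a rough sketch of the indicator and phi steps, the snippet below applies the formula to the demo values from the sample table. The cell labeling is an assumption (a = both observed, b = only Feature_B missing, c = only Feature_A missing, d = both missing), and `missingness_phi` is a hypothetical helper name rather than anything the calculator exposes.

```python
from math import sqrt

# Demo values from the sample table; None marks a missing entry.
feature_a = [12, 15, None, 18, 20, 22, None, 25, 27, 29, 31, 34]
feature_b = [5, 7, 8, None, 11, 12, 13, None, 16, 18, 19, 21]

def missingness_phi(x, y):
    """Phi coefficient between the missingness indicators of two aligned series."""
    mx = [1 if v is None else 0 for v in x]  # M = 1 if missing, else 0
    my = [1 if v is None else 0 for v in y]
    a = sum(1 for i, j in zip(mx, my) if i == 0 and j == 0)  # both observed
    b = sum(1 for i, j in zip(mx, my) if i == 0 and j == 1)  # only y missing
    c = sum(1 for i, j in zip(mx, my) if i == 1 and j == 0)  # only x missing
    d = sum(1 for i, j in zip(mx, my) if i == 1 and j == 1)  # both missing
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return None  # one indicator is constant, so phi is undefined
    return (a * d - b * c) / denom

print(missingness_phi(feature_a, feature_b))  # -0.2
```

For the demo data the counts are a = 8, b = 2, c = 2, d = 0, so phi = (0 - 4) / sqrt(10 * 2 * 10 * 2) = -0.2: the two features are never missing on the same row.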
Only rows where both variables are present are used for the value correlation.
Pearson r = sum[(xi - x̄)(yi - ȳ)] / sqrt(sum[(xi - x̄)^2] * sum[(yi - ȳ)^2])
Spearman rho = Pearson correlation of ranked complete-case values
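The two value correlations can be sketched in plain Python as follows. The helper names (`complete_pairs`, `pearson`, `average_ranks`, `spearman`) are illustrative, and giving tied values their average rank is an assumption about tie handling, not something this page specifies.

```python
from math import sqrt

feature_a = [12, 15, None, 18, 20, 22, None, 25, 27, 29, 31, 34]
feature_b = [5, 7, 8, None, 11, 12, 13, None, 16, 18, 19, 21]

def complete_pairs(x, y):
    """Keep only rows where both values are present."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]

def pearson(xs, ys):
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    denom = sqrt(sxx * syy)
    return sxy / denom if denom else None

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))

pairs = complete_pairs(feature_a, feature_b)
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]
print(len(pairs), pearson(xs, ys), spearman(xs, ys))
```

On the demo data, eight complete pairs remain. Both series rise together on every retained row, so Spearman rho is 1 and Pearson r is just below 1.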
Usable pair retention rate = complete pairs / total observations
Missingness overlap Jaccard = both missing / (x missing + y missing - both missing)
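A minimal sketch of the retention and overlap metrics, again using the demo values. The function name `retention_and_jaccard` is hypothetical, and returning `None` when nothing is missing in either series is an assumed convention for the 0/0 case.

```python
feature_a = [12, 15, None, 18, 20, 22, None, 25, 27, 29, 31, 34]
feature_b = [5, 7, 8, None, 11, 12, 13, None, 16, 18, 19, 21]

def retention_and_jaccard(x, y):
    """Return (usable pair retention rate, missingness overlap Jaccard)."""
    total = len(x)
    x_missing = sum(v is None for v in x)
    y_missing = sum(v is None for v in y)
    both_missing = sum(a is None and b is None for a, b in zip(x, y))
    complete = sum(a is not None and b is not None for a, b in zip(x, y))
    retention = complete / total if total else None
    union = x_missing + y_missing - both_missing  # rows missing in at least one series
    jaccard = both_missing / union if union else None
    return retention, jaccard

retention, jaccard = retention_and_jaccard(feature_a, feature_b)
print(retention, jaccard)
```

For the demo data, 8 of 12 rows are complete (retention of about 0.667), and no row is missing in both features, so the Jaccard overlap is 0.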
Missingness correlation measures whether the absence of one variable tends to occur with the absence of another. This helps reveal structured missingness that may distort complete-case analysis, imputation, or downstream modeling decisions.
Phi is appropriate for two binary indicators, such as missing versus observed. It quantifies whether missingness patterns move together, move apart, or show little direct relationship across aligned observations.
Choose Spearman when your complete cases are ordinal, nonlinear but monotonic, or strongly affected by outliers. Choose Pearson when you want linear association on roughly continuous, well-behaved complete-case values.
The complete-case correlation becomes unavailable when you have too few usable pairs or when one complete-case variable has zero variation. Both situations make the denominator collapse to zero.
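The two failure cases can be checked before any correlation is attempted. The sketch below is illustrative: `correlation_status` is a hypothetical helper, and the two-pair minimum is an assumption, since the calculator's own threshold is not stated here.

```python
def correlation_status(xs, ys, min_pairs=2):
    """Report whether a complete-case correlation can be computed.

    xs and ys are the complete-case values only (no missing entries).
    min_pairs is an assumed threshold, not the calculator's documented one.
    """
    if len(xs) < min_pairs:
        return "unavailable: too few usable pairs"
    # A series with a single distinct value has zero variation, so its sum of
    # squared deviations is 0 and the correlation denominator is 0 as well.
    if len(set(xs)) < 2 or len(set(ys)) < 2:
        return "unavailable: zero variation in one variable"
    return "available"

print(correlation_status([3], [4]))
print(correlation_status([5, 5, 5], [1, 2, 3]))
print(correlation_status([1, 2, 3], [2, 4, 7]))
```

Testing for a single distinct value, rather than comparing a computed variance to zero, sidesteps floating-point rounding in the mean.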
A negative phi means the two missingness indicators move in opposite directions. In practice, one variable is more likely to be observed when the other is missing, rather than disappearing together.
No. A strong complete-case correlation can still be misleading if retention is low or missingness is highly structured. Always read the missingness metrics and sample retention alongside the value correlation.
Yes. It helps identify whether two variables share missingness structure before you choose imputation methods, feature filtering rules, or pairwise versus complete-case strategies in preprocessing workflows.
No. It is a focused diagnostic for two aligned variables. Broader workflows may still need missingness maps, mechanism testing, multiple imputation checks, and model-specific sensitivity analysis.