Maximum Likelihood Missing Data Calculator

Calculator Input


Dataset

Separate values with commas, new lines, semicolons, or tabs.

Missing Tokens

Examples: NA, ?, null, missing

Value Label

Optional label for your numeric variable.

Missingness Assumption

Confidence Level (%)

Outlier z-Threshold

Convergence Tolerance

Maximum Iterations

Result Decimals

Initial Mean (Optional)

Initial Variance (Optional)

Example Data Table

This example shows how missing entries are recorded before estimation.

Index	Score	Status
1	12.5	Observed
2	14.1	Observed
3	NA	Missing
4	15.4	Observed
5	13.8	Observed
6	?	Missing
7	16.2	Observed
8	14.9	Observed
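
As a rough illustration of how a series like the one in the table could be read in, the sketch below splits raw text on the supported separators and flags missing tokens. The helper name `parse_series`, the default token list, and the case-insensitive matching are assumptions made for this example, not the calculator's actual code.

```python
import re

def parse_series(raw, missing_tokens=("NA", "?", "null", "missing")):
    """Split raw text on commas, semicolons, tabs, or new lines.

    Returns floats for observed entries and None for missing ones.
    """
    # Assumption: tokens are matched case-insensitively.
    missing = {tok.lower() for tok in missing_tokens}
    values = []
    for field in re.split(r"[,;\t\r\n]+", raw):
        field = field.strip()
        if not field:
            continue  # ignore empty fields left by trailing separators
        if field.lower() in missing:
            values.append(None)
        else:
            values.append(float(field))  # raises ValueError on unrecognized text
    return values

# The example table's Score column, written with mixed separators.
scores = parse_series("12.5, 14.1, NA, 15.4; 13.8\t?\n16.2, 14.9")
observed = [v for v in scores if v is not None]
n_mis = len(scores) - len(observed)
```

With this input, `observed` holds the six numeric scores and `n_mis` counts the two missing entries.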

Formula Used

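This calculator fits a univariate normal model to incomplete data. It maximizes the observed-data likelihood with EM updates: each iteration replaces the missing values' contributions with their expectations under the current estimates, then re-estimates the mean and variance.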

Observed-data log-likelihood

ℓ(μ, σ² | y_obs) = -(n_obs / 2) ln(2πσ²) - [1 / (2σ²)] Σ(y_i - μ)²

The sum runs over the n_obs observed values only.
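
The log-likelihood above can be evaluated directly. The following is a minimal sketch using the observed scores from the example table; the function name `observed_loglik` is hypothetical.

```python
import math

def observed_loglik(y_obs, mu, sigma2):
    """Normal log-likelihood computed from the observed values only."""
    n_obs = len(y_obs)
    ss = sum((y - mu) ** 2 for y in y_obs)  # sum of squared deviations
    return -0.5 * n_obs * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

y_obs = [12.5, 14.1, 15.4, 13.8, 16.2, 14.9]
mu_hat = sum(y_obs) / len(y_obs)
# Maximum likelihood variance divides by n_obs, not n_obs - 1.
sigma2_hat = sum((y - mu_hat) ** 2 for y in y_obs) / len(y_obs)
ll_max = observed_loglik(y_obs, mu_hat, sigma2_hat)
```

Any other pair (μ, σ²) gives a lower value than `ll_max`, which is a quick way to sanity-check fitted estimates.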

E-step

E[y_mis | θ(t)] = μ(t), E[y_mis² | θ(t)] = σ²(t) + μ(t)²

M-step

μ(t+1) = [Σy_obs + n_mis μ(t)] / n, where n = n_obs + n_mis

σ²(t+1) = {Σy_obs² + n_mis[σ²(t) + μ(t)²]} / n - μ(t+1)²
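
The E-step and M-step can be sketched as a short loop. This is an illustration under stated assumptions, not the calculator's source: the function name `em_normal`, the stopping rule (largest absolute parameter change below the tolerance), and the fallback starting values are all choices made for this example.

```python
def em_normal(y_obs, n_mis, mu0=None, var0=None, tol=1e-8, max_iter=500):
    """EM for a univariate normal with n_mis missing entries."""
    n_obs = len(y_obs)
    if n_obs == 0:
        raise ValueError("at least one observed value is required")
    n = n_obs + n_mis
    s1 = sum(y_obs)                 # sum of observed values
    s2 = sum(y * y for y in y_obs)  # sum of observed squares
    # Assumption: start from the observed moments unless values are supplied.
    mu = s1 / n_obs if mu0 is None else mu0
    var = (s2 / n_obs - (s1 / n_obs) ** 2) if var0 is None else var0
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # E-step: expected value and expected square of each missing entry.
        e_y = mu
        e_y2 = var + mu ** 2
        # M-step: complete-data estimates with the expectations plugged in.
        mu_new = (s1 + n_mis * e_y) / n
        var_new = (s2 + n_mis * e_y2) / n - mu_new ** 2
        delta = max(abs(mu_new - mu), abs(var_new - var))
        mu, var = mu_new, var_new
        if delta < tol:
            break
    return mu, var, iterations

y_obs = [12.5, 14.1, 15.4, 13.8, 16.2, 14.9]
mu_hat, var_hat, n_iter = em_normal(y_obs, n_mis=2, mu0=0.0, var0=1.0)
```

Setting μ(t+1) = μ(t) and σ²(t+1) = σ²(t) in the update formulas shows that the fixed point is the observed-data mean and the observed-data maximum likelihood variance, so for this one-variable model the loop only confirms values that are available in closed form. Deliberately poor starting values, as in the usage line, make the iteration path visible.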

Approximate confidence interval for the mean

CI = μ ± z × (σ / √n_obs)

Here z is the standard normal quantile for the chosen confidence level (about 1.96 for 95%).
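
A minimal sketch of the interval, assuming the two-sided z-value is taken from the standard normal distribution via Python's `statistics.NormalDist`. The function name `mean_ci` and the sample inputs are illustrative only.

```python
import math
from statistics import NormalDist

def mean_ci(mu, sigma2, n_obs, level=95.0):
    """Approximate normal-based interval for the mean; level is a percentage."""
    # Two-sided quantile: 95% maps to the 0.975 point of the standard normal.
    z = NormalDist().inv_cdf(0.5 + level / 200.0)
    half_width = z * math.sqrt(sigma2 / n_obs)
    return mu - half_width, mu + half_width

# Illustrative inputs: mean 14.0, variance 4.0, 16 observed values.
lower, upper = mean_ci(14.0, 4.0, 16, level=95.0)
```

Because the standard error uses n_obs rather than n, adding missing entries does not narrow the interval.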

How to Use This Calculator

1. Paste a numeric series into the dataset box.
2. Mark absent entries with tokens such as NA, ?, or null.
3. Choose the working missingness assumption: MCAR (missing completely at random), MAR (missing at random), or MNAR (missing not at random).
4. Set tolerance, maximum iterations, confidence level, decimals, and optional starting values.
5. Submit the form to estimate mean, variance, log-likelihood, and expected missing values.
6. Review the iteration table, imputed dataset preview, and the Plotly visualization.
7. Export the results as CSV or PDF for documentation.

Frequently Asked Questions

1. What does this calculator estimate?

It estimates the mean and variance of one numeric variable with missing entries, using an EM-based maximum likelihood routine under a normal model.

2. Does it replace missing values permanently?

No. It reports expected values implied by the fitted model. Those values are useful for review, but separate modeling decisions may still be needed.
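
One way to picture the preview: each missing entry is shown with the fitted mean while the original series is left untouched. The sketch below is hypothetical (the name `imputed_preview` and the status labels are assumptions) and returns a new list rather than modifying its input.

```python
def imputed_preview(values, mu):
    """Return (index, value, status) rows; None entries are shown as mu."""
    rows = []
    for index, value in enumerate(values, start=1):
        if value is None:
            rows.append((index, mu, "Imputed"))
        else:
            rows.append((index, value, "Observed"))
    return rows

original = [12.5, None, 14.1]
preview = imputed_preview(original, mu=13.3)
```

Every imputed entry equals the same fitted mean, so the preview understates the spread of the data and should not be treated as a completed dataset.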

3. When is MAR a reasonable choice?

MAR is reasonable when, after conditioning on the observed data, the probability that a value is missing does not depend on the unseen value itself. Missingness may still depend on observed information.

4. Why does MNAR trigger a warning?

MNAR usually requires a direct model for the missingness process. A simple ignorable-likelihood EM routine is not enough for full MNAR inference.

5. What distribution does this page assume?

This page assumes a univariate normal distribution for the analysis variable. Strongly skewed or categorical data should be handled with other models.

6. Why show AIC and BIC?

AIC and BIC summarize model fit with a complexity penalty. They are most useful when you compare competing likelihood-based models on the same data.
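
Both criteria follow from the maximized log-likelihood by their standard definitions. In the sketch below, k = 2 counts the mean and the variance, and using n_obs as the sample size in BIC is an assumption of this example; the page does not state which count it uses.

```python
import math

def information_criteria(loglik, n_obs, k=2):
    """Return (AIC, BIC) for a model with k free parameters."""
    aic = 2 * k - 2 * loglik
    bic = k * math.log(n_obs) - 2 * loglik  # assumption: sample size is n_obs
    return aic, bic

# Illustrative log-likelihood of -10.0 with six observed values.
aic, bic = information_criteria(-10.0, n_obs=6)
```

Lower values indicate a better trade-off between fit and complexity, and the numbers are only comparable across models fitted to the same observed values.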

7. Is the confidence interval exact?

No. The interval shown here is an approximate normal-based interval around the estimated mean, using the fitted standard deviation and observed sample size.

8. Can I analyze several variables together?

This implementation focuses on one variable at a time. Multivariate FIML, SEM, or specialized missing-data software is better for joint models.