Survey Data Imputation Calculator

Calculator inputs

Estimate missing survey values

Enter observed summary statistics, weighting assumptions, donor information, and a preferred imputation method. The form uses a three-column layout on large screens, two columns on medium screens, and one column on mobile devices.

Total records: Full sampled cases before imputation.

Observed records: Cases with non-missing target responses.

Observed total: Optional override. Blank uses observed mean × observed records.

Observed mean: Average among observed cases.

Observed median: Middle observed response.

Observed standard deviation: Dispersion among observed responses.

Observed minimum: Lower valid response boundary.

Observed maximum: Upper valid response boundary.

Hot-deck donor mean: Average response in a matched donor pool.

Regression intercept: Linear model constant term.

Regression slope: Model coefficient for auxiliary predictor mean.

Auxiliary mean for missing cases: Predictor average among missing records.

Sensitivity z-score: Scenario value = mean + z × standard deviation.

Average survey weight: Applies a simple weighted total adjustment.

Design effect: Used to derive effective sample size.

Selected method: Displayed first in the result summary.

Example data table

Illustrative incomplete survey sample

This example shows how a satisfaction score dataset may look before imputation. Missing responses are represented by blank values.

Record ID	Region	Age Band	Weight	Auxiliary Index	Satisfaction Score	Status
R-101	North	25-34	1.18	4.00	4.6	Observed
R-102	North	35-44	1.12	3.90	3.8	Observed
R-103	West	18-24	1.25	4.20		Missing
R-104	South	45-54	1.06	3.70	4.1	Observed
R-105	East	35-44	1.31	4.10		Missing
R-106	West	55-64	1.22	3.80	4.4	Observed
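
As a quick check, the calculator inputs implied by this six-row sample can be derived directly from the table. The sketch below hard-codes the rows above; the variable names are illustrative and are not part of the calculator itself.

```python
from statistics import mean, median, stdev

# (record_id, weight, auxiliary_index, score); None marks a missing response
rows = [
    ("R-101", 1.18, 4.00, 4.6),
    ("R-102", 1.12, 3.90, 3.8),
    ("R-103", 1.25, 4.20, None),
    ("R-104", 1.06, 3.70, 4.1),
    ("R-105", 1.31, 4.10, None),
    ("R-106", 1.22, 3.80, 4.4),
]

observed = [r[3] for r in rows if r[3] is not None]
missing_aux = [r[2] for r in rows if r[3] is None]

total_records = len(rows)                   # 6
observed_records = len(observed)            # 4
observed_mean = mean(observed)              # about 4.225
observed_median = median(observed)          # about 4.25
observed_sd = stdev(observed)               # about 0.35 (sample standard deviation)
aux_mean_missing = mean(missing_aux)        # about 4.15
average_weight = mean(r[1] for r in rows)   # about 1.19
```

These values can be typed into the corresponding form fields to reproduce the example.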

Formula used

Core equations behind the calculator

Response and missingness rates
Here N is the total record count and n is the observed record count.
Missing count: m = N − n
Response rate = n / N
Missing rate = m / N
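
A minimal sketch of these rates in Python, assuming the two counts are the only inputs. The function name and the input checks are illustrative additions, not the calculator's own code.

```python
def missingness_rates(total_records: int, observed_records: int) -> dict:
    """Return the missing count, response rate, and missing rate."""
    if total_records <= 0 or not 0 <= observed_records <= total_records:
        raise ValueError("need 0 <= observed_records <= total_records and total_records > 0")
    missing = total_records - observed_records
    return {
        "missing_count": missing,
        "response_rate": observed_records / total_records,
        "missing_rate": missing / total_records,
    }

# Six sampled records, four observed (as in the example table)
rates = missingness_rates(6, 4)
```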

Imputed value by method
Mean method: c = ȳ_obs
Median method: c = median_obs
Hot-deck average: c = donor mean
Regression method: c = α + β × x̄_missing
Sensitivity scenario: c = clamp(ȳ_obs + z × s_obs, min, max)
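
The five rules can be sketched as one small function. The observed statistics below follow the example table; the donor mean, regression coefficients, and z-score are hypothetical values chosen only for illustration, and the dictionary keys are assumed names.

```python
def fill_value(method: str, p: dict) -> float:
    """Return the constant fill value c for one imputation method."""
    if method == "mean":
        return p["obs_mean"]
    if method == "median":
        return p["obs_median"]
    if method == "hot_deck":
        return p["donor_mean"]
    if method == "regression":
        return p["intercept"] + p["slope"] * p["aux_mean_missing"]
    if method == "sensitivity":
        raw = p["obs_mean"] + p["z"] * p["obs_sd"]
        # clamp the scenario value to the valid response range
        return max(p["obs_min"], min(p["obs_max"], raw))
    raise ValueError(f"unknown method: {method}")

params = {
    "obs_mean": 4.225, "obs_median": 4.25, "obs_sd": 0.35,
    "obs_min": 3.8, "obs_max": 4.6,
    "donor_mean": 4.3,                 # hypothetical donor pool average
    "intercept": 0.5, "slope": 0.9,    # hypothetical regression coefficients
    "aux_mean_missing": 4.15,
    "z": 2.0,                          # hypothetical sensitivity z-score
}
methods = ("mean", "median", "hot_deck", "regression", "sensitivity")
fills = {m: fill_value(m, params) for m in methods}
```

With z = 2.0 the unclamped scenario value exceeds the observed maximum, so the sensitivity fill is held at the upper boundary.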

Completed mean and total
Observed total: T_obs = n × ȳ_obs, unless a manual total is supplied
Completed total: T_complete = T_obs + m × c
Completed mean: ȳ_complete = T_complete / N
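
A short sketch of the completed total and mean, including the optional manual override for the observed total. The function and parameter names are assumptions made for this example.

```python
def completed_total_and_mean(N, n, obs_mean, c, obs_total=None):
    """Return (T_complete, completed mean) after filling N - n cases with c."""
    t_obs = n * obs_mean if obs_total is None else obs_total
    m = N - n
    t_complete = t_obs + m * c
    return t_complete, t_complete / N

# Mean imputation leaves the mean unchanged; median imputation shifts it slightly
total_mean_fill, mean_mean_fill = completed_total_and_mean(6, 4, 4.225, c=4.225)
total_median_fill, mean_median_fill = completed_total_and_mean(6, 4, 4.225, c=4.25)
```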

Completed variance after constant imputation
s²_complete = [ (n − 1) s²_obs + n (ȳ_obs − ȳ_complete)² + m (c − ȳ_complete)² ] / (N − 1)
s_complete = √(s²_complete)
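
The variance formula can be checked against a direct calculation on a filled dataset. The sketch below implements the formula from summary statistics and compares it with `statistics.stdev` on the example scores plus two constant fills; the function name is illustrative.

```python
import math
from statistics import mean, stdev

def completed_sd(N, n, obs_mean, obs_sd, c):
    """Standard deviation of the completed data after filling N - n cases with c."""
    m = N - n
    completed_mean = (n * obs_mean + m * c) / N
    sum_sq = (
        (n - 1) * obs_sd ** 2                   # spread within observed cases
        + n * (obs_mean - completed_mean) ** 2  # shift of the observed mean
        + m * (c - completed_mean) ** 2         # spread contributed by the fills
    )
    return math.sqrt(sum_sq / (N - 1))

observed = [4.6, 3.8, 4.1, 4.4]
c = 4.5  # any constant fill value
from_formula = completed_sd(6, 4, mean(observed), stdev(observed), c)
from_data = stdev(observed + [c, c])
```

Both routes should agree up to floating-point rounding, which is a convenient sanity check when only summary statistics are available.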

Weighted total and interval estimate
Weighted total = T_complete × average weight
Effective sample size: n_eff = N / design effect
SE = s_complete / √n_eff
95% CI = ȳ_complete ± 1.96 × SE
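
A sketch of the weighted total and interval step. The completed statistics below correspond to mean imputation on the example table; the design effect of 1.2 is a hypothetical input, and the function name is an assumption.

```python
import math

def interval_estimate(N, completed_mean, completed_sd, completed_total,
                      average_weight, design_effect):
    """Weighted total, effective sample size, standard error, and 95% CI."""
    n_eff = N / design_effect
    se = completed_sd / math.sqrt(n_eff)
    half_width = 1.96 * se
    return {
        "weighted_total": completed_total * average_weight,
        "n_eff": n_eff,
        "se": se,
        "ci_95": (completed_mean - half_width, completed_mean + half_width),
    }

est = interval_estimate(
    N=6,
    completed_mean=4.225,
    completed_sd=math.sqrt(0.0735),  # completed SD under mean imputation
    completed_total=25.35,
    average_weight=1.19,
    design_effect=1.2,               # hypothetical design effect
)
```

Because the standard error uses a single fill value, the interval does not reflect imputation uncertainty and is likely to be narrower than one from multiple imputation.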

The calculator treats each method as a constant-value fill for summary planning. In production studies, multiple imputation or donor-level hot-deck methods usually preserve more uncertainty than a single fill value.

How to use

Practical workflow

1. Enter the total number of sampled records and the number with observed responses.
2. Supply observed summary statistics for the target survey variable.
3. Provide donor information, regression coefficients, and a scenario z-score when available.
4. Set the average survey weight and design effect to reflect your sample design.
5. Select the method you want highlighted in the result area.
6. Press Calculate Imputation to place the completed result above the form.
7. Review the comparison table to document how different methods change totals, means, and variance.
8. Export the summary using the CSV or PDF buttons for reporting or audit notes.

Frequently asked questions

Survey data imputation FAQs

1. What does this calculator estimate?

It estimates completed survey means, totals, weighted totals, standard deviation, confidence intervals, and variance retention after replacing missing values with a chosen imputation rule.

2. When should mean imputation be used?

Mean imputation works best for quick benchmarking when missingness is light and the survey variable is roughly symmetric. It is simple, but it often understates natural variability.

3. Why is median imputation helpful?

Median imputation is useful when responses are skewed or contain extreme values. It reduces sensitivity to outliers, though it may still flatten the completed distribution.

4. How is hot-deck represented here?

This page uses the donor pool average as a planning shortcut. A full hot-deck workflow usually assigns matched donor records one by one rather than using one average.
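
For contrast, a record-by-record random hot-deck might look like the sketch below. It is not what this calculator does: it draws each fill from one undivided donor pool, whereas a production hot-deck would normally match donors within cells such as region or age band. The function name and seed are illustrative.

```python
import random

def random_hot_deck(values, donors, seed=0):
    """Replace each None with a value drawn at random from the donor pool."""
    rng = random.Random(seed)  # fixed seed so the fills are reproducible
    return [v if v is not None else rng.choice(donors) for v in values]

scores = [4.6, 3.8, None, 4.1, None, 4.4]
donor_pool = [v for v in scores if v is not None]
completed = random_hot_deck(scores, donor_pool)
```

Because each missing case receives an actual donor value rather than one shared average, the filled values can differ from one another and keep more of the observed spread.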

5. What does the regression option require?

It requires an intercept, slope, and the auxiliary predictor mean among missing records. The calculator then applies a linear prediction to estimate the missing-value fill.

6. Why does variance retention matter?

Variance retention shows how much dispersion remains after filling missing cases. Lower percentages warn that the imputation choice may make the completed dataset look too stable.
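
The page does not spell out the retention formula. One plausible reading, assumed here, is the completed variance as a percentage of the observed variance; the sketch below uses that definition with the completed-variance equation from the formula section.

```python
def variance_retention_pct(N, n, obs_mean, obs_sd, c):
    """Completed variance as a percentage of observed variance (assumed definition)."""
    if obs_sd <= 0:
        raise ValueError("observed standard deviation must be positive")
    m = N - n
    completed_mean = (n * obs_mean + m * c) / N
    sum_sq = (
        (n - 1) * obs_sd ** 2
        + n * (obs_mean - completed_mean) ** 2
        + m * (c - completed_mean) ** 2
    )
    completed_var = sum_sq / (N - 1)
    return 100.0 * completed_var / obs_sd ** 2

# Mean imputation on the example table: 4 observed of 6 records
retention = variance_retention_pct(6, 4, 4.225, 0.35, c=4.225)
```

Under this definition, mean imputation retains (n − 1) / (N − 1) of the observed variance, which is 60% for the example table.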

7. How are weights and design effect used?

The average weight scales completed totals. The design effect divides the total record count to give the effective sample size, so a design effect above 1 shrinks the effective sample size and widens the standard error and confidence interval.

8. Is this a replacement for multiple imputation?

No. This calculator is best for planning, documentation, and scenario review. Formal inference for published research usually needs richer methods that propagate imputation uncertainty.