Estimate missing survey values
Enter observed summary statistics, weighting assumptions, donor information, and a preferred imputation method. The form uses a three-column layout on large screens, two columns on medium screens, and one column on mobile devices.
Illustrative incomplete survey sample
This example shows how a satisfaction score dataset may look before imputation. Missing responses are represented by blank values.
| Record ID | Region | Age Band | Weight | Auxiliary Index | Satisfaction Score | Status |
|---|---|---|---|---|---|---|
| R-101 | North | 25-34 | 1.18 | 4.00 | 4.6 | Observed |
| R-102 | North | 35-44 | 1.12 | 3.90 | 3.8 | Observed |
| R-103 | West | 18-24 | 1.25 | 4.20 | | Missing |
| R-104 | South | 45-54 | 1.06 | 3.70 | 4.1 | Observed |
| R-105 | East | 35-44 | 1.31 | 4.10 | | Missing |
| R-106 | West | 55-64 | 1.22 | 3.80 | 4.4 | Observed |
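The summary inputs the calculator asks for can be derived from a record-level file like the one above. The following Python sketch is an illustration only: it hard-codes the six example rows, represents a missing score as `None`, and assumes the observed standard deviation is the sample standard deviation (n − 1 denominator). The variable names are hypothetical and not part of the calculator.

```python
from statistics import mean, median, stdev

# Satisfaction scores from the illustrative table; None marks a missing response.
scores = {
    "R-101": 4.6,
    "R-102": 3.8,
    "R-103": None,
    "R-104": 4.1,
    "R-105": None,
    "R-106": 4.4,
}

observed = [value for value in scores.values() if value is not None]

N = len(scores)    # total sampled records
n = len(observed)  # records with an observed response
m = N - n          # records that need a fill value

summary = {
    "N": N,
    "n": n,
    "m": m,
    "response_rate": n / N,
    "missing_rate": m / N,
    "observed_mean": mean(observed),
    "observed_median": median(observed),
    "observed_sd": stdev(observed),  # sample SD, assumed to match s_obs
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the sketch gives N = 6, n = 4, m = 2, an observed mean of about 4.225, an observed median of about 4.25, and an observed standard deviation of about 0.35.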
Core equations behind the calculator
Missing count m = N − n
Response rate = n / N
Missing rate = m / N
Mean method: c = ȳ_obs
Median method: c = median_obs
Hot-deck average: c = donor mean
Regression method: c = α + β × x̄_missing
Sensitivity scenario: c = clamp(ȳ_obs + z × s_obs, min, max)
Observed total T_obs = n × ȳ_obs, unless a manual total is supplied
Completed total T_complete = T_obs + m × c
Completed mean ȳ_complete = T_complete / N
s²_complete = [ (n − 1)s²_obs + n(ȳ_obs − ȳ_complete)² + m(c − ȳ_complete)² ] / (N − 1)
s_complete = √s²_complete
Weighted total = T_complete × average weight
Effective sample size n_eff = N / design effect
SE = s_complete / √n_eff
95% CI = ȳ_complete ± 1.96 × SE
The calculator treats each method as a constant-value fill for summary planning: every missing record receives the same value c. In production studies, multiple imputation or donor-level hot-deck methods usually preserve more of the uncertainty than a single fill value does.
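The fill rules and the completed total, mean, and variance equations above can be sketched in Python as follows. This is a minimal illustration and not the calculator's own code: the function names, the dictionary keys, the regression coefficients (0.5 and 0.9), the donor mean, and the 1 to 5 score range are all assumptions chosen for the example.

```python
import math


def fill_value(method, p):
    """Return the constant fill c for one method from a dict of inputs p."""
    if method == "mean":
        return p["obs_mean"]
    if method == "median":
        return p["obs_median"]
    if method == "hot_deck":
        return p["donor_mean"]
    if method == "regression":
        return p["alpha"] + p["beta"] * p["x_missing_mean"]
    if method == "sensitivity":
        raw = p["obs_mean"] + p["z"] * p["obs_sd"]
        return max(p["min"], min(p["max"], raw))  # clamp to the allowed range
    raise ValueError(f"unknown method: {method}")


def completed_stats(N, n, obs_mean, obs_sd, c, manual_total=None):
    """Apply the completed total, mean, and variance equations for a fill c."""
    m = N - n
    t_obs = n * obs_mean if manual_total is None else manual_total
    t_complete = t_obs + m * c
    mean_complete = t_complete / N
    var_complete = (
        (n - 1) * obs_sd ** 2
        + n * (obs_mean - mean_complete) ** 2
        + m * (c - mean_complete) ** 2
    ) / (N - 1)
    return {
        "fill": c,
        "total": t_complete,
        "mean": mean_complete,
        "variance": var_complete,
        "sd": math.sqrt(var_complete),
    }


# Hypothetical inputs; only the observed statistics and the auxiliary mean
# (4.15, from the two missing rows) come from the illustrative table.
inputs = {
    "obs_mean": 4.225, "obs_median": 4.25, "obs_sd": 0.35,
    "donor_mean": 4.30,
    "alpha": 0.5, "beta": 0.9, "x_missing_mean": 4.15,
    "z": 1.0, "min": 1.0, "max": 5.0,
}

c_reg = fill_value("regression", inputs)
result = completed_stats(N=6, n=4, obs_mean=4.225, obs_sd=0.35, c=c_reg)
print(result)
```

Keeping the fill rule separate from the completed-statistics step mirrors the equations: once c is known, the downstream formulas are the same for every method.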
Practical workflow
- Enter the total number of sampled records and the number with observed responses.
- Supply observed summary statistics for the target survey variable.
- Provide donor information, regression coefficients, and a scenario z-score when available.
- Set the average survey weight and design effect to reflect your sample design.
- Select the method you want highlighted in the result area.
- Press Calculate Imputation to place the completed result above the form.
- Review the comparison table to document how different methods change totals, means, and variance.
- Export the summary using the CSV or PDF buttons for reporting or audit notes.
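The comparison and export steps in the workflow above can be approximated offline. The Python sketch below builds a small method comparison and writes it as CSV text; it is not the page's export routine. The fill values for the hot-deck and regression rows and the column names are hypothetical.

```python
import csv
import io

N, n = 6, 4                     # sampled and observed record counts
obs_mean, obs_sd = 4.225, 0.35  # observed summary statistics
m = N - n

# Hypothetical fill values, one per method under comparison.
fills = {"mean": 4.225, "median": 4.25, "hot_deck": 4.30, "regression": 4.235}

rows = []
for method, c in fills.items():
    total = n * obs_mean + m * c
    mean_c = total / N
    var_c = (
        (n - 1) * obs_sd ** 2
        + n * (obs_mean - mean_c) ** 2
        + m * (c - mean_c) ** 2
    ) / (N - 1)
    rows.append([method, c, round(total, 4), round(mean_c, 4), round(var_c, 4)])

# Write the comparison to an in-memory CSV; use open(...) instead to save a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(
    ["method", "fill", "completed_total", "completed_mean", "completed_variance"]
)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```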
Survey data imputation FAQs
1. What does this calculator estimate?
It estimates completed survey means, totals, weighted totals, standard deviation, confidence intervals, and variance retention after replacing missing values with a chosen imputation rule.
2. When should mean imputation be used?
Mean imputation works best for quick benchmarking when missingness is light and the survey variable is roughly symmetric. It is simple, but it often understates natural variability.
3. Why is median imputation helpful?
Median imputation is useful when responses are skewed or contain extreme values. It reduces sensitivity to outliers, though it may still flatten the completed distribution.
4. How is hot-deck represented here?
This page uses the donor pool average as a planning shortcut. A full hot-deck workflow usually assigns matched donor records one by one rather than using one average.
5. What does the regression option require?
It requires an intercept, slope, and the auxiliary predictor mean among missing records. The calculator then applies a linear prediction to estimate the missing-value fill.
6. Why does variance retention matter?
Variance retention shows how much dispersion remains after filling missing cases. Lower percentages warn that the imputation choice may make the completed dataset look too stable.
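The page does not state the formula behind variance retention. One plausible reading, assumed here, is the completed variance expressed as a percentage of the observed variance. The function name below is hypothetical, and the sketch should be read as an interpretation, not as the calculator's definition.

```python
def variance_retention_pct(N, n, obs_mean, obs_sd, c):
    """Completed variance as a percentage of observed variance (assumed definition)."""
    m = N - n
    mean_complete = (n * obs_mean + m * c) / N
    var_complete = (
        (n - 1) * obs_sd ** 2
        + n * (obs_mean - mean_complete) ** 2
        + m * (c - mean_complete) ** 2
    ) / (N - 1)
    return 100 * var_complete / obs_sd ** 2


# Mean fill for a sample with 6 records, 4 of them observed.
retention = variance_retention_pct(N=6, n=4, obs_mean=4.225, obs_sd=0.35, c=4.225)
print(f"{retention:.1f}%")
```

Under that assumption, mean imputation retains (n − 1) / (N − 1) of the observed variance, which is 60% when N = 6 and n = 4. The share falls as the missing count grows.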
7. How are weights and design effect used?
The average weight scales completed totals, while a design effect greater than 1 shrinks the effective sample size, which widens the standard error and the confidence interval.
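As a rough illustration of that answer, the Python sketch below applies the weighted total, effective sample size, standard error, and 95% interval equations. The function name and the design effect of 1.5 are hypothetical; the completed statistics are those of a mean fill on the six-record sample, and 1.19 is the average of the weights in the illustrative table.

```python
import math


def weighted_summary(N, total_c, mean_c, sd_c, avg_weight, deff):
    """Scale the total by the average weight and build a 95% CI with n_eff = N / deff."""
    n_eff = N / deff
    se = sd_c / math.sqrt(n_eff)
    return {
        "weighted_total": total_c * avg_weight,
        "n_eff": n_eff,
        "se": se,
        "ci_95": (mean_c - 1.96 * se, mean_c + 1.96 * se),
    }


sd_complete = math.sqrt(0.0735)  # completed SD under mean fill

# Same data, first with no design penalty, then with a hypothetical deff of 1.5.
srs = weighted_summary(6, 25.35, 4.225, sd_complete, avg_weight=1.19, deff=1.0)
clustered = weighted_summary(6, 25.35, 4.225, sd_complete, avg_weight=1.19, deff=1.5)

print(srs)
print(clustered)
```

The weighted total is the same in both calls because the design effect only enters through the effective sample size. The interval for the second call is wider by a factor of √1.5.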
8. Is this a replacement for multiple imputation?
No. This calculator is best for planning, documentation, and scenario review. Formal inference for published research usually needs richer methods that propagate imputation uncertainty.