This sample includes typical values plus intentional extremes.
| RespondentID | SatisfactionScore | TimeSpentMinutes |
|---|---|---|
| 1001 | 8 | 12 |
| 1002 | 9 | 10 |
| 1003 | 7 | 14 |
| 1008 | 10 | 8 |
| 1009 | 2 | 15 |
| 1015 | 1 | 200 |
IQR rule: Lower = Q1 − k·IQR, Upper = Q3 + k·IQR
Modified Z-score: mz = 0.6745·(x − median) / MAD
Percentile bounds: Lower = Plow(x), Upper = Phigh(x)
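The three rules above can be sketched with only the Python standard library. This is a minimal illustration, not the tool's internals: the function names and the `inclusive` quantile method are assumptions.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Tukey fences: Lower = Q1 - k*IQR, Upper = Q3 + k*IQR."""
    q = statistics.quantiles(values, n=4, method="inclusive")
    q1, q3 = q[0], q[2]
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def modified_z(values):
    """Modified Z-scores: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)
    if mad == 0:
        return [0.0] * len(values)  # degenerate: no spread to scale by
    return [0.6745 * (x - med) / mad for x in values]

def percentile_bounds(values, low=1, high=99):
    """Policy trimming: keep values between P_low and P_high."""
    pcts = statistics.quantiles(values, n=100, method="inclusive")
    return pcts[low - 1], pcts[high - 1]

# Using the sample table's TimeSpentMinutes column:
times = [12, 10, 14, 8, 15, 200]
```

On the sample column, the 200-minute entry exceeds both the IQR upper fence and a modified-Z cutoff of 3.5, matching the table's intended extreme.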
- Paste your CSV or upload it.
- Enable header if you have column names.
- Select the column by name or number.
- Start with MAD, then add IQR for confirmation.
- Use a log transform for heavy right tails.
- Review flagged rows against survey context.
- Check whether outliers are valid edge cases.
- Adjust thresholds to match your quality bar.
- Download CSV for audits and data pipelines.
- Download PDF for sharing and reporting.
Outliers and survey decision risk
Survey datasets drive product, HR, and service decisions. A few extreme entries can shift means, inflate variance, and distort trends. On a 1–10 satisfaction scale, a single “1” among mostly 8–9 responses can change the average enough to mis-rank teams. Time-on-task surveys often contain accidental zeros or very large values from paused sessions. Detecting these cases early prevents misleading dashboards, improves model training, and reduces rework during stakeholder reviews. A structured outlier review also supports reproducible cleaning across monthly tracking programs.
Typical patterns in respondent behavior
Outliers are not only numeric mistakes. “Speeders” may finish unrealistically fast, while “stallers” may leave a form open for hours. Straight-lining creates unusually low variability across multi-item batteries and can produce tail values after scoring. Duplicate submissions can cluster at identical totals. The example table in this tool mirrors real audits: one respondent reports 200 minutes when most report 8–15, a strong candidate for investigation.
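Straight-lining, as described above, can be screened by per-respondent variability across a battery. A sketch under stated assumptions: the `min_std` cutoff and the respondent data are invented for illustration.

```python
import statistics

def flag_straight_liners(responses, min_std=0.5):
    """Flag respondents whose multi-item answers barely vary
    (possible straight-lining). min_std is an illustrative cutoff."""
    return {rid: statistics.pstdev(answers) < min_std
            for rid, answers in responses.items()}

battery = {
    1001: [8, 9, 7, 8, 9],   # normal variation
    1002: [5, 5, 5, 5, 5],   # identical answers: straight-lining candidate
}
```

As with numeric outliers, a flag here is a prompt for review, not proof of bad data; some respondents genuinely hold uniform opinions.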
Method selection for robust screening
This calculator offers complementary detectors. The IQR rule uses quartiles, so it is stable for skewed distributions and works well for bounded scales. Standard Z-scores assume an approximately normal shape; they are useful after transformation or when the central bulk is symmetric. Modified Z-scores based on MAD are resilient to heavy tails and are preferred for operational monitoring. Percentile bounds suit policy-based trimming, such as excluding values below P1 or above P99.
Thresholds, transforms, and governance
No single threshold fits every program. Start with MAD at 3.5, then compare IQR at k=1.5. For noisy engagement metrics, consider k=3 or broader percentiles. Log transforms reduce right-tail leverage for time and spend variables, but require positive inputs. Use the combine rule to match your tolerance: “any” maximizes recall, while “all” prioritizes precision for compliance reports.
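The “any”/“all” combine rule maps directly onto Python's built-ins. A minimal sketch; `combine_flags` and the example flag lists are assumed names, not the tool's API.

```python
def combine_flags(flag_sets, rule="any"):
    """Combine per-method outlier flags row by row.
    'any': flagged by at least one method (maximizes recall).
    'all': flagged by every method (prioritizes precision)."""
    fn = any if rule == "any" else all
    return [fn(flags) for flags in zip(*flag_sets)]

mad_flags = [False, False, True, True]
iqr_flags = [False, True, True, False]
```

With these inputs, "any" flags three rows while "all" flags only the row both methods agree on, which is why "all" suits compliance reports.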
Reporting, exports, and next actions
Treat flags as prompts, not automatic deletions. Review the row index, original value, and triggered methods together. If the outlier reflects a true edge case, keep it and document the rationale. If it is an error, correct upstream collection rules and rerun checks. Export CSV for pipeline handoffs and PDF for audit trails, ensuring consistent quality across survey waves.
Which column should I analyze?
Choose the numeric variable used for scoring, such as satisfaction totals, time spent, or composite indices. If you have headers, enter the exact column name; otherwise enter a 1-based column number.
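Selecting by header name or 1-based position can be sketched as follows; the function name and behavior are illustrative assumptions, not the tool's implementation.

```python
import csv
import io

def select_column(csv_text, column, has_header=True):
    """Return the raw string values of one column.
    `column` is a header name (str) or a 1-based position (int)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0] if has_header else None
    data = rows[1:] if has_header else rows
    if isinstance(column, int):
        idx = column - 1            # 1-based position
    else:
        idx = header.index(column)  # exact header name
    return [row[idx] for row in data]

sample = "RespondentID,SatisfactionScore\n1001,8\n1002,9\n"
```

Both `select_column(sample, "SatisfactionScore")` and `select_column(sample, 2)` pick the same column, which is why header names are the safer choice when files change.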
Should I enable the header option?
Enable it when the first row contains column names. It lets you select columns by name and reduces mistakes when files change. Disable it when your data starts directly with responses.
What if my metric is highly skewed?
Try the log or natural log transform for strictly positive metrics like time or spend. Then use MAD or IQR for detection. Transforms can stabilize spread and make Z-scores more meaningful.
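The transform-then-detect sequence can be sketched in a few lines. This assumes natural log plus a MAD-based modified Z-score; the function name and the error for non-positive inputs are illustrative.

```python
import math
import statistics

def log_mad_flags(values, threshold=3.5):
    """Natural-log transform strictly positive values, then flag
    rows whose modified Z-score exceeds the threshold."""
    if min(values) <= 0:
        raise ValueError("log transform requires strictly positive values")
    logged = [math.log(v) for v in values]
    med = statistics.median(logged)
    mad = statistics.median(abs(v - med) for v in logged)
    if mad == 0:
        return [False] * len(values)  # no spread after transform
    return [abs(0.6745 * (v - med) / mad) > threshold for v in logged]
```

On the sample times `[12, 10, 14, 8, 15, 200]`, only the 200-minute row is flagged; the log transform keeps the legitimate 8–15 spread well inside the threshold.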
How do I pick thresholds?
Start with common defaults: MAD 3.5, Z 3.0, and IQR k=1.5. Lower the thresholds to flag more candidates, or raise them when false positives are costly. Validate with spot checks and domain rules.
Can I include non-numeric rows in results?
Yes. Set Missing / Non-numeric to “Keep row (marked)”. The table will show those rows with notes, but they are not scored. This helps you locate formatting issues without losing row positions.
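The “keep row (marked)” behavior can be sketched as a parse step that annotates instead of dropping. The dictionary keys and note text are illustrative, not the tool's exact output.

```python
def score_rows(raw_values):
    """Parse raw column values; non-numeric rows are kept with a
    note rather than dropped, so row positions stay intact."""
    out = []
    for i, raw in enumerate(raw_values, start=1):
        try:
            out.append({"row": i, "value": float(raw), "note": ""})
        except ValueError:
            out.append({"row": i, "value": None, "note": "non-numeric"})
    return out
```

Running this on `["8", "n/a", "15"]` keeps all three rows, with row 2 marked `non-numeric` and excluded from scoring.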
What is included in CSV and PDF exports?
CSV contains row index, original and transformed values, scores, outlier flag, and triggered methods. PDF summarizes totals, settings, and a short list of flagged rows, suitable for sharing and audit trails.
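The CSV layout described above can be reproduced with the standard library's `csv` module. The column names below are illustrative placeholders, not the tool's exact schema.

```python
import csv
import io

def export_csv(results):
    """Write flagged-row results to CSV text with the columns
    described in the FAQ (names here are assumed)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[
        "row", "original", "transformed", "score", "is_outlier", "methods"])
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()

flagged = [{"row": 6, "original": 200, "transformed": 5.298,
            "score": 9.1, "is_outlier": True, "methods": "MAD;IQR"}]
```

Listing the triggered methods per row (e.g. `MAD;IQR`) is what makes the export useful for audit trails: reviewers can see which rule fired without rerunning the analysis.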