Paste pairs, run regression, and uncover influential observations. Tune thresholds, refit without outliers, compare diagnostics. Download CSV or PDF summaries for quick sharing today.
| # | X | Y | Note |
|---|---|---|---|
| 1 | 1 | 3 | |
| 2 | 2 | 5 | |
| 3 | 3 | 7 | |
| 4 | 4 | 9 | |
| 5 | 5 | 11 | |
| 6 | 6 | 13 | |
| 7 | 7 | 15 | |
| 8 | 8 | 40 | Intentional outlier |
| 9 | 9 | 19 | |
| 10 | 10 | 21 | |
Outlier detection in regression is used to protect slope and intercept estimates from a small number of unusual observations. In applied datasets, an extreme y value may be a measurement error, while an extreme x value may be a rare but valid condition. The calculator separates these cases by reporting residual size, leverage, and influence so you can decide what to keep. It also supports “any” versus “all” logic when multiple rules are enabled.
The fitted line uses ŷ = a + b·x, where b is the covariance of x and y divided by the variance of x. R² summarizes explained variation, while RMSE summarizes typical prediction error in original units. When R² is high but RMSE is still large, the trend exists but noise is sizable, so outlier flags should be reviewed carefully. Internally, SSE (the sum of squared residuals) measures total squared error, and MSE rescales it to a per-observation error scale.
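The fit described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the calculator's own code: the function name `fit_line` is hypothetical, and RMSE is assumed here to be the square root of SSE divided by n (some tools divide by n − 2 instead).

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r2, rmse)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                  # cov(x, y) / var(x); the 1/(n-1) factors cancel
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1.0 - sse / sst
    rmse = math.sqrt(sse / n)      # assumption: divide by n, not n - 2
    return a, b, r2, rmse

# Example data from the table above (point 8 is the intentional outlier).
xs = list(range(1, 11))
ys = [3, 5, 7, 9, 11, 13, 15, 40, 19, 21]
a, b, r2, rmse = fit_line(xs, ys)
print(f"a={a:.3f}  b={b:.3f}  R2={r2:.3f}  RMSE={rmse:.3f}")
```

On the example data the single outlier pulls the slope from 2 to roughly 2.70, which is why the later diagnostics matter.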
A raw residual depends on the scale of y, so standardized residuals rescale it using the regression error estimate and the point’s leverage. Values beyond ±2 indicate unusual deviation, and beyond ±3 often trigger investigation in quality checks. The calculator lets you tune this threshold to match strict auditing or exploratory analysis. If you prefer distribution-free screening, enable the residual IQR band with k = 1.5 as a common default.
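As a rough sketch of the rescaling described above, the snippet below computes internally studentized residuals, eᵢ / √(MSE·(1 − hᵢᵢ)). It assumes MSE = SSE / (n − 2) and uses a hypothetical helper name; the calculator may use a different variant.

```python
import math

def standardized_residuals(xs, ys):
    """Residuals scaled by the error estimate and each point's leverage."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    mse = sum(e * e for e in residuals) / (n - 2)   # assumption: p = 2 parameters
    leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    return [e / math.sqrt(mse * (1 - h)) for e, h in zip(residuals, leverage)]

xs = list(range(1, 11))
ys = [3, 5, 7, 9, 11, 13, 15, 40, 19, 21]
std_res = standardized_residuals(xs, ys)

threshold = 2.0   # tunable: 2 for exploration, 3 for a stricter rule
flagged = [i + 1 for i, r in enumerate(std_res) if abs(r) >= threshold]
print(flagged)    # 1-based row numbers
```

Under these assumptions, point 8 in the example table scores about 2.83, so it is flagged at a threshold of 2 but not at 3. The outlier inflates the error estimate it is measured against, which is one reason to tune the threshold rather than rely on a single default.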
Leverage hᵢᵢ increases when x is far from x̄, and high leverage points can pull the line even with small residuals. Cook’s distance combines residual size and leverage to approximate the change in fitted coefficients if a point is removed. Common starting rules are leverage greater than 2p/n and Cook’s D greater than 4/n, with p = 2 here. For small samples, consider stricter thresholds because single points carry more weight.
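The two rules above can be illustrated with a short sketch. It assumes the usual simple-regression formulas, hᵢᵢ = 1/n + (xᵢ − x̄)² / Sxx and Dᵢ = eᵢ² / (p·MSE) · hᵢᵢ / (1 − hᵢᵢ)², with MSE = SSE / (n − p); the function name is hypothetical and the calculator's internals may differ.

```python
def leverage_and_cooks(xs, ys, p=2):
    """Return (leverage, cooks_d) lists for a straight-line fit."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    mse = sum(e * e for e in residuals) / (n - p)
    leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    cooks = [
        (e * e / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]
    return leverage, cooks

xs = list(range(1, 11))
ys = [3, 5, 7, 9, 11, 13, 15, 40, 19, 21]
leverage, cooks = leverage_and_cooks(xs, ys)

n, p = len(xs), 2
high_leverage = [i + 1 for i, h in enumerate(leverage) if h > 2 * p / n]
influential = [i + 1 for i, d in enumerate(cooks) if d > 4 / n]
print("high leverage:", high_leverage)
print("Cook's D > 4/n:", influential)
```

For the example table both cutoffs equal 0.4. No point exceeds the leverage cutoff because the x values are evenly spaced, while point 8 has a Cook's D of about 0.85 and is the only one flagged as influential.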
Refitting after removing flagged points helps compare sensitivity rather than automatically deleting data. If coefficients shift materially, report both fits and explain the decision using the recorded reasons. For repeatable workflows, export CSV for spreadsheet review and PDF for sign‑off, keeping thresholds and flagged indices with the dataset. This preserves transparency for later internal review.
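A sensitivity check of this kind might look like the sketch below. The set `flagged` is assumed to come from an earlier screening step, and `fit_line` is a hypothetical helper rather than the calculator's implementation.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b

xs = list(range(1, 11))
ys = [3, 5, 7, 9, 11, 13, 15, 40, 19, 21]
flagged = {8}   # assumption: 1-based rows flagged by an earlier rule

# Keep every row whose 1-based index is not flagged.
kept = [(x, y) for i, (x, y) in enumerate(zip(xs, ys), start=1) if i not in flagged]

a_full, b_full = fit_line(xs, ys)
a_refit, b_refit = fit_line([x for x, _ in kept], [y for _, y in kept])

print(f"full fit : a={a_full:.3f}, b={b_full:.3f}")
print(f"refit    : a={a_refit:.3f}, b={b_refit:.3f}")
print(f"shift    : da={a_refit - a_full:+.3f}, db={b_refit - b_full:+.3f}")
```

Reporting both rows of output alongside the flagged indices keeps the comparison reproducible without committing to the deletion.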
Provide at least two (x, y) pairs. You can paste one pair per line using commas, semicolons, or spaces, or enter values in the table rows.
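The pasted-input format above could be parsed roughly as follows. This is only a sketch with a hypothetical `parse_pairs` function; it assumes a period as the decimal separator, so a comma is always treated as a delimiter.

```python
import re

def parse_pairs(text):
    """Parse one (x, y) pair per line, split on commas, semicolons, or spaces."""
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        parts = [p for p in re.split(r"[,;\s]+", line) if p]
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 2 values, got {len(parts)}")
        pairs.append((float(parts[0]), float(parts[1])))
    if len(pairs) < 2:
        raise ValueError("at least two (x, y) pairs are required")
    return pairs

sample = "1, 3\n2;5\n3 7\n"
print(parse_pairs(sample))
```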
Cook’s D estimates how much the fitted line would change if one point were removed. Large values suggest an observation is influential, especially when leverage is high.
Use the residual IQR band when you want a simple, distribution‑free rule on residuals. It flags points outside [Q1 − k·IQR, Q3 + k·IQR], often with k = 1.5.
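The band rule above can be sketched with the standard library. Quartile conventions vary between tools; this example assumes the default "exclusive" method of `statistics.quantiles`, and the helper names are hypothetical.

```python
import statistics

def line_residuals(xs, ys):
    """Raw residuals from a least-squares straight-line fit."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def iqr_flags(residuals, k=1.5):
    """1-based indices of residuals outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(residuals, n=4)  # assumption: default method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i + 1 for i, e in enumerate(residuals) if e < low or e > high]

xs = list(range(1, 11))
ys = [3, 5, 7, 9, 11, 13, 15, 40, 19, 21]
residuals = line_residuals(xs, ys)
print(iqr_flags(residuals))        # rows outside the band with k = 1.5
```

Because the band is built from quartiles rather than from the regression error estimate, a single extreme residual widens it far less than it inflates MSE.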
Leverage reflects how extreme x is, not how far y is from the fitted line. A far‑out x can still sit on the trend and produce a small residual.
Flagged points should not always be removed. Refitting mainly tests sensitivity. If coefficients or RMSE change a lot, investigate the flagged points and document whether they are errors or valid extremes.
Start with |std residual| ≥ 3, Cook’s D ≥ 4/n, and leverage ≥ 2p/n. Tighten thresholds for audits, loosen for exploration, and consider using “all rules” for fewer flags.
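The difference between "any" and "all" logic can be shown with plain sets. The rule names and hit sets below are hypothetical illustrations, not calculator output, and `combine_flags` is an assumed helper name.

```python
def combine_flags(rule_hits, mode="any"):
    """Combine per-rule flagged indices.

    rule_hits maps a rule name to the set of 1-based rows it flagged.
    mode="any" keeps rows flagged by at least one rule (union);
    mode="all" keeps rows flagged by every enabled rule (intersection).
    """
    hit_sets = list(rule_hits.values())
    if not hit_sets:
        return set()
    if mode == "any":
        return set().union(*hit_sets)
    if mode == "all":
        return set.intersection(*hit_sets)
    raise ValueError(f"unknown mode: {mode!r}")

# Hypothetical hits from three enabled rules.
rule_hits = {
    "std_residual": {8},
    "cooks_d": {8, 10},
    "leverage": {10},
}
print(sorted(combine_flags(rule_hits, "any")))   # rows hit by at least one rule
print(sorted(combine_flags(rule_hits, "all")))   # rows hit by every rule
```

With these illustrative sets, "any" returns rows 8 and 10 while "all" returns nothing, which shows why the stricter mode produces fewer flags.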
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.