Outlier Detection Regression Calculator

Calculator

Enter your (x, y) pairs, set thresholds, then compute diagnostics.

Data input

Choose a quick paste or a table.

Paste Table

Pairs (x,y) per line

Delimiter

Auto supports comma, semicolon, or spaces.

Outlier settings

Pick criteria and thresholds for flagging.

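Standardized residuals

Flag when |r| ≥ the value entered.

Cook's distance

Flag when D ≥ the value entered. Leave blank to use 4/n.

High leverage

Threshold = multiplier × (2/n), where 2/n is the mean leverage p/n for the p = 2 fitted coefficients.

Residual IQR band

Band: [Q1 − k·IQR, Q3 + k·IQR].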

Combine criteria

Run options

Compute diagnostics, then optionally refit.

Refit after removing flagged points

Tip: enable standardized residuals and Cook’s distance for a strong first pass.

Downloads appear after the first calculation.

Example data table

This sample includes one high-influence point.

#	X	Y	Note
1	1	3
2	2	5
3	3	7
4	4	9
5	5	11
6	6	13
7	7	15
8	8	40	Intentional outlier
9	9	19
10	10	21

Click “Use example” to load this dataset into the calculator.

Formula used

b = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² (slope)
a = ȳ − b·x̄ (intercept)
ŷᵢ = a + b·xᵢ (prediction)
eᵢ = yᵢ − ŷᵢ (residual)
hᵢᵢ = 1/n + (xᵢ−x̄)² / Σ(x−x̄)² (leverage)
s = √(SSE/(n−2)), SSE = Σ eᵢ², MSE = s² (error scale)
rᵢ = eᵢ / (s·√(1−hᵢᵢ)) (standardized residual)
Dᵢ = (eᵢ²/(2·MSE)) · (hᵢᵢ/(1−hᵢᵢ)²) (Cook’s distance; the 2 is p, the number of fitted coefficients)
R² = 1 − SSE/SST, SST = Σ(y−ȳ)²
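
The formulas above can be sketched in plain Python as follows. This is a minimal illustration, not the calculator's own code; the function name `regression_diagnostics` and the returned dictionary keys are assumptions made for this example. The data is the example table from this page.

```python
import math

def regression_diagnostics(xs, ys):
    """Fit y = a + b*x and return coefficients plus per-point diagnostics.

    Hypothetical helper: names and return layout are illustrative only.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope
    a = y_bar - b * x_bar                # intercept
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in resid)
    sst = sum((y - y_bar) ** 2 for y in ys)
    mse = sse / (n - 2)                  # s squared
    lev = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    std_resid = [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, lev)]
    cooks = [(e ** 2 / (2 * mse)) * (h / (1 - h) ** 2)
             for e, h in zip(resid, lev)]
    return {"a": a, "b": b, "r2": 1 - sse / sst,
            "leverage": lev, "std_resid": std_resid, "cooks_d": cooks}

xs = list(range(1, 11))
ys = [3, 5, 7, 9, 11, 13, 15, 40, 19, 21]
diag = regression_diagnostics(xs, ys)
print(f"b = {diag['b']:.3f}, a = {diag['a']:.3f}, R^2 = {diag['r2']:.3f}")
print(f"point 8: r = {diag['std_resid'][7]:.3f}, D = {diag['cooks_d'][7]:.3f}")
```

On the example table this gives b ≈ 2.70, a ≈ −0.53, and R² ≈ 0.58. Point 8 has r ≈ 2.83 and D ≈ 0.85, well above the 4/n = 0.4 default for Cook's distance.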

How to use this calculator

1) Enter pairs using Paste or Table input.
2) Select criteria: residuals, Cook’s D, leverage, or IQR band.
3) Adjust thresholds to match your risk tolerance.
4) Press compute to view coefficients and diagnostics.
5) Optionally refit after removing flagged points.
6) Download the report as CSV or PDF when needed.

Interpretation tip: a point can be influential without being a data error.

Practical goal of outlier screening

Outlier detection in regression protects slope and intercept estimates from a small number of unusual observations. In applied datasets, an extreme y value may be a measurement error, while an extreme x value may be a rare but valid condition. The calculator separates these cases by reporting residual size, leverage, and influence so you can decide what to keep. When multiple rules are enabled, “any” logic flags a point that trips at least one rule, while “all” logic flags only points that trip every enabled rule.
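
One way to picture the “any” versus “all” choice is as a union or an intersection of the per-rule flag sets. The sketch below assumes that reading. The function `combine_flags` and the flag sets are hypothetical and made up for illustration; they are not output from the calculator.

```python
def combine_flags(rule_flags, mode="any"):
    """Combine per-rule flags.

    rule_flags maps a rule name to the set of flagged point numbers.
    mode="any" takes the union, mode="all" takes the intersection.
    """
    enabled = list(rule_flags.values())
    if not enabled:
        return set()
    if mode == "any":
        return set().union(*enabled)
    if mode == "all":
        return set.intersection(*enabled)
    raise ValueError(f"unknown mode: {mode!r}")

# Hypothetical rule outcomes for a 10-point dataset.
flags = {
    "std_resid": {3, 8},
    "cooks_d": {8},
    "leverage": {8, 10},
}
print(sorted(combine_flags(flags, "any")))  # every point tripped by some rule
print(sorted(combine_flags(flags, "all")))  # only points tripped by every rule
```

With these made-up sets, “any” returns points 3, 8, and 10, while “all” returns only point 8, which is why “all” produces fewer flags.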

Model coefficients and baseline fit

The fitted line uses ŷ = a + b·x, where b is the covariance of x and y divided by the variance of x. R² summarizes explained variation, while RMSE summarizes typical prediction error in the original units of y. When R² is high but RMSE is still large, the trend is real but the noise is sizable, so outlier flags should be reviewed carefully. Internally, SSE is the total squared error and MSE = SSE/(n−2) is the error variance estimate that scales the residual diagnostics.
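
As a rough illustration of the baseline fit, the sketch below computes the slope as covariance over variance and then R² and RMSE on the example table. It assumes RMSE means √(SSE/(n−2)), the same quantity as s in the formula list; some tools divide by n instead. The function name `baseline_fit` is hypothetical.

```python
import math

def baseline_fit(xs, ys):
    """Return (a, b, r2, rmse) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Sample covariance and variance; the (n - 1) factors cancel in b.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    b = cov_xy / var_x
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1 - sse / sst
    rmse = math.sqrt(sse / (n - 2))  # assumption: residual standard error
    return a, b, r2, rmse

xs = list(range(1, 11))
ys = [3, 5, 7, 9, 11, 13, 15, 40, 19, 21]
a, b, r2, rmse = baseline_fit(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r2:.3f}, RMSE = {rmse:.2f}")
```

For the example table this yields R² ≈ 0.58 and RMSE ≈ 7.38, both dominated by the single unusual point.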

Standardized residual thresholds

A raw residual depends on the scale of y, so standardized residuals rescale it using the regression error estimate and the point’s leverage. Values beyond ±2 indicate unusual deviation, and beyond ±3 often trigger investigation in quality checks. The calculator lets you tune this threshold to match strict auditing or exploratory analysis. If you prefer distribution-free screening, enable the residual IQR band with k = 1.5 as a common default.
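
The effect of the threshold can be seen on the example table. The sketch below is illustrative only; `standardized_residuals` and `flag_by_residual` are hypothetical names, and the formula is the standardized residual from the formula list.

```python
import math

def standardized_residuals(xs, ys):
    """Standardized residuals r_i = e_i / (s * sqrt(1 - h_ii))."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    out = []
    for e, x in zip(resid, xs):
        h = 1 / n + (x - x_bar) ** 2 / sxx   # leverage of this point
        out.append(e / (s * math.sqrt(1 - h)))
    return out

def flag_by_residual(std_resid, threshold):
    """Return 1-based point numbers with |r| >= threshold."""
    return [i + 1 for i, r in enumerate(std_resid) if abs(r) >= threshold]

xs = list(range(1, 11))
ys = [3, 5, 7, 9, 11, 13, 15, 40, 19, 21]
r = standardized_residuals(xs, ys)
print(flag_by_residual(r, 2.0))  # lower threshold, flags more
print(flag_by_residual(r, 3.0))  # higher threshold, flags fewer
```

With a threshold of 2 the intentional outlier (point 8, r ≈ 2.83) is flagged; with a threshold of 3 nothing is flagged. This is expected for small samples: residuals standardized this way cannot exceed √(n−2) in absolute value, which is about 2.83 when n = 10.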

Leverage and influence statistics

Leverage hᵢᵢ increases when x is far from x̄, and high leverage points can pull the line even with small residuals. Cook’s distance combines residual size and leverage to approximate the change in fitted coefficients if a point is removed. Common starting rules are leverage greater than 2p/n and Cook’s D greater than 4/n, with p = 2 here. For small samples, consider stricter thresholds because single points carry more weight.
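
Because leverage depends only on the x values, it can be checked before looking at y at all. The sketch below is a minimal illustration; the function names and the default multiplier of 2 are assumptions chosen so that the threshold equals 2p/n with p = 2.

```python
def leverages(xs):
    """Leverage h_ii = 1/n + (x_i - x_bar)^2 / Sxx for simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

def leverage_threshold(n, multiplier=2.0, p=2):
    """Threshold = multiplier * (p / n); the default multiplier is assumed."""
    return multiplier * p / n

xs = list(range(1, 11))          # x values from the example table
h = leverages(xs)
cutoff = leverage_threshold(len(xs))
flagged = [i + 1 for i, hi in enumerate(h) if hi >= cutoff]
print(f"max leverage = {max(h):.3f}, cutoff = {cutoff:.3f}, flagged = {flagged}")
```

For the evenly spaced x values in the example table the largest leverage is about 0.35, below the 0.4 cutoff, so no point is flagged on leverage. The intentional outlier there is unusual in y, not in x.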

Refit strategy and reporting

Refitting after removing flagged points is a sensitivity check, not a rule for deleting data. If coefficients shift materially, report both fits and explain the decision using the reasons recorded for each flag. For repeatable workflows, export CSV for spreadsheet review and PDF for sign‑off, and keep the thresholds and flagged indices with the dataset so the analysis stays transparent for later review.
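
A before-and-after comparison might look like the sketch below. It flags points with Cook's D ≥ 4/n on the example table, refits without them, and prints both sets of coefficients. The helper names are hypothetical, and the sketch assumes at least two points with distinct x values remain after removal.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b

def cooks_distances(xs, ys):
    """Cook's distance for each point, using p = 2 coefficients."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in resid) / (n - 2)
    out = []
    for e, x in zip(resid, xs):
        h = 1 / n + (x - x_bar) ** 2 / sxx
        out.append((e ** 2 / (2 * mse)) * (h / (1 - h) ** 2))
    return out

xs = list(range(1, 11))
ys = [3, 5, 7, 9, 11, 13, 15, 40, 19, 21]
cutoff = 4 / len(xs)
d = cooks_distances(xs, ys)
keep = [i for i, di in enumerate(d) if di < cutoff]
removed = [i + 1 for i in range(len(xs)) if i not in keep]

before = fit_line(xs, ys)
after = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
print("removed points:", removed)
print("before: a = %.3f, b = %.3f" % before)
print("after:  a = %.3f, b = %.3f" % after)
```

Here only point 8 is removed, and the line moves from roughly ŷ = −0.53 + 2.70·x to ŷ = 1 + 2·x. A shift of that size is the kind of change worth reporting alongside both fits.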

FAQs

1) What data format does the calculator accept?

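Provide at least two (x, y) pairs with distinct x values. Residual-based diagnostics need at least three pairs, because the error scale s divides by n − 2. You can paste one pair per line using commas, semicolons, or spaces, or enter values in the table rows.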
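
If you prepare data outside the calculator, a small parser for the same one-pair-per-line layout could look like this. It is a sketch under the assumption that commas, semicolons, and whitespace are all treated as separators; the calculator's actual parsing rules are not documented here, and `parse_pairs` is a hypothetical name.

```python
import re

def parse_pairs(text):
    """Parse one (x, y) pair per line, split on commas, semicolons, or spaces."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = [p for p in re.split(r"[,;\s]+", line) if p]
        if len(parts) != 2:
            raise ValueError(f"expected two values per line, got: {line!r}")
        pairs.append((float(parts[0]), float(parts[1])))
    return pairs

sample = "1,3\n2;5\n3 7\n\n4, 9"
print(parse_pairs(sample))
```

Note that treating the comma as a separator means decimal commas such as `3,5` would be read as two values, so this sketch expects a decimal point.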

2) What does Cook’s distance tell me?

Cook’s D estimates how much the fitted line would change if one point were removed. Large values suggest an observation is influential, especially when leverage is high.
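
To make the "if one point were removed" reading concrete, the sketch below computes Cook's D for point 8 of the example table in two ways: from the closed-form expression in the formula list, and by actually deleting the point, refitting, and measuring how far the fitted values move. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b

def full_fit_mse(xs, ys):
    """Return (a, b, mse) for the fit on all points."""
    a, b = fit_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse / (len(xs) - 2)

def cooks_d_formula(xs, ys, i):
    """Cook's D for point i (0-based) from residual and leverage."""
    n = len(xs)
    a, b, mse = full_fit_mse(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    h = 1 / n + (xs[i] - x_bar) ** 2 / sxx
    e = ys[i] - (a + b * xs[i])
    return (e ** 2 / (2 * mse)) * (h / (1 - h) ** 2)

def cooks_d_by_deletion(xs, ys, i):
    """Cook's D for point i from the shift in fitted values after deletion."""
    a, b, mse = full_fit_mse(xs, ys)
    a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    shift = sum(((a + b * x) - (a_i + b_i * x)) ** 2 for x in xs)
    return shift / (2 * mse)

xs = list(range(1, 11))
ys = [3, 5, 7, 9, 11, 13, 15, 40, 19, 21]
d_formula = cooks_d_formula(xs, ys, 7)
d_deletion = cooks_d_by_deletion(xs, ys, 7)
print(f"formula: {d_formula:.4f}, deletion: {d_deletion:.4f}")
```

Both routes give D ≈ 0.853 for point 8, which shows that the closed form is a shortcut for the leave-one-out comparison rather than a separate statistic.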

3) When is the residual IQR band useful?

Use it when you want a simple, distribution‑free rule on residuals. It flags points outside [Q1 − k·IQR, Q3 + k·IQR], often with k = 1.5.
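
A minimal sketch of the band on the example table follows. Quartile definitions differ between tools; this version assumes the default "exclusive" method of Python's `statistics.quantiles`, which may not match the calculator's quartiles exactly. The function names are hypothetical.

```python
import statistics

def residuals(xs, ys):
    """Raw residuals e_i = y_i - (a + b*x_i) from the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def iqr_band_flags(resid, k=1.5):
    """Flag residuals outside [Q1 - k*IQR, Q3 + k*IQR]; returns 1-based numbers."""
    q1, _, q3 = statistics.quantiles(resid, n=4)  # assumed quartile method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = [i + 1 for i, e in enumerate(resid) if e < low or e > high]
    return flagged, (low, high)

xs = list(range(1, 11))
ys = [3, 5, 7, 9, 11, 13, 15, 40, 19, 21]
flagged, (low, high) = iqr_band_flags(residuals(xs, ys))
print(f"band = [{low:.2f}, {high:.2f}], flagged = {flagged}")
```

With k = 1.5 the band is roughly [−9.7, 6.3], and only point 8, whose residual is about 19, falls outside it.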

4) Why can a point have high leverage but small residual?

Leverage reflects how extreme x is, not how far y is from the fitted line. A far‑out x can still sit on the trend and produce a small residual.
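
The sketch below illustrates this with a small made-up dataset (not from the calculator): five points near the line y = 2x + 1 and a sixth point far out at x = 20 that still follows the trend. The function name is hypothetical.

```python
def leverage_and_residuals(xs, ys):
    """Return a list of (leverage, raw residual) for each point."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return [(1 / n + (x - x_bar) ** 2 / sxx, y - (a + b * x))
            for x, y in zip(xs, ys)]

# Made-up data: the last point has an extreme x but sits on the trend.
xs = [1, 2, 3, 4, 5, 20]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 41.0]
rows = leverage_and_residuals(xs, ys)
h_far, e_far = rows[-1]
cutoff = 2 * 2 / len(xs)  # 2p/n with p = 2
print(f"far point: leverage = {h_far:.3f} (cutoff {cutoff:.3f}), residual = {e_far:.3f}")
```

The far point has leverage of about 0.97, well above the 2p/n cutoff of about 0.67, while its residual is close to zero. It would be flagged on leverage but not on any residual-based rule.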

5) Does refitting automatically improve the model?

Not always. Refit mainly tests sensitivity. If coefficients or RMSE change a lot, investigate the flagged points and document whether they are errors or valid extremes.

6) How should I set thresholds for my use case?

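Start with |r| ≥ 3, Cook’s D ≥ 4/n, and leverage ≥ 2p/n. With 10 or fewer points, the standardized residual as defined in the formula list cannot reach 3 (it is bounded by √(n−2)), so start from 2 instead. Tighten thresholds for audits, loosen them for exploration, and consider “all” logic when you want fewer flags.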