PCA Scree Plot Calculator

Calculator

CSV Upload

Optional. Upload overrides pasted text.

Delimiter

Match your CSV separator.

Header Row

First row contains column names

If unchecked, variables become V1, V2, …

Missing Values

Treated as missing: blank cells, NA, NaN, and null.

Matrix Type

Correlation is recommended with mixed scales.

Preprocessing

Correlation always applies z-scoring.

Variance Target

Example: 0.90 means 90% cumulative variance.

Max Components (display)

0 shows all components.

Paste CSV Data

Use consistent units per column. PCA expects numeric variables.

Example Data Table

This sample has four correlated variables. Try it using “Load Example”.

A	B	C	D
12	18	25	40
15	20	22	38
14	19	24	41
16	21	23	39
13	17	26	42
18	24	28	45
17	23	27	44
19	25	29	46
20	26	30	47
21	27	31	48

Formula Used

Centering: \(x'_{ij} = x_{ij} - \bar{x}_j\)
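Z-score: \(z_{ij} = (x_{ij} - \bar{x}_j) / s_j\), where \(s_j\) is the sample standard deviation of column \(j\)
Covariance matrix: \(C = \frac{1}{n-1}X^\top X\), where \(X\) holds the centered columns (using the z-scored columns instead gives the correlation matrix)
Eigenvalues: \(C v_i = \lambda_i v_i\), sorted so that \(\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p\) for \(p\) variables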
Explained variance ratio: \(r_i = \lambda_i / \sum_k \lambda_k\)
Cumulative variance: \(R_i = \sum_{k\le i} r_k\)

The scree plot graphs eigenvalues \(\lambda_i\) by component index. Larger eigenvalues indicate components explaining more variability.
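The formulas above can be traced end to end in a short script. The sketch below is plain Python with no third-party packages: it centers or z-scores each column, builds \(C = X^\top X/(n-1)\), and extracts eigenvalues with the classical cyclic Jacobi rotation method for symmetric matrices. The function names are hypothetical, and Jacobi is a stand-in chosen so the example is self-contained; the calculator's own eigen-solver is not documented here.

```python
import math
from statistics import mean, stdev


def preprocess(columns, standardize):
    """Center each column; if standardize is True, also divide by its sample SD."""
    result = []
    for col in columns:
        m = mean(col)
        s = stdev(col) if standardize else 1.0
        result.append([(x - m) / s for x in col])
    return result


def scatter_matrix(columns):
    """C = X^T X / (n - 1), where X holds the preprocessed columns."""
    n, p = len(columns[0]), len(columns)
    return [[sum(columns[a][i] * columns[b][i] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]


def jacobi_eigenvalues(matrix, rel_tol=1e-24, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    p = len(a)
    scale = sum(v * v for row in a for v in row) or 1.0  # Frobenius norm^2 is invariant
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off <= rel_tol * scale:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation angle chosen so the rotated a[i][j] becomes zero.
                theta = 0.5 * math.atan2(2 * a[i][j], a[j][j] - a[i][i])
                c, s = math.cos(theta), math.sin(theta)
                aii, ajj, aij = a[i][i], a[j][j], a[i][j]
                for k in range(p):
                    if k not in (i, j):
                        aki, akj = a[k][i], a[k][j]
                        a[k][i] = a[i][k] = c * aki - s * akj
                        a[k][j] = a[j][k] = s * aki + c * akj
                a[i][i] = c * c * aii - 2 * s * c * aij + s * s * ajj
                a[j][j] = s * s * aii + 2 * s * c * aij + c * c * ajj
                a[i][j] = a[j][i] = 0.0
    return sorted((a[k][k] for k in range(p)), reverse=True)


def scree_eigenvalues(columns, use_correlation):
    """Covariance PCA centers only; correlation PCA z-scores first."""
    return jacobi_eigenvalues(scatter_matrix(preprocess(columns, use_correlation)))


example = [
    [12, 15, 14, 16, 13, 18, 17, 19, 20, 21],  # A
    [18, 20, 19, 21, 17, 24, 23, 25, 26, 27],  # B
    [25, 22, 24, 23, 26, 28, 27, 29, 30, 31],  # C
    [40, 38, 41, 39, 42, 45, 44, 46, 47, 48],  # D
]
print(scree_eigenvalues(example, use_correlation=True))
```

With the example table, the correlation eigenvalues should sum to 4 (the number of variables), which is a quick sanity check on any implementation.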

How to Use This Calculator

Paste numeric CSV data or upload a CSV file.
Confirm whether the first row contains column names.
Select covariance for same-scale data, correlation for mixed scales.
Choose missing-value handling and your variance target.
Press Submit to see the scree plot and table above.
Download the results as CSV or PDF for reporting.

Scree Plot Purpose and Output

A scree plot summarizes how much structure each principal component captures. This calculator forms a covariance or correlation matrix from your numeric columns, computes its eigenvalues, and orders them from largest to smallest. The first few eigenvalues usually drop sharply because the leading components absorb most of the variance. The explained-variance column reports \(100\,\lambda_i/\sum_k \lambda_k\), and the cumulative column reports \(\sum_{m \le i} 100\,\lambda_m/\sum_k \lambda_k\). For a correlation matrix, \(\sum_k \lambda_k\) equals the number of variables, because the eigenvalues sum to the matrix trace and every diagonal entry of a correlation matrix is 1.
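As a rough illustration of the two percentage columns, the hypothetical helper `variance_table` below sorts eigenvalues in descending order and computes each component's share and the running total. The eigenvalues in the usage line are made-up values for a four-variable correlation matrix, not output from the example data.

```python
def variance_table(eigenvalues):
    """Rows of (component, eigenvalue, explained %, cumulative %), largest eigenvalue first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rows, cumulative = [], 0.0
    for index, lam in enumerate(ordered, start=1):
        explained = 100.0 * lam / total  # 100 * lambda_i / sum_k lambda_k
        cumulative += explained          # running sum up to component i
        rows.append((index, lam, explained, cumulative))
    return rows


# Hypothetical eigenvalues of a 4-variable correlation matrix (they sum to 4).
for row in variance_table([0.4, 2.5, 0.1, 1.0]):
    print("PC{}: lambda={:.2f}  explained={:.1f}%  cumulative={:.1f}%".format(*row))
```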

Variance Targets for Component Selection

Variance targets translate the curve into an actionable cutoff. Many applied studies retain enough components to reach 70%–95% cumulative variance, depending on noise, measurement precision, and interpretability needs. Enter a target such as 0.90 to keep the smallest k where cumulative variance ≥ 90%. If your variables are highly correlated, k may be small; if features are diverse, expect a larger k. Use the table to justify the trade-off.
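A minimal sketch of the variance-target rule, assuming the target is entered as a fraction in (0, 1] as in the 0.90 example. The name `components_for_target` and the small round-off tolerance `eps` are illustrative choices, not documented calculator settings.

```python
def components_for_target(eigenvalues, target, eps=1e-12):
    """Smallest k whose cumulative variance ratio reaches target (e.g. 0.90)."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, lam in enumerate(ordered, start=1):
        cumulative += lam / total
        # eps absorbs floating-point round-off so a target of 1.0 is reachable
        if cumulative >= target - eps:
            return k
    return len(ordered)


print(components_for_target([2.5, 1.0, 0.4, 0.1], 0.90))
```

For these made-up eigenvalues the cumulative ratios are 0.625, 0.875, 0.975, and 1.0, so a 0.90 target keeps three components.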

Elbow and Kaiser Diagnostics

Two diagnostics help confirm the choice. The elbow heuristic here uses second differences of consecutive eigenvalues to locate the sharpest change in slope, flagging the point after which additional components add diminishing returns. When you choose the correlation matrix, the Kaiser rule is also shown: keep components with eigenvalue > 1, meaning they explain more variance than an average standardized variable, whose variance is 1. Treat Kaiser as a guide, not a strict rule.
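The two diagnostics can be sketched as follows. This assumes the elbow is the interior component with the largest second difference \(\lambda_{i-1} - 2\lambda_i + \lambda_{i+1}\); the calculator's exact indexing and tie-breaking are not specified, so treat `elbow_component` and `kaiser_count` as hypothetical helpers.

```python
def elbow_component(eigenvalues):
    """1-based index of the component with the largest second difference, or None."""
    lam = sorted(eigenvalues, reverse=True)
    if len(lam) < 3:
        return None  # a second difference needs a neighbor on each side
    # Second difference at interior position i: lambda[i-1] - 2*lambda[i] + lambda[i+1]
    second = {i: lam[i - 1] - 2 * lam[i] + lam[i + 1] for i in range(1, len(lam) - 1)}
    best = max(second, key=second.get)  # first position wins on ties
    return best + 1  # convert 0-based position to component number


def kaiser_count(eigenvalues):
    """Number of correlation-matrix eigenvalues strictly greater than 1."""
    return sum(1 for lam in eigenvalues if lam > 1.0)


eigs = [2.5, 1.0, 0.4, 0.1]
print(elbow_component(eigs), kaiser_count(eigs))
```

For eigenvalues 2.5, 1.0, 0.4, 0.1, the second differences are 0.9 at component 2 and 0.3 at component 3, so the elbow falls at component 2, while Kaiser keeps only the first component (1.0 is not strictly greater than 1).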

Preprocessing and Data Quality Checks

Preprocessing affects eigenvalues, so check scale and missing data carefully. Mean-centering is appropriate when all columns share comparable units. Z-scoring standardizes columns to unit variance and is preferred for mixed units or different magnitudes; it is automatically applied for correlation analysis. Columns with near-zero variance are dropped because they cannot contribute meaningful directions. For missing cells, you can drop incomplete rows or impute with column means to preserve sample size.
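The missing-value and low-variance steps might look like this in plain Python. Missing cells are assumed to be already parsed as `None`; the helper names and the variance threshold `tol=1e-12` are assumptions, since the page does not state what counts as near-zero variance.

```python
from statistics import mean, pvariance


def handle_missing(rows, strategy="drop"):
    """rows: lists of floats with None for missing cells; 'drop' or 'mean' imputation."""
    if strategy == "drop":
        return [row for row in rows if None not in row]
    if strategy == "mean":
        n_cols = len(rows[0])
        # Column means are computed from observed (non-missing) cells only.
        means = [mean(row[j] for row in rows if row[j] is not None) for j in range(n_cols)]
        return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]
    raise ValueError(f"unknown strategy: {strategy}")


def drop_near_constant(names, rows, tol=1e-12):
    """Remove columns whose variance is at most tol (tol is an assumed threshold)."""
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if pvariance(col) > tol]
    dropped = [names[j] for j in range(len(names)) if j not in keep]
    kept_rows = [[row[j] for j in keep] for row in rows]
    return [names[j] for j in keep], kept_rows, dropped


print(handle_missing([[1.0, None], [3.0, 4.0], [None, 6.0]], "mean"))
```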

Practical Reporting and Downloads

Use downloads to keep results reproducible. The CSV export includes component index, eigenvalue, explained variance percent, and cumulative percent, which can be plotted in spreadsheets. The PDF export places the scree chart and table on an A4 page for reports. Pair the suggested k values with domain constraints, then document your selection and threshold so future datasets can be compared consistently.
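A sketch of the CSV export layout described above, using Python's standard `csv` module. The header labels are hypothetical, since the page lists the fields but not their exact names. Following the Max Components behavior described in the FAQ, percentages are computed from all eigenvalues even when only the first `max_components` rows are written.

```python
import csv
import io


def export_scree_csv(eigenvalues, max_components=0):
    """Return CSV text with one row per component; 0 means export all components."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)  # percentages always use every eigenvalue
    shown = ordered if max_components <= 0 else ordered[:max_components]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["component", "eigenvalue", "explained_pct", "cumulative_pct"])
    cumulative = 0.0
    for index, lam in enumerate(shown, start=1):
        explained = 100.0 * lam / total
        cumulative += explained
        writer.writerow([index, f"{lam:.4f}", f"{explained:.2f}", f"{cumulative:.2f}"])
    return buffer.getvalue()


print(export_scree_csv([2.5, 1.0, 0.4, 0.1], max_components=2))
```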

FAQs

1) What kind of CSV works best?

Use a rectangular numeric table: rows are observations and columns are variables. Keep units consistent per column, avoid text inside the data area, and include a header only if you enable the header option.
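One way to read such a table, sketched with Python's `csv` module. The missing-value tokens mirror the list in the calculator settings (blank, NA, NaN, null), matched case-insensitively here as an assumption; the fallback names V1, V2, and so on follow the Header Row note. `parse_table` is a hypothetical helper, and any non-numeric cell raises a `ValueError`.

```python
import csv

MISSING_TOKENS = {"", "na", "nan", "null"}  # compared after strip() and lower()


def parse_table(text, delimiter=",", header=True):
    """Return (column_names, rows); missing cells become None."""
    records = [rec for rec in csv.reader(text.splitlines(), delimiter=delimiter) if rec]
    if not records:
        raise ValueError("no data found")
    if header:
        names, records = [cell.strip() for cell in records[0]], records[1:]
    else:
        names = [f"V{i}" for i in range(1, len(records[0]) + 1)]
    rows = []
    for line_no, rec in enumerate(records, start=1):
        if len(rec) != len(names):
            raise ValueError(f"data row {line_no}: expected {len(names)} fields, got {len(rec)}")
        row = []
        for cell in rec:
            token = cell.strip()
            row.append(None if token.lower() in MISSING_TOKENS else float(token))
        rows.append(row)
    return names, rows


names, rows = parse_table("A\tB\n12\t18\n15\tNA\n", delimiter="\t")
print(names, rows)
```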

2) Why choose correlation instead of covariance?

Use correlation when variables have different units or ranges, because it standardizes each column. Use covariance when all variables share comparable scales and you want components influenced by absolute variability.

3) Why were some columns removed?

If a column has near-zero variance, it adds no meaningful direction and can cause numerical issues. The calculator removes such columns and notifies you in Notes.

4) How is the elbow point estimated?

Elbow detection looks for the largest curvature change using second differences across eigenvalues. It highlights where the curve transitions from steep decline to a flatter tail.

5) Can I limit how many components I see?

Set “Max Components” to show only the first k components in the plot and table. Computations still use all retained variables; the limit only affects display and exports.

6) Do eigenvalues change if I scale variables?

Yes. Scaling changes the variance structure, so eigenvalues differ between centered-only and z-scored data. If you select correlation, z-scoring is applied automatically to make results scale-free.
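The effect of scaling is easy to check with two variables, where the eigenvalues of a 2×2 symmetric matrix have a closed form. This hypothetical sketch multiplies column D of the example data by 100 (as if converting meters to centimeters): the covariance eigenvalues change substantially, while the correlation eigenvalues, \(1 \pm |r|\), stay the same.

```python
import math
from statistics import mean, stdev


def cov(x, y):
    """Sample covariance with the 1/(n-1) denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def eig2x2(a, b, d):
    """Closed-form eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    half_trace = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return half_trace + radius, half_trace - radius


def pca_eigs_2vars(x, y, use_correlation):
    if use_correlation:
        r = cov(x, y) / (stdev(x) * stdev(y))
        return eig2x2(1.0, r, 1.0)  # correlation matrix [[1, r], [r, 1]]
    return eig2x2(cov(x, x), cov(x, y), cov(y, y))


col_a = [12, 15, 14, 16, 13, 18, 17, 19, 20, 21]  # column A
col_d = [40, 38, 41, 39, 42, 45, 44, 46, 47, 48]  # column D
col_d_scaled = [100 * v for v in col_d]            # same column, values 100x larger
print("covariance: ", pca_eigs_2vars(col_a, col_d, False), pca_eigs_2vars(col_a, col_d_scaled, False))
print("correlation:", pca_eigs_2vars(col_a, col_d, True), pca_eigs_2vars(col_a, col_d_scaled, True))
```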
