Calculator
Paste values, tune options, and compute PCA metrics with audit-friendly outputs.
Example Data Table
| X1 | X2 | X3 | X4 |
|---|---|---|---|
| 2.5 | 2.4 | 1.2 | 0.8 |
| 0.5 | 0.7 | 0.3 | 0.2 |
| 2.2 | 2.9 | 1.1 | 0.9 |
| 1.9 | 2.2 | 1.0 | 0.7 |
| 3.1 | 3.0 | 1.4 | 1.0 |
| 2.3 | 2.7 | 1.2 | 0.9 |
Formulas Used
- Center: x' = x − μ
- Z-score: z = (x − μ) / σ
- Covariance: S = (1/(n−1)) XᵀX, with X centered (and scaled, if z-scoring)
- Eigen: S v = λ v
- Explained: EVRᵢ = λᵢ / Σλ
- Scores: T = X Vₖ
- Rebuild: X̂ = T Vₖᵀ
- SPE: ||x − x̂||² per row
- T²: Σ (tᵢ² / λᵢ) over selected PCs
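The formulas above can be sketched end to end with NumPy, using the example table from this page. This is an illustrative reimplementation, not the calculator's internal code; the variable names mirror the symbols in the list.

```python
import numpy as np

# Rows from the example data table (X1..X4); any numeric matrix works.
X = np.array([
    [2.5, 2.4, 1.2, 0.8],
    [0.5, 0.7, 0.3, 0.2],
    [2.2, 2.9, 1.1, 0.9],
    [1.9, 2.2, 1.0, 0.7],
    [3.1, 3.0, 1.4, 1.0],
    [2.3, 2.7, 1.2, 0.9],
])

mu = X.mean(axis=0)
Xc = X - mu                              # Center: x' = x - mu
S = (Xc.T @ Xc) / (len(X) - 1)           # Covariance of the centered data
lam, V = np.linalg.eigh(S)               # Eigen: S v = lambda v (ascending)
order = np.argsort(lam)[::-1]            # Re-sort descending by eigenvalue
lam, V = lam[order], V[:, order]

evr = lam / lam.sum()                    # Explained variance ratio
k = 2
T = Xc @ V[:, :k]                        # Scores: T = X V_k
X_hat = T @ V[:, :k].T + mu              # Rebuild, adding the mean back
spe = ((X - X_hat) ** 2).sum(axis=1)     # SPE per row
t2 = ((T ** 2) / lam[:k]).sum(axis=1)    # Hotelling's T^2 per row
```

With z-score scaling selected, the same steps apply after dividing each centered column by its standard deviation.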
How to Use This Calculator
- Paste a numeric table or upload a CSV file.
- Choose delimiter and confirm the header setting.
- Select scaling: z-score for mixed units, or center only.
- Pick missing-value handling, then choose k or threshold.
- Submit to view variance, loadings, and diagnostics.
- Download CSV or PDF to archive your results.
Variance explained and selection targets
Explained variance ratio (EVR) converts each eigenvalue into a share of total variability. Use the cumulative EVR curve to choose k components that meet a practical target, such as 0.80 for screening, 0.90 for modeling, or 0.95 for compression. This tool supports both fixed k and threshold selection, so you can align dimensionality with project constraints. For example, if PC1 = 45% and PC2 = 22%, then k = 2 covers 67% of the variation.
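Threshold selection reduces to finding the smallest k whose cumulative EVR reaches the target. A minimal sketch, using illustrative eigenvalues that match the 45%/22% example above:

```python
import numpy as np

# Illustrative descending eigenvalues; EVR = 45%, 22%, 18%, 10%, 5%.
lam = np.array([4.5, 2.2, 1.8, 1.0, 0.5])
evr = lam / lam.sum()
cum = np.cumsum(evr)

target = 0.80
k = int(np.searchsorted(cum, target) + 1)  # smallest k with cum EVR >= target
```

Here k = 2 reaches only 0.67, so the 0.80 screening target selects k = 3.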
Interpreting eigenvalues and Kaiser guidance
Eigenvalues quantify how much variance a principal component carries in the processed space. When you standardize with z-scores, the average eigenvalue is 1.0, so the Kaiser rule suggests retaining components with λ > 1 as a quick baseline. Pair this with a scree-style drop-off: a steep first decline followed by a flatter tail often indicates the point of diminishing returns. In small samples, treat Kaiser as guidance, not a rule, and validate with domain knowledge.
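The Kaiser baseline can be checked directly: on z-scored data the covariance matrix is the correlation matrix, whose eigenvalues average exactly 1.0. The toy data below is illustrative, standing in for your pasted table.

```python
import numpy as np

# Toy standardized dataset: two correlated columns, two independent ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=50)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = (Z.T @ Z) / (len(Z) - 1)                  # correlation matrix
lam = np.sort(np.linalg.eigvalsh(R))[::-1]    # descending eigenvalues

k_kaiser = int((lam > 1.0).sum())             # Kaiser rule: keep lambda > 1
```

Because the eigenvalues always average 1.0 here, components above that average are the ones carrying more than one variable's worth of variance.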
Loadings, communalities, and variable influence
Loadings connect variables to components. Large absolute loadings imply strong contribution, while the sign indicates direction along the axis. The calculator also reports communalities, the sum of squared loadings across retained components for each variable. Communality near 1.0 means the reduced space preserves that variable well; values below about 0.50 signal information loss and may justify keeping more components. Review cross-loadings: when a variable loads strongly on multiple PCs, interpretation should emphasize patterns, not labels.
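One common convention, sketched here under the assumption of a correlation-matrix PCA, scales each eigenvector by √λ to get loadings; a variable's communality is then the sum of its squared loadings over the retained components and falls in [0, 1]. The toy data is illustrative.

```python
import numpy as np

# Toy data: column 2 nearly duplicates column 0, column 1 is independent.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
X[:, 2] = X[:, 0] + 0.2 * rng.normal(size=30)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = (Z.T @ Z) / (len(Z) - 1)
lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]                # descending order

k = 2
loadings = V[:, :k] * np.sqrt(lam[:k])        # shape: (variables, k)
communality = (loadings ** 2).sum(axis=1)     # in [0, 1] for correlation PCA
```

Variables with communality well below about 0.50 are poorly represented by the retained space, which is the signal described above for keeping more components.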
Scores, Hotelling’s T², and SPE residuals
Scores are the transformed coordinates of each row after projection onto Vₖ. Hotelling’s T² summarizes leverage within the retained subspace by weighting squared scores with 1/λ. SPE (squared prediction error) measures residual energy outside the subspace using ||x − x̂||². High T² can flag extreme but well-explained observations, while high SPE often points to structure not captured by the chosen components.
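A common screening pattern combines the two metrics: flag a row when either statistic exceeds its cutoff. The 95th-percentile cutoffs and the simulated values below are illustrative choices, not calculator defaults.

```python
import numpy as np

# Stand-in per-row diagnostics; in practice these come from the PCA model.
rng = np.random.default_rng(2)
t2 = rng.chisquare(2, size=100)     # Hotelling's T^2 values
spe = rng.chisquare(3, size=100)    # SPE residual values

t2_cut = np.percentile(t2, 95)      # empirical cutoff per metric
spe_cut = np.percentile(spe, 95)
flagged = (t2 > t2_cut) | (spe > spe_cut)   # outlier screen: either metric high
```

Rows high on T² but low on SPE are extreme yet well explained; rows high on SPE carry structure the retained PCs miss, matching the distinction drawn above.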
Quality checks and reporting workflow
Before publishing results, confirm scaling matches your units and noise level. Center-only PCA preserves original variance scales; z-score PCA equalizes units and is typical for mixed measures. Check the condition number for numerical stability and watch for near-zero variance variables. Export CSV for audits and PDF for stakeholders, keeping the options and thresholds documented alongside your dataset.
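The pre-flight checks described above can be scripted; the variance threshold here is an illustrative choice, and the third column is deliberately constant to show detection.

```python
import numpy as np

# Toy input with a constant third column (zero variance after centering).
X = np.array([
    [2.5, 2.4, 1.0],
    [0.5, 0.7, 1.0],
    [2.2, 2.9, 1.0],
    [1.9, 2.2, 1.0],
])

var = X.var(axis=0, ddof=1)
near_zero = var < 1e-12                  # flag near-zero-variance columns
Xkeep = X[:, ~near_zero]
Xc = Xkeep - Xkeep.mean(axis=0)
cond = np.linalg.cond(Xc)                # large values signal instability
```

Dropping flagged columns before computing the condition number avoids a spuriously singular matrix and keeps the downstream eigendecomposition stable.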
FAQs
Should I choose z-score or center-only scaling?
Use z-score when variables have different units or spreads, so each feature contributes comparably. Use center-only when variances are meaningful and measured on the same scale, such as repeated sensors calibrated identically.
What does a negative loading mean?
A negative loading means the variable increases as the component score decreases, relative to centered data. Magnitude matters more than sign for contribution; sign mainly affects interpretation of the axis direction.
How many rows do I need for stable PCA?
Stability improves with sample size and correlation strength. A practical starting point is at least 5–10 rows per variable, then validate by rerunning on bootstrap samples or split subsets to see if loadings stay consistent.
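The bootstrap check mentioned above can be sketched as follows: resample rows with replacement, recompute the first principal component, and compare it to the full-sample PC1 by absolute cosine similarity (which ignores arbitrary sign flips). The data and the 200-replicate count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=40)   # strong pairwise correlation

def pc1(data):
    """First principal component of centered data."""
    Xc = data - data.mean(axis=0)
    lam, V = np.linalg.eigh((Xc.T @ Xc) / (len(data) - 1))
    return V[:, np.argmax(lam)]

ref = pc1(X)
sims = []
for _ in range(200):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample of rows
    sims.append(abs(ref @ pc1(X[idx])))          # |cos| vs. full-sample PC1
stability = float(np.mean(sims))                 # near 1.0 means PC1 is stable
```

Values near 1.0 indicate the loading direction survives resampling; noticeably lower averages suggest you need more rows or stronger structure before trusting the loadings.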
Why are my eigenvalues all close to zero?
This usually happens when columns are constant after preprocessing, or when nearly identical columns cancel variance. Remove zero-variance variables, confirm your data is numeric, and ensure you have at least two non-identical rows.
What are T² and SPE used for?
T² summarizes how far a row sits within the retained component space, highlighting leverage. SPE measures residual distance outside that space, highlighting patterns the retained PCs cannot reconstruct. Together they help screen outliers and drift.
Do downloads include my raw dataset?
Downloads include your computed metrics, loadings, and summaries. They do not automatically export every raw row, which protects large datasets. If you need full exports, paste only the rows you want or add them to your own report.