Calculator
Paste values, tune options, and compute PCA metrics with audit-friendly outputs.
Example Data Table
| X1 | X2 | X3 | X4 |
|---|---|---|---|
| 2.5 | 2.4 | 1.2 | 0.8 |
| 0.5 | 0.7 | 0.3 | 0.2 |
| 2.2 | 2.9 | 1.1 | 0.9 |
| 1.9 | 2.2 | 1.0 | 0.7 |
| 3.1 | 3.0 | 1.4 | 1.0 |
| 2.3 | 2.7 | 1.2 | 0.9 |
Formulas Used
- Center: x' = x − μ
- Z-score: z = (x − μ) / σ
- Covariance: S = (1/(n−1)) XᵀX, with X centered (and scaled, if z-scoring)
- Eigen: S v = λ v
- Explained: EVRᵢ = λᵢ / Σλ
- Scores: T = X Vₖ
- Rebuild: X̂ = T Vₖᵀ
- SPE: ||x − x̂||² per row
- T²: Σ (tᵢ² / λᵢ) over selected PCs
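The formulas above can be sketched end to end with NumPy, using the example table from this page. This is an illustrative reimplementation, not the calculator's internal code; the variable names mirror the symbols in the list.

```python
import numpy as np

# Rows from the example data table (X1..X4); any numeric matrix works.
X = np.array([
    [2.5, 2.4, 1.2, 0.8],
    [0.5, 0.7, 0.3, 0.2],
    [2.2, 2.9, 1.1, 0.9],
    [1.9, 2.2, 1.0, 0.7],
    [3.1, 3.0, 1.4, 1.0],
    [2.3, 2.7, 1.2, 0.9],
])

mu = X.mean(axis=0)
Xc = X - mu                              # Center: x' = x - mu
S = (Xc.T @ Xc) / (len(X) - 1)           # Covariance of the centered data
lam, V = np.linalg.eigh(S)               # Eigen: S v = lambda v (ascending)
order = np.argsort(lam)[::-1]            # Re-sort descending by eigenvalue
lam, V = lam[order], V[:, order]

evr = lam / lam.sum()                    # Explained variance ratio
k = 2
T = Xc @ V[:, :k]                        # Scores: T = X V_k
X_hat = T @ V[:, :k].T + mu              # Rebuild, adding the mean back
spe = ((X - X_hat) ** 2).sum(axis=1)     # SPE per row
t2 = ((T ** 2) / lam[:k]).sum(axis=1)    # Hotelling's T^2 per row
```

With z-score scaling selected, the same steps apply after dividing each centered column by its standard deviation.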
How to Use This Calculator
- Paste a numeric table or upload a CSV file.
- Choose delimiter and confirm the header setting.
- Select scaling: z-score for mixed units, or center only.
- Pick missing-value handling, then choose k or threshold.
- Submit to view variance, loadings, and diagnostics.
- Download CSV or PDF to archive your results.
Variance explained and selection targets
Explained variance ratio (EVR) converts each eigenvalue into a share of total variability. Use the cumulative EVR curve to choose k components that meet a practical target, such as 0.80 for screening, 0.90 for modeling, or 0.95 for compression. This tool supports both fixed k and threshold selection, so you can align dimensionality with project constraints. For example, if PC1 = 45% and PC2 = 22%, then k = 2 covers 67% of the variation.
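Threshold selection reduces to finding the smallest k whose cumulative EVR reaches the target. A minimal sketch, using illustrative eigenvalues that match the 45%/22% example above:

```python
import numpy as np

# Illustrative descending eigenvalues; EVR = 45%, 22%, 18%, 10%, 5%.
lam = np.array([4.5, 2.2, 1.8, 1.0, 0.5])
evr = lam / lam.sum()
cum = np.cumsum(evr)

target = 0.80
k = int(np.searchsorted(cum, target) + 1)  # smallest k with cum EVR >= target
```

Here k = 2 reaches only 0.67, so the 0.80 screening target selects k = 3.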
Interpreting eigenvalues and Kaiser guidance
Eigenvalues quantify how much variance a principal component carries in the processed space. When you standardize with z-scores, the average eigenvalue is 1.0, so the Kaiser rule suggests retaining components with λ > 1 as a quick baseline. Pair this with a scree-style drop-off: a steep first decline followed by a flatter tail often indicates the point of diminishing returns. In small samples, treat Kaiser as guidance, not a rule, and validate with domain knowledge.
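The Kaiser baseline can be checked directly: on z-scored data the covariance matrix is the correlation matrix, whose eigenvalues average exactly 1.0. The toy data below is illustrative, standing in for your pasted table.

```python
import numpy as np

# Toy standardized dataset: two correlated columns, two independent ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=50)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = (Z.T @ Z) / (len(Z) - 1)                  # correlation matrix
lam = np.sort(np.linalg.eigvalsh(R))[::-1]    # descending eigenvalues

k_kaiser = int((lam > 1.0).sum())             # Kaiser rule: keep lambda > 1
```

Because the eigenvalues always average 1.0 here, components above that average are the ones carrying more than one variable's worth of variance.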
Loadings, communalities, and variable influence
Loadings connect variables to components. Large absolute loadings imply strong contribution, while the sign indicates direction along the axis. The calculator also reports communalities, the sum of squared loadings across retained components for each variable. Communality near 1.0 means the reduced space preserves that variable well; values below about 0.50 signal information loss and may justify keeping more components. Review cross-loadings: when a variable loads strongly on multiple PCs, interpretation should emphasize patterns, not labels.
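One common convention, sketched here under the assumption of a correlation-matrix PCA, scales each eigenvector by √λ to get loadings; a variable's communality is then the sum of its squared loadings over the retained components and falls in [0, 1]. The toy data is illustrative.

```python
import numpy as np

# Toy data: column 2 nearly duplicates column 0, column 1 is independent.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
X[:, 2] = X[:, 0] + 0.2 * rng.normal(size=30)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = (Z.T @ Z) / (len(Z) - 1)
lam, V = np.linalg.eigh(R)
lam, V = lam[::-1], V[:, ::-1]                # descending order

k = 2
loadings = V[:, :k] * np.sqrt(lam[:k])        # shape: (variables, k)
communality = (loadings ** 2).sum(axis=1)     # in [0, 1] for correlation PCA
```

Variables with communality well below about 0.50 are poorly represented by the retained space, which is the signal described above for keeping more components.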
Scores, Hotelling’s T², and SPE residuals
Scores are the transformed coordinates of each row after projection onto Vₖ. Hotelling’s T² summarizes leverage within the retained subspace by weighting squared scores with 1/λ. SPE (squared prediction error) measures residual energy outside the subspace using ||x − x̂||². High T² can flag extreme but well-explained observations, while high SPE often points to structure not captured by the chosen components.
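A common screening pattern combines the two metrics: flag a row when either statistic exceeds its cutoff. The 95th-percentile cutoffs and the simulated values below are illustrative choices, not calculator defaults.

```python
import numpy as np

# Stand-in per-row diagnostics; in practice these come from the PCA model.
rng = np.random.default_rng(2)
t2 = rng.chisquare(2, size=100)     # Hotelling's T^2 values
spe = rng.chisquare(3, size=100)    # SPE residual values

t2_cut = np.percentile(t2, 95)      # empirical cutoff per metric
spe_cut = np.percentile(spe, 95)
flagged = (t2 > t2_cut) | (spe > spe_cut)   # outlier screen: either metric high
```

Rows high on T² but low on SPE are extreme yet well explained; rows high on SPE carry structure the retained PCs miss, matching the distinction drawn above.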
Quality checks and reporting workflow
Before publishing results, confirm scaling matches your units and noise level. Center-only PCA preserves original variance scales; z-score PCA equalizes units and is typical for mixed measures. Check the condition number for numerical stability and watch for near-zero variance variables. Export CSV for audits and PDF for stakeholders, keeping the options and thresholds documented alongside your dataset.
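The pre-flight checks described above can be scripted; the variance threshold here is an illustrative choice, and the third column is deliberately constant to show detection.

```python
import numpy as np

# Toy input with a constant third column (zero variance after centering).
X = np.array([
    [2.5, 2.4, 1.0],
    [0.5, 0.7, 1.0],
    [2.2, 2.9, 1.0],
    [1.9, 2.2, 1.0],
])

var = X.var(axis=0, ddof=1)
near_zero = var < 1e-12                  # flag near-zero-variance columns
Xkeep = X[:, ~near_zero]
Xc = Xkeep - Xkeep.mean(axis=0)
cond = np.linalg.cond(Xc)                # large values signal instability
```

Dropping flagged columns before computing the condition number avoids a spuriously singular matrix and keeps the downstream eigendecomposition stable.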
FAQs
Should I choose z-score or center-only scaling?
Use z-score when variables have different units or spreads, so each feature contributes comparably. Use center-only when variances are meaningful and measured on the same scale, such as repeated sensors calibrated identically.
What does a negative loading mean?
A negative loading means the variable increases as the component score decreases, relative to centered data. Magnitude matters more than sign for contribution; sign mainly affects interpretation of the axis direction.
How many rows do I need for stable PCA?
Stability improves with sample size and correlation strength. A practical starting point is at least 5–10 rows per variable, then validate by rerunning on bootstrap samples or split subsets to see if loadings stay consistent.
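The bootstrap check mentioned above can be sketched as follows: resample rows with replacement, recompute the first principal component, and compare it to the full-sample PC1 by absolute cosine similarity (which ignores arbitrary sign flips). The data and the 200-replicate count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=40)   # strong pairwise correlation

def pc1(data):
    """First principal component of centered data."""
    Xc = data - data.mean(axis=0)
    lam, V = np.linalg.eigh((Xc.T @ Xc) / (len(data) - 1))
    return V[:, np.argmax(lam)]

ref = pc1(X)
sims = []
for _ in range(200):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample of rows
    sims.append(abs(ref @ pc1(X[idx])))          # |cos| vs. full-sample PC1
stability = float(np.mean(sims))                 # near 1.0 means PC1 is stable
```

Values near 1.0 indicate the loading direction survives resampling; noticeably lower averages suggest you need more rows or stronger structure before trusting the loadings.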
Why are my eigenvalues all close to zero?
This usually happens when columns are constant after preprocessing, or when nearly identical columns cancel variance. Remove zero-variance variables, confirm your data is numeric, and ensure you have at least two non-identical rows.
What are T² and SPE used for?
T² summarizes how far a row sits within the retained component space, highlighting leverage. SPE measures residual distance outside that space, highlighting patterns the retained PCs cannot reconstruct. Together they help screen outliers and drift.
Do downloads include my raw dataset?
Downloads include your computed metrics, loadings, and summaries. They do not automatically export every raw row, which protects large datasets. If you need full exports, paste only the rows you want or add them to your own report.