Calculator
Example data table
| Sample | Length | Width | Height | Weight |
|---|---|---|---|---|
| A | 5.1 | 2.0 | 1.1 | 8.0 |
| B | 4.9 | 2.2 | 1.0 | 7.8 |
| C | 5.8 | 2.4 | 1.3 | 9.2 |
| D | 6.2 | 2.5 | 1.4 | 9.8 |
Formula used
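As a reference sketch, the standard PCA definitions behind the quantities on this page can be written as follows. This assumes the usual conventions (Z‑score scaling and a sample covariance with n − 1 in the denominator); the symbols x, z, S, v, λ, and t are notation introduced here, not taken from the tool.

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
S = \frac{1}{n-1} Z^\top Z, \qquad
S v_k = \lambda_k v_k, \qquad
t_{ik} = \sum_{j=1}^{p} z_{ij} v_{jk}, \qquad
\text{explained}_k = \frac{\lambda_k}{\sum_{m=1}^{p} \lambda_m}
```

Here t_{ik} is the score of row i on component k, v_k is the k-th component direction, and λ_k is its eigenvalue.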
How to use this calculator
- Paste your CSV in the textbox or upload a CSV file.
- Confirm header and delimiter, then choose missing-value handling.
- Pick scaling (Z-score is recommended for mixed units).
- Select the number of components to compute (2 is common).
- Click “View PCA Scores” to see scores, variance, and loadings.
- Use the download buttons to export CSV and PDF.
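The steps above can be approximated offline. The following is a minimal NumPy sketch, assuming Z‑score scaling with the sample standard deviation (ddof=1) and an SVD-based decomposition; the calculator's internal method is not stated on this page, and `pca_scores` is a hypothetical helper name.

```python
import numpy as np

# Example data table from this page: Length, Width, Height, Weight.
X = np.array([
    [5.1, 2.0, 1.1, 8.0],   # Sample A
    [4.9, 2.2, 1.0, 7.8],   # Sample B
    [5.8, 2.4, 1.3, 9.2],   # Sample C
    [6.2, 2.5, 1.4, 9.8],   # Sample D
])

def pca_scores(X, n_components=2):
    """Z-score the columns, then run PCA via SVD (assumed method)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s**2 / (len(Z) - 1)          # variance along each component
    explained = eigenvalues / eigenvalues.sum()
    # Component directions (eigenvectors); some tools rescale these
    # by sqrt(eigenvalue) before reporting them as loadings.
    loadings = Vt[:n_components].T             # features x components
    scores = Z @ loadings                      # rows x components
    return scores, explained[:n_components], loadings

scores, explained, loadings = pca_scores(X)
print(np.round(scores, 3))
print(np.round(explained, 3))
```

The sign of each component is arbitrary, so scores from this sketch may be mirrored relative to another tool's output.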
PCA scores as coordinates
PCA scores place each row in a reduced component space. With Z‑score scaling, a score of +2.0 on PC1 means the row's standardized values project to 2.0 units along the PC1 direction. Because components are orthogonal, PC1 and PC2 scores have zero covariance by construction, which supports separation diagnostics and comparisons across components.
Input structure and column selection
This viewer treats columns as features and rows as observations. At least two mostly numeric columns (≥90% numeric entries) are required. A useful rule is n ≥ 5p (better 10p), where n is the number of rows and p the number of features, so the covariance estimate is stable. If n ≤ p, centering leaves at most n − 1 nonzero eigenvalues; the remaining ones are zero up to rounding, and minor PCs can be dominated by noise.
Scaling choices and their impact
Z‑score scaling sets each feature to mean 0 and standard deviation 1, preventing large‑unit variables from dominating. Mean‑centering alone keeps original units, so a variable with 10× larger variance can drive PC1. With Z‑scores, the covariance matrix of the scaled data equals the correlation matrix of the original features, making loadings easier to compare across features.
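A small check of the scaling claim, as a sketch. The data are hypothetical: two columns in deliberately mismatched units (millimetres and kilograms), chosen only to illustrate the effect.

```python
import numpy as np

# Hypothetical data: column 0 in millimetres, column 1 in kilograms.
X = np.array([
    [5100.0, 8.0],
    [4900.0, 7.8],
    [5800.0, 9.2],
    [6200.0, 9.8],
])

centered = X - X.mean(axis=0)
Z = centered / X.std(axis=0, ddof=1)           # Z-score scaling

cov_centered = np.cov(centered, rowvar=False)  # dominated by the mm column
cov_z = np.cov(Z, rowvar=False)                # unit variances on the diagonal
corr = np.corrcoef(X, rowvar=False)            # correlation of the raw data

print(cov_centered)
print(np.allclose(cov_z, corr))
```

With centering alone, the millimetre column's variance is several orders of magnitude larger than the kilogram column's, so it would drive PC1 almost entirely.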
Handling missing values responsibly
Dropping rows preserves raw distributions but reduces n. If 5% of values are missing across 200 rows, complete‑case deletion can remove far more than 5% of the rows when the gaps fall in different columns. Mean imputation keeps n fixed but shrinks variance slightly and can pull extreme points inward; use it when missingness is low and roughly random.
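The row-loss arithmetic can be made concrete with a short sketch. It assumes each cell is missing independently at the same rate, which is a simplification; `expected_complete_rows` and `mean_impute` are hypothetical helper names, not part of the tool.

```python
def expected_complete_rows(n_rows, n_cols, missing_rate):
    """Expected number of rows with no missing cell.

    Assumption: gaps occur independently in every cell.
    """
    return n_rows * (1.0 - missing_rate) ** n_cols

def mean_impute(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

for p in (4, 10):
    kept = expected_complete_rows(200, p, 0.05)
    print(p, round(kept, 1), round(200 - kept, 1))

print(mean_impute([5.1, None, 5.8, 6.2]))
```

Under this assumption, about 163 of 200 rows survive with four columns (roughly 18.5% dropped) and about 120 survive with ten columns (roughly 40% dropped), even though only 5% of individual values are missing.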
Choosing the number of components
A practical target is 80–95% cumulative explained variance, depending on noise. If PC1 explains 62% and PC2 18%, the first two components capture 80% and a 2D plot is usually informative. When adding PCs, watch for diminishing returns; a third component adding <5% often indicates marginal structure. Scree “elbows” are another cue for stopping. For reporting, list eigenvalues, explained %, and cumulative % for the retained PCs, so reviewers can audit the reduction quickly without re-running any code.
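A minimal sketch of the cumulative-variance rule follows. The first two shares come from the example above; the last two are hypothetical values added so the shares sum to 1, and `components_for_target` is a hypothetical helper name.

```python
def components_for_target(explained, target=0.80):
    """Return (k, cumulative) for the smallest k reaching the target share."""
    cumulative = 0.0
    for k, share in enumerate(explained, start=1):
        cumulative += share
        # Small tolerance so floating-point rounding does not miss the target.
        if cumulative >= target - 1e-12:
            return k, cumulative
    return len(explained), cumulative

# 0.62 and 0.18 are from the text; 0.12 and 0.08 are hypothetical.
explained = [0.62, 0.18, 0.12, 0.08]

k80, total80 = components_for_target(explained, target=0.80)
k90, total90 = components_for_target(explained, target=0.90)
print(k80, round(total80, 2))
print(k90, round(total90, 2))
```

Raising the target from 80% to 90% moves the cut from two components to three in this example.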
Reading clusters, outliers, and loadings
Scores cluster when rows share similar standardized profiles. Outliers often appear beyond ±3 on a major component, but remember that PCA signs can flip without changing meaning, so a point at +3 in one run may appear at −3 in another. Loadings near ±0.70 indicate strong feature influence, while values near ±0.10 are weak. If Hotelling’s T² is enabled, a larger T² suggests greater multivariate distance from the center and can flag unusual combinations, not just extreme single features.
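Hotelling’s T² can be sketched from the retained scores and eigenvalues. This uses the common definition (sum of squared scores, each divided by its component's eigenvalue); the tool's exact formula is not stated on this page, and the scores and eigenvalues below are hypothetical.

```python
def hotelling_t2(scores, eigenvalues):
    """T^2 per row: sum over retained components of score^2 / eigenvalue."""
    return [
        sum(t * t / lam for t, lam in zip(row, eigenvalues))
        for row in scores
    ]

# Hypothetical values for two retained components.
eigenvalues = [2.5, 0.7]
scores = [
    [0.5, 0.2],
    [3.0, -1.5],
    [-1.0, 0.1],
]

t2 = hotelling_t2(scores, eigenvalues)

# Flipping the sign of a component leaves T^2 unchanged.
flipped = [[-pc1, pc2] for pc1, pc2 in scores]
t2_flipped = hotelling_t2(flipped, eigenvalues)

print([round(v, 3) for v in t2])
```

Because the scores are squared, T² is unaffected by sign flips, which makes it a convenient screening value when comparing runs.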
FAQs
1) Which delimiter formats are supported?
Auto-detect covers comma, semicolon, tab, and pipe. If your file uses a rare separator, replace it before uploading. Always verify that columns align correctly in the example preview.
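If you want to check a file before uploading, delimiter detection can be approximated as below. This is a naive sketch, not the tool's algorithm: it ignores quoted fields, and `detect_delimiter` is a hypothetical helper name.

```python
def detect_delimiter(text, candidates=(",", ";", "\t", "|")):
    """Pick the candidate that splits every non-empty line into the same
    number of fields (more than one); prefer the one giving more fields.

    Assumption: fields are not quoted and contain no delimiter characters.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    best, best_fields = None, 1
    for delim in candidates:
        counts = {len(ln.split(delim)) for ln in lines}
        if len(counts) == 1:              # every line agrees on field count
            fields = counts.pop()
            if fields > best_fields:
                best, best_fields = delim, fields
    return best

sample = "Sample;Length;Width\nA;5.1;2.0\nB;4.9;2.2\n"
print(repr(detect_delimiter(sample)))
```

A return value of `None` means no candidate split the lines consistently, which is a signal to replace the separator manually.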
2) Why should I use Z‑score scaling?
Use Z‑scores when variables have different units or spreads. It prevents a high-variance feature from dominating PC1 and makes loadings comparable. If all variables share a unit and scale, centering may be enough.
3) Why are my PCA scores negative?
Negative scores are normal. Components are directions through the centered data, so points can fall on either side of the origin. Only relative positions and distances matter, not the sign itself.
4) How does the tool choose numeric columns?
A column is treated as numeric when at least 90% of its non-missing cells parse as numbers. Other columns are ignored for PCA but can be used as labels in tables and plots.
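The 90% rule can be sketched as follows. It assumes that only empty or whitespace-only cells count as missing and that a cell is numeric when Python's `float()` accepts it; the tool may treat tokens such as "NA" differently. `is_numeric_column` is a hypothetical helper name.

```python
def is_numeric_column(cells, threshold=0.90):
    """True when at least `threshold` of the non-missing cells parse as numbers.

    Assumption: a cell is missing only if it is None or blank.
    """
    non_missing = [c.strip() for c in cells if c is not None and c.strip()]
    if not non_missing:
        return False

    def parses(cell):
        try:
            float(cell)   # note: also accepts "nan" and "inf"
            return True
        except ValueError:
            return False

    share = sum(parses(c) for c in non_missing) / len(non_missing)
    return share >= threshold

length = ["5.1", "4.9", "", "5.8", "6.2"]   # one missing cell, rest numeric
label = ["A", "B", "C", "D"]                # text labels
print(is_numeric_column(length), is_numeric_column(label))
```

Missing cells are excluded from the denominator, so a sparse but otherwise numeric column still qualifies.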
5) What does Hotelling’s T² indicate?
T² summarizes multivariate distance using the retained components and their eigenvalues. Larger values suggest unusual overall profiles, even if no single feature is extreme. It is useful for screening potential outliers.
6) Why doesn’t the PDF include the interactive plot?
The PDF export focuses on reproducible tables: variance, scores, and loadings. Interactive plots are rendered in the browser. If you need a static figure, use your browser’s print-to-PDF or Plotly’s image export menu.