PCA Scores Viewer

Paste CSV data and explore principal-component patterns. See explained variance, loadings, and scores at a glance, then export the results as CSV or PDF to share with your team.

Calculator

Data input

Paste CSV (with or without header) or upload a file.
Missing values accepted: blank, NA, NaN, null.

Parsing & cleaning

Jacobi PCA is fastest with fewer features.

Output options

Use 2 for common scatter plots.
Labels are used in the scores table and plot.
Reset
Use numeric columns for PCA. Non-numeric columns can be labels. For best interpretation, standardize variables with different units.

Example data table

Sample  Length  Width  Height  Weight
A       5.1     2.0    1.1     8.0
B       4.9     2.2    1.0     7.8
C       5.8     2.4    1.3     9.2
D       6.2     2.5    1.4     9.8
This matches the built-in example. Click “Load example” to paste it.

Formula used

1) Standardization (optional)
For each feature x:
z = (x - μ) / σ
where μ is the column mean and σ is the sample standard deviation.
2) Covariance matrix
With centered/scaled data matrix Z (n×p):
S = (1/(n-1)) · ZᵀZ
If you choose “No scaling”, the tool still mean-centers before building S.
3) Eigen decomposition and scores
Solve S vᵢ = λᵢ vᵢ for eigenvalues λᵢ and eigenvectors vᵢ. Sort λ descending to define PC1, PC2, … . Scores for the first k components:
T = Z · Vₖ
where Vₖ is the p×k matrix of top eigenvectors and T is the n×k score matrix.
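The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the tool's implementation (the tool uses a Jacobi routine; this sketch uses NumPy's symmetric eigensolver), and `pca_scores` is a hypothetical helper name:

```python
import numpy as np

def pca_scores(X, k=2, zscore=True):
    """Steps 1-3 above: center (optionally Z-score scale), build the
    covariance matrix S, eigendecompose, and project onto the top k PCs."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = X - X.mean(axis=0)                  # mean-center (always)
    if zscore:
        Z = Z / X.std(axis=0, ddof=1)       # divide by sample std dev
    S = (Z.T @ Z) / (n - 1)                 # S = ZᵀZ / (n-1)
    eigvals, eigvecs = np.linalg.eigh(S)    # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]       # sort descending: PC1, PC2, ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    T = Z @ eigvecs[:, :k]                  # n×k score matrix T = Z·Vₖ
    return T, eigvals

# The built-in example table from above
X = [[5.1, 2.0, 1.1, 8.0],
     [4.9, 2.2, 1.0, 7.8],
     [5.8, 2.4, 1.3, 9.2],
     [6.2, 2.5, 1.4, 9.8]]
scores, eigvals = pca_scores(X, k=2)
```

With Z-scoring, the eigenvalues sum to p (here 4), and the score columns are uncorrelated by construction.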

How to use this calculator

  1. Paste your CSV in the textbox or upload a CSV file.
  2. Confirm header and delimiter, then choose missing-value handling.
  3. Pick scaling (Z-score is recommended for mixed units).
  4. Select the number of components to compute (2 is common).
  5. Click “View PCA Scores” to see scores, variance, and loadings.
  6. Use the download buttons to export CSV and PDF.

PCA scores as coordinates

PCA scores place each row into a reduced component space. A score of +2.0 on PC1 means the row lies two standardized units along the PC1 direction when Z‑score scaling is used. Because components are orthogonal, PC1 and PC2 have zero covariance by construction, so separation along one axis of the score plot can be read independently of the other.

Input structure and column selection

This viewer treats columns as features and rows as observations. At least 2 mostly numeric columns (≥90% numeric entries) are required. A useful rule is n ≥ 5p (better 10p) so the covariance estimate is stable. If n < p, some eigenvalues become near‑zero and minor PCs can be dominated by noise.

Scaling choices and their impact

Z‑score scaling sets each feature to mean 0 and standard deviation 1, preventing large‑unit variables from dominating. Mean‑centering alone keeps original units, so a variable with 10× larger variance can drive PC1. With Z‑scores, the covariance matrix behaves like a correlation matrix, making loadings easier to compare across features.
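A small simulated example (hypothetical data, not built into the tool) makes the dominance effect concrete: with centering only, PC1 points almost entirely along the large-unit feature, while after Z-score scaling the two correlated features load equally:

```python
import numpy as np

rng = np.random.default_rng(0)
small = rng.normal(0.0, 1.0, 200)               # unit-scale feature
big = 10.0 * small + rng.normal(0.0, 3.0, 200)  # correlated, ~10x the spread
X = np.column_stack([small, big])

def top_eigvec(Z):
    """Leading eigenvector of the covariance matrix of Z."""
    vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    return vecs[:, np.argmax(vals)]

v_centered = top_eigvec(X - X.mean(axis=0))     # centering only
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
v_scaled = top_eigvec(Z)                        # Z-score scaling
```

Here `v_centered` is nearly axis-aligned with the high-variance feature, whereas `v_scaled` splits its loading evenly between the two.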

Handling missing values responsibly

Dropping rows preserves raw distributions but reduces n. If 5% of values are missing across 200 rows, complete‑case deletion can remove far more than 5% of rows when the gaps occur in different columns. Mean imputation keeps n fixed but shrinks variance slightly and can pull extreme points inward; use it when missingness is low and roughly random.
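A sketch of the two strategies, assuming NumPy and hypothetical helper names. In the example, two rows each have one gap, but in different columns, so complete-case deletion removes half the data:

```python
import numpy as np

def complete_case(X):
    """Drop any row that contains a NaN."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]

def mean_impute(X):
    """Replace each NaN with its column mean (computed over observed cells)."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

nan = float("nan")
X = [[1.0, 2.0],
     [nan, 4.0],
     [3.0, nan],
     [5.0, 6.0]]
kept = complete_case(X)      # only 2 of 4 rows survive
imputed = mean_impute(X)     # all 4 rows, gaps filled with column means
```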

Choosing the number of components

A practical target is 80–95% cumulative explained variance, depending on noise. If PC1 explains 62% and PC2 18%, the first two components capture 80% and a 2D plot is usually informative. When adding PCs, watch for diminishing returns; a third component adding <5% often indicates marginal structure. Scree “elbows” are another cue for stopping. For reporting, list eigenvalues, explained %, and cumulative % for the retained PCs so reviewers can audit the reduction without re-running any code.
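The reporting columns follow directly from the eigenvalues. The eigenvalues below are hypothetical, chosen to reproduce the 62% / 18% example:

```python
import numpy as np

def variance_table(eigvals):
    """Explained % and cumulative % from eigenvalues, sorted descending."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    explained = eigvals / eigvals.sum() * 100.0
    return explained, np.cumsum(explained)

# Hypothetical eigenvalues: 3.1 / 5.0 = 62%, 0.9 / 5.0 = 18%
explained, cumulative = variance_table([3.1, 0.9, 0.6, 0.4])
```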

Reading clusters, outliers, and loadings

Scores cluster when rows share similar standardized profiles. Outliers often appear beyond ±3 on a major component, but remember PCA signs can flip without changing meaning. Loadings near ±0.70 indicate strong feature influence, while values near 0.10 are weak. If Hotelling’s T² is enabled, larger T² suggests multivariate distance from the center and can flag unusual combinations, not just extreme single features.
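For the retained components, T² is the sum of squared scores, each divided by its eigenvalue. A minimal sketch with hypothetical values shows how a moderate score on a low-variance PC can outrank an extreme score on a high-variance PC:

```python
import numpy as np

def hotelling_t2(scores, eigvals):
    """T² per row: sum over retained PCs of score² / eigenvalue."""
    scores = np.asarray(scores, dtype=float)
    return (scores ** 2 / np.asarray(eigvals, dtype=float)).sum(axis=1)

# Row 0: extreme on high-variance PC1; row 1: moderate on low-variance PC2
t2 = hotelling_t2([[3.0, 0.0],
                   [0.0, 1.5]], [4.0, 0.5])
```

Row 1 gets the larger T² (4.5 vs 2.25) despite smaller raw scores, because its deviation lies along a direction where the data normally varies little.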

FAQs

1) Which delimiter formats are supported?

Auto-detect covers comma, semicolon, tab, and pipe. If your file uses a rare separator, replace it before uploading. Always verify that columns align correctly in the example preview.
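This kind of auto-detection can be approximated with Python's standard-library `csv.Sniffer`; the sketch below illustrates the idea and is not the tool's actual parser:

```python
import csv

def detect_delimiter(sample, candidates=",;\t|"):
    """Guess the delimiter from a text sample, falling back to comma."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return ","

sample = "name;length;width\nA;5.1;2.0\nB;4.9;2.2\n"
```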

2) Why should I use Z‑score scaling?

Use Z‑scores when variables have different units or spreads. It prevents a high-variance feature from dominating PC1 and makes loadings comparable. If all variables share a unit and scale, centering may be enough.

3) Why are my PCA scores negative?

Negative scores are normal. Components are directions through the centered data, so points can fall on either side of the origin. Only relative positions and distances matter, not the sign itself.

4) How does the tool choose numeric columns?

A column is treated as numeric when at least 90% of its non-missing cells parse as numbers. Other columns are ignored for PCA but can be used as labels in tables and plots.
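The 90% rule can be sketched as follows (a hypothetical helper, not the tool's source):

```python
MISSING = frozenset({"", "na", "nan", "null"})

def is_numeric_column(cells, threshold=0.9):
    """Treat a column as numeric when at least `threshold` of its
    non-missing cells parse as numbers."""
    values = [c for c in cells if c.strip().lower() not in MISSING]
    if not values:
        return False

    def parses(cell):
        try:
            float(cell)
            return True
        except ValueError:
            return False

    return sum(parses(c) for c in values) / len(values) >= threshold
```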

5) What does Hotelling’s T² indicate?

T² summarizes multivariate distance using the retained components and their eigenvalues. Larger values suggest unusual overall profiles, even if no single feature is extreme. It is useful for screening potential outliers.

6) Why doesn’t the PDF include the interactive plot?

The PDF export focuses on reproducible tables: variance, scores, and loadings. Interactive plots are rendered in the browser. If you need a static figure, use your browser’s print-to-PDF or Plotly’s image export menu.

Related Calculators

PCA Calculator
PCA Data Analyzer
PCA Score Calculator
PCA Explained Variance
PCA Component Calculator
PCA Eigenvalue Tool
PCA Scree Plot
PCA Factor Scores
PCA Dimensionality Tool
PCA Feature Reducer

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.