PCA Input
Example Data Table
Three variables across five observations. Paste this into the input box.
| Obs | X1 | X2 | X3 |
|---|---|---|---|
| 1 | 2.5 | 2.4 | 0.1 |
| 2 | 0.5 | 0.7 | -1.2 |
| 3 | 2.2 | 2.9 | 0.3 |
| 4 | 1.9 | 2.2 | 0.0 |
| 5 | 3.1 | 3.0 | 0.8 |
Formulas Used
- Standardize: Zij = (xij − μj) / sj, where centering (subtracting μj) and scaling (dividing by sj) are each optional.
- Covariance: C = (1/(n−1)) · ZᵀZ.
- Eigendecomposition: C·vk = λk·vk; the eigenvectors vk are the loadings.
- Scores: T = Z·Vk, where Vk stacks the top k eigenvectors as columns, gives each observation's coordinates on the components.
- Explained variance ratio: EVRk = λk / tr(C), since tr(C) equals the total variance.
This tool estimates the top components using power iteration with deflation.
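The pipeline above can be sketched in NumPy. This is a hypothetical illustration of power iteration with deflation, not the calculator's actual implementation, using the example table from the top of this page:

```python
import numpy as np

# Example table from above: 5 observations (rows) x 3 variables (columns).
X = np.array([
    [2.5, 2.4,  0.1],
    [0.5, 0.7, -1.2],
    [2.2, 2.9,  0.3],
    [1.9, 2.2,  0.0],
    [3.1, 3.0,  0.8],
])

def pca_power_iteration(X, k, center=True, scale=False, iters=1000, tol=1e-12):
    """Top-k PCA via power iteration with deflation on the covariance matrix."""
    Z = X - X.mean(axis=0) if center else X.astype(float)
    if scale:
        Z = Z / Z.std(axis=0, ddof=1)
    n = len(Z)
    C = Z.T @ Z / (n - 1)                 # covariance matrix
    total_var = np.trace(C)
    rng = np.random.default_rng(0)
    eigvals, eigvecs = [], []
    for _ in range(k):
        v = rng.normal(size=C.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(iters):
            w = C @ v
            norm = np.linalg.norm(w)
            if norm < tol:                # C has been fully deflated
                break
            w /= norm
            if np.linalg.norm(w - v) < tol:
                v = w
                break
            v = w
        lam = v @ C @ v                   # Rayleigh quotient = eigenvalue
        eigvals.append(lam)
        eigvecs.append(v)
        C = C - lam * np.outer(v, v)      # deflation: remove found component
    V = np.column_stack(eigvecs)          # loadings (one column per component)
    T = Z @ V                             # scores
    evr = np.array(eigvals) / total_var   # explained variance ratio
    return np.array(eigvals), V, T, evr
```

For the table above with centering on, the three variables are strongly correlated, so the first component alone explains over 95% of the total variance.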
How to Use This Calculator
- Paste your numeric table into the input area.
- Enable centering to subtract each variable's mean before analysis.
- Enable scaling when variables are measured in different units or ranges.
- Choose the number of components to extract.
- Run PCA and review variance, loadings, and scores.
- Use CSV or PDF exports for reports and pipelines.
FAQs
1) What does PCA do?
PCA transforms correlated variables into orthogonal components. Each successive component captures a decreasing share of the total variance, so the first few components often summarize most of the structure in far fewer dimensions.
2) Should I center my data?
Yes in most cases. Centering removes the mean so covariance reflects variation, not offsets. Without centering, the first component can be dominated by mean shifts.
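To see the effect, consider a small sketch on hypothetical data (the leading direction is computed via SVD here purely for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: large mean offset (100, 100), but almost all of the
# actual variation lies along the first axis.
X = rng.normal(loc=[100.0, 100.0], scale=[1.0, 0.1], size=(200, 2))

def top_direction(Z):
    # Leading right singular vector of Z = leading eigenvector of Z'Z.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[0]

v_raw = top_direction(X)                        # no centering
v_centered = top_direction(X - X.mean(axis=0))  # centered

# Without centering, the top direction points at the mean (~45 degrees);
# with centering, it aligns with the axis of largest spread.
```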
3) When should I scale variables?
Scale when variables use different units or ranges. Scaling to unit variance prevents large-scale variables from dominating components. If variables share a common scale, scaling may be unnecessary.
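A hypothetical illustration of why unit scaling matters when the units differ wildly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
# Two independent variables in very different units:
# height in metres (~0.1 spread) and income in dollars (~10000 spread).
height = rng.normal(1.75, 0.1, n)
income = rng.normal(50_000, 10_000, n)
X = np.column_stack([height, income])

Z = X - X.mean(axis=0)
C = Z.T @ Z / (n - 1)
evr_unscaled = np.linalg.eigvalsh(C)[::-1] / np.trace(C)

Zs = Z / Z.std(axis=0, ddof=1)
Cs = Zs.T @ Zs / (n - 1)
evr_scaled = np.linalg.eigvalsh(Cs)[::-1] / np.trace(Cs)

# Unscaled: income's variance (~1e8) swamps height's (~0.01), so the first
# component explains essentially everything. Scaled: both variables are on
# an equal footing and each component explains roughly half.
```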
4) How many components should I keep?
Common rules include keeping enough components to reach 80–95% cumulative explained variance. You can also inspect a scree plot of the eigenvalues and stop at the elbow, where each additional component adds little variance.
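The cumulative-variance rule can be implemented directly. A sketch with a hypothetical eigenvalue spectrum:

```python
import numpy as np

def components_for_threshold(eigvals, threshold=0.90):
    """Smallest k whose cumulative explained-variance ratio reaches threshold."""
    evr = np.sort(np.asarray(eigvals, dtype=float))[::-1] / np.sum(eigvals)
    cumulative = np.cumsum(evr)
    return int(np.searchsorted(cumulative, threshold) + 1)

# Hypothetical eigenvalues from a 5-variable PCA:
eigvals = [5.0, 2.0, 0.6, 0.3, 0.1]
```

With these values the cumulative ratios are 62.5%, 87.5%, 95%, 98.75%, and 100%, so a 90% threshold keeps three components.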
5) What are loadings and scores?
Loadings are the component directions in variable space, i.e. the eigenvectors. Scores are the projected coordinates for each observation. A high absolute loading indicates that a variable strongly influences that component.
6) Why do my results differ from other tools?
Differences usually come from scaling choices, centering choices, or numerical methods. Some tools use SVD on the data matrix, while this tool uses covariance eigenpairs and iterative estimation.
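The two routes agree up to numerical precision and sign, which a quick check on hypothetical random data makes concrete:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
Z = X - X.mean(axis=0)
n = len(Z)

# Route 1: eigendecomposition of the covariance matrix.
C = Z.T @ Z / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
svd_eigvals = s ** 2 / (n - 1)   # squared singular values / (n-1) = eigenvalues

# The eigenvalues agree; the eigenvectors agree only up to sign, which is
# why different tools can report sign-flipped loadings for the same data.
```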
7) Is this suitable for very large datasets?
It works well for moderate variable counts. For thousands of variables, iterative methods may be slower and memory-heavy. Consider dimensionality reduction with randomized methods in specialized analytics stacks.
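One common randomized approach is the range-finder sketch behind randomized SVD. A minimal sketch in the Halko–Martinsson–Tropp style, not something this calculator implements:

```python
import numpy as np

def randomized_top_components(Z, k, oversample=10, seed=0):
    """Approximate the top-k principal eigenpairs of cov(Z) with a
    randomized range finder."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    # Sketch the column space of Z with a small random projection.
    Omega = rng.normal(size=(p, k + oversample))
    Q, _ = np.linalg.qr(Z @ Omega)       # orthonormal basis, n x (k+oversample)
    # Exact SVD of the much smaller projected matrix.
    _, s, Vt = np.linalg.svd(Q.T @ Z, full_matrices=False)
    return s[:k] ** 2 / (n - 1), Vt[:k]  # approx eigenvalues and loadings
```

The full p × p covariance matrix is never formed, and the SVD runs on a small (k + oversample) × p matrix, which is what makes this approach practical at large variable counts.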