PCA Input Panel
Example Data Table
This sample matches the default dataset in the textarea.
| height | weight | age | income |
|---|---|---|---|
| 170 | 68 | 29 | 52 |
| 165 | 61 | 34 | 48 |
| 180 | 75 | 26 | 60 |
| 175 | 72 | 31 | 58 |
| 160 | 55 | 41 | 45 |
| 185 | 80 | 24 | 65 |
Formula Used
x′ = x − μ (centering), and optionally
z = (x − μ) / σ (z-scores).
S = (1/(n−1)) · XᵀX, where X is the centered (or standardized) data matrix.
Correlation uses Rᵢⱼ = Sᵢⱼ / √(Sᵢᵢ Sⱼⱼ).
Eigendecomposition S vᵢ = λᵢ vᵢ; each λᵢ is the variance captured by PC i.
Z = X Vₖ (scores), where Vₖ collects the top k eigenvectors.
Explained ratio: λᵢ / Σⱼ λⱼ.
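As an illustrative sketch (not the tool's actual code), the formulas above can be reproduced with NumPy on the example table:

```python
import numpy as np

X = np.array([[170, 68, 29, 52],
              [165, 61, 34, 48],
              [180, 75, 26, 60],
              [175, 72, 31, 58],
              [160, 55, 41, 45],
              [185, 80, 24, 65]], dtype=float)

Xc = X - X.mean(axis=0)                # x' = x - mu (centering)
n = Xc.shape[0]
S = Xc.T @ Xc / (n - 1)                # S = (1/(n-1)) * X^T X
lam, V = np.linalg.eigh(S)             # eigendecomposition of the symmetric S
order = np.argsort(lam)[::-1]          # sort eigenpairs by descending variance
lam, V = lam[order], V[:, order]

k = 2
Z = Xc @ V[:, :k]                      # Z = X V_k (scores)
ratio = lam / lam.sum()                # explained ratio: lambda_i / sum(lambda)
```

The eigenvalues sum to the total variance in S, so `ratio` always sums to one.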
How to Use This Calculator
- Paste your numeric dataset into the textarea.
- Set delimiter, header row, and missing-value handling.
- Choose whether to standardize features (recommended for mixed units).
- Pick covariance or correlation matrix based on your goal.
- Select components by target variance or fixed k.
- Press Run PCA, then download CSV or PDF.
Dataset readiness and input validation
Reliable PCA starts with consistent, numeric columns. This tool accepts comma-, semicolon-, tab-, or space-separated rows and can auto-detect the delimiter. It supports up to 5000 rows and 50 features, enough for realistic datasets while keeping computation responsive. Use the header option to label features and make loadings easier to interpret. Remove identifiers and timestamps; keep only measured variables.
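A minimal sketch of delimiter detection and numeric parsing, assuming a naive count-based heuristic (the tool's actual detection logic is not specified):

```python
import csv
import io

SAMPLE = "170;68;29;52\n165;61;34;48\n"

def detect_delimiter(line, candidates=(",", ";", "\t", " ")):
    # Naive heuristic: pick the candidate appearing most often in the first line.
    return max(candidates, key=line.count)

def parse_numeric(text, delimiter):
    # Parse each record into floats, skipping empty fields.
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        rows.append([float(v) for v in record if v.strip()])
    return rows
```

Real inputs also need validation (non-numeric cells, ragged rows), which this sketch omits.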
Standardization choices and matrix selection
When features use different units, standardization converts each column to z-scores, preventing large-scale variables from dominating the first component. Choose a covariance matrix to preserve original scale effects, or choose a correlation matrix to focus on relationships. For mixed-unit data, correlation plus standardization often yields more stable, comparable components. If your variables are already normalized, you can disable standardization to retain original variance patterns.
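The equivalence between standardization and the correlation matrix can be seen in a short NumPy sketch: the covariance of z-scored data is the correlation matrix of the original data.

```python
import numpy as np

X = np.array([[170., 68, 29, 52],
              [165, 61, 34, 48],
              [180, 75, 26, 60],
              [175, 72, 31, 58],
              [160, 55, 41, 45],
              [185, 80, 24, 65]])

mu = X.mean(axis=0)
sigma = X.std(axis=0, ddof=1)          # sample standard deviation
Zs = (X - mu) / sigma                  # z-scores: (x - mu) / sigma

# Covariance of the z-scores equals the correlation matrix of X.
R = np.cov(Zs, rowvar=False)
```

This is why "correlation matrix" and "standardize then use covariance" give the same components.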
Variance accounting and component selection
PCA decomposes the chosen covariance or correlation matrix into eigenvalues and eigenvectors. Each eigenvalue represents the variance captured by its component, and the sum of the eigenvalues equals the total variance in the matrix. The tool reports explained variance and cumulative variance, letting you select components by a target ratio such as 0.90 or 0.95. Fixed-k selection is available when a downstream model requires an exact dimensionality. A sharp drop in eigenvalues can also indicate a practical “elbow” for dimensionality reduction.
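Target-ratio selection reduces to finding the smallest k whose cumulative explained-variance ratio meets the target. A sketch of that rule:

```python
import numpy as np

def smallest_k(eigvals, target=0.95):
    # Sort variances in descending order, then find the first k whose
    # cumulative explained-variance ratio reaches the target.
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cum = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cum, target) + 1)
```

For eigenvalues [4, 2, 1, 1], the cumulative ratios are 0.5, 0.75, 0.875, 1.0, so a 0.85 target selects k = 3.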
Interpreting loadings and score outputs
Loadings show how strongly each feature contributes to a component. Large positive or negative weights indicate influential directions in feature space. Scores are the transformed coordinates for each sample after projection onto the top components. Whitening optionally divides each score dimension by the square root of its eigenvalue, producing unit-variance components that can help distance-based methods. Component signs may flip without changing meaning, so compare magnitudes and feature groups rather than signs alone.
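The whitening step described above can be sketched as follows, using synthetic data for illustration: dividing each score column by the square root of its eigenvalue yields unit-variance components.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with deliberately unequal variances per direction.
X = rng.normal(size=(200, 3)) @ np.diag([2.0, 1.0, 0.5])
Xc = X - X.mean(axis=0)

S = Xc.T @ Xc / (Xc.shape[0] - 1)
lam, V = np.linalg.eigh(S)
lam, V = lam[::-1], V[:, ::-1]         # descending eigenvalue order

Z = Xc @ V                             # raw scores: column i has variance lam[i]
Zw = Z / np.sqrt(lam)                  # whitened scores: each column has variance 1
```

Note the component sign ambiguity mentioned above: negating any column of V (and of Z) changes nothing substantive.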
Export workflow and reporting
After computation, results appear immediately above the input panel for fast iteration. Download CSV to capture reduced scores for modeling, clustering, or visualization pipelines. Use the PDF option for quick sharing of eigenvalues and variance summaries in reviews, audits, and client reports. Combine the loadings table with domain context to create clear, defensible feature reduction decisions. Record selected k and target variance for reproducible results.
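The exported score table has a simple shape; a sketch of building the PC1..PCk CSV (the scores here are hypothetical placeholders):

```python
import csv
import io

scores = [[1.2, -0.3], [0.5, 0.8], [-1.7, -0.5]]   # hypothetical PC scores, k = 2
k = len(scores[0])

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([f"PC{i + 1}" for i in range(k)])  # header row: PC1..PCk
writer.writerows(scores)                           # one row per cleaned sample
csv_text = buf.getvalue()
```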
FAQs
1) Should I use covariance or correlation?
Use covariance when feature scales are meaningful and comparable. Use correlation when units differ or you want relationship-driven components. Correlation is commonly paired with standardization for mixed-unit datasets.
2) What does “standardize inputs” change?
Standardization centers each feature and divides by its sample standard deviation. This sets equal variance per feature, reducing scale bias so components reflect structure rather than measurement units.
3) How is the number of components chosen?
You can select a fixed k or use a target cumulative variance ratio, such as 0.95. The tool picks the smallest k that meets the target using the explained variance sequence.
4) What is whitening, and when is it useful?
Whitening divides each component score by √eigenvalue, giving each retained component unit variance. It can help distance-based methods, but it discards relative variance information, which may reduce interpretability when comparing components.
5) How are missing values handled?
Choose to drop any row containing missing entries, which keeps the math clean, or fill missing entries with the column mean, which retains more rows. If missing values remain after cleaning, the tool flags them.
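Both strategies can be sketched in a few lines of NumPy:

```python
import numpy as np

X = np.array([[170.0, 68, 29, 52],
              [165, np.nan, 34, 48],
              [180, 75, np.nan, 60]])

# Option 1: drop any row containing a NaN.
dropped = X[~np.isnan(X).any(axis=1)]

# Option 2: fill each NaN with its column mean, computed over observed values.
col_means = np.nanmean(X, axis=0)
filled = np.where(np.isnan(X), col_means, X)
```

Mean filling shrinks variance along the imputed feature, so dropping rows is usually preferable when only a few rows are affected.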
6) What exactly is exported in the CSV file?
The CSV contains the reduced component scores for every cleaned row, labeled PC1..PCk. This is ready for modeling, plotting, or storing alongside your original identifiers in a separate table.