PCA Input Panel
Example Data Table
This sample matches the default dataset in the textarea.
| height | weight | age | income |
|---|---|---|---|
| 170 | 68 | 29 | 52 |
| 165 | 61 | 34 | 48 |
| 180 | 75 | 26 | 60 |
| 175 | 72 | 31 | 58 |
| 160 | 55 | 41 | 45 |
| 185 | 80 | 24 | 65 |
Formula Used
x′ = x − μ (centering), and optionally
z = (x − μ) / σ (z-scores).
S = (1/(n−1)) · XᵀX, where X is the centered (or standardized) data matrix.
Correlation uses Rᵢⱼ = Sᵢⱼ / √(Sᵢᵢ Sⱼⱼ).
Eigendecomposition S vᵢ = λᵢ vᵢ; each λᵢ is the variance captured by PC i.
Z = X Vₖ (scores), where Vₖ collects the top k eigenvectors.
Explained ratio: λᵢ / Σⱼ λⱼ.
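As an illustrative sketch (not the tool's actual code), the formulas above can be reproduced with NumPy on the example table:

```python
import numpy as np

X = np.array([[170, 68, 29, 52],
              [165, 61, 34, 48],
              [180, 75, 26, 60],
              [175, 72, 31, 58],
              [160, 55, 41, 45],
              [185, 80, 24, 65]], dtype=float)

Xc = X - X.mean(axis=0)                # x' = x - mu (centering)
n = Xc.shape[0]
S = Xc.T @ Xc / (n - 1)                # S = (1/(n-1)) * X^T X
lam, V = np.linalg.eigh(S)             # eigendecomposition of the symmetric S
order = np.argsort(lam)[::-1]          # sort eigenpairs by descending variance
lam, V = lam[order], V[:, order]

k = 2
Z = Xc @ V[:, :k]                      # Z = X V_k (scores)
ratio = lam / lam.sum()                # explained ratio: lambda_i / sum(lambda)
```

The eigenvalues sum to the total variance in S, so `ratio` always sums to one.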
How to Use This Calculator
- Paste your numeric dataset into the textarea.
- Set delimiter, header row, and missing-value handling.
- Choose whether to standardize features (recommended for mixed units).
- Pick covariance or correlation matrix based on your goal.
- Select components by target variance or fixed k.
- Press Run PCA, then download CSV or PDF.
Dataset readiness and input validation
Reliable PCA starts with consistent, numeric columns. This tool accepts comma-, semicolon-, tab-, or space-separated rows and can auto-detect the delimiter. It supports up to 5000 rows and 50 features, enough for realistic datasets while keeping computation responsive. Use the header option to label features and make loadings easier to interpret. Remove identifiers and timestamps; keep only measured variables.
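A minimal sketch of delimiter detection and numeric parsing, assuming a naive count-based heuristic (the tool's actual detection logic is not specified):

```python
import csv
import io

SAMPLE = "170;68;29;52\n165;61;34;48\n"

def detect_delimiter(line, candidates=(",", ";", "\t", " ")):
    # Naive heuristic: pick the candidate appearing most often in the first line.
    return max(candidates, key=line.count)

def parse_numeric(text, delimiter):
    # Parse each record into floats, skipping empty fields.
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        rows.append([float(v) for v in record if v.strip()])
    return rows
```

Real inputs also need validation (non-numeric cells, ragged rows), which this sketch omits.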
Standardization choices and matrix selection
When features use different units, standardization converts each column to z-scores, preventing large-scale variables from dominating the first component. Choose a covariance matrix to preserve original scale effects, or choose a correlation matrix to focus on relationships. For mixed-unit data, correlation plus standardization often yields more stable, comparable components. If your variables are already normalized, you can disable standardization to retain original variance patterns.
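The equivalence between standardization and the correlation matrix can be seen in a short NumPy sketch: the covariance of z-scored data is the correlation matrix of the original data.

```python
import numpy as np

X = np.array([[170., 68, 29, 52],
              [165, 61, 34, 48],
              [180, 75, 26, 60],
              [175, 72, 31, 58],
              [160, 55, 41, 45],
              [185, 80, 24, 65]])

mu = X.mean(axis=0)
sigma = X.std(axis=0, ddof=1)          # sample standard deviation
Zs = (X - mu) / sigma                  # z-scores: (x - mu) / sigma

# Covariance of the z-scores equals the correlation matrix of X.
R = np.cov(Zs, rowvar=False)
```

This is why "correlation matrix" and "standardize then use covariance" give the same components.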
Variance accounting and component selection
PCA decomposes the chosen covariance or correlation matrix into eigenvalues and eigenvectors. Each eigenvalue represents the variance captured by its component, and the sum of the eigenvalues equals the total variance in the matrix. The tool reports explained variance and cumulative variance, letting you select components by a target ratio such as 0.90 or 0.95. Fixed-k selection is available when a downstream model requires an exact dimensionality. A sharp drop in eigenvalues can also indicate a practical “elbow” for dimensionality reduction.
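Target-ratio selection reduces to finding the smallest k whose cumulative explained-variance ratio meets the target. A sketch of that rule:

```python
import numpy as np

def smallest_k(eigvals, target=0.95):
    # Sort variances in descending order, then find the first k whose
    # cumulative explained-variance ratio reaches the target.
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cum = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cum, target) + 1)
```

For eigenvalues [4, 2, 1, 1], the cumulative ratios are 0.5, 0.75, 0.875, 1.0, so a 0.85 target selects k = 3.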
Interpreting loadings and score outputs
Loadings show how strongly each feature contributes to a component. Large positive or negative weights indicate influential directions in feature space. Scores are the transformed coordinates for each sample after projection onto the top components. Whitening optionally divides each score dimension by the square root of its eigenvalue, producing unit-variance components that can help distance-based methods. Component signs may flip without changing meaning, so compare magnitudes and feature groups rather than signs alone.
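The whitening step described above can be sketched as follows, using synthetic data for illustration: dividing each score column by the square root of its eigenvalue yields unit-variance components.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with deliberately unequal variances per direction.
X = rng.normal(size=(200, 3)) @ np.diag([2.0, 1.0, 0.5])
Xc = X - X.mean(axis=0)

S = Xc.T @ Xc / (Xc.shape[0] - 1)
lam, V = np.linalg.eigh(S)
lam, V = lam[::-1], V[:, ::-1]         # descending eigenvalue order

Z = Xc @ V                             # raw scores: column i has variance lam[i]
Zw = Z / np.sqrt(lam)                  # whitened scores: each column has variance 1
```

Note the component sign ambiguity mentioned above: negating any column of V (and of Z) changes nothing substantive.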
Export workflow and reporting
After computation, results appear immediately above the input panel for fast iteration. Download CSV to capture reduced scores for modeling, clustering, or visualization pipelines. Use the PDF option for quick sharing of eigenvalues and variance summaries in reviews, audits, and client reports. Combine the loadings table with domain context to create clear, defensible feature reduction decisions. Record selected k and target variance for reproducible results.
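The exported score table has a simple shape; a sketch of building the PC1..PCk CSV (the scores here are hypothetical placeholders):

```python
import csv
import io

scores = [[1.2, -0.3], [0.5, 0.8], [-1.7, -0.5]]   # hypothetical PC scores, k = 2
k = len(scores[0])

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([f"PC{i + 1}" for i in range(k)])  # header row: PC1..PCk
writer.writerows(scores)                           # one row per cleaned sample
csv_text = buf.getvalue()
```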
FAQs
1) Should I use covariance or correlation?
Use covariance when feature scales are meaningful and comparable. Use correlation when units differ or you want relationship-driven components. Correlation is commonly paired with standardization for mixed-unit datasets.
2) What does “standardize inputs” change?
Standardization centers each feature and divides by its sample standard deviation. This sets equal variance per feature, reducing scale bias so components reflect structure rather than measurement units.
3) How is the number of components chosen?
You can select a fixed k or use a target cumulative variance ratio, such as 0.95. The tool picks the smallest k that meets the target using the explained variance sequence.
4) What is whitening, and when is it useful?
Whitening divides each component score by √eigenvalue, giving each retained component unit variance. It can help distance-based methods, but it discards relative variance information, which may reduce interpretability when comparing components.
5) How are missing values handled?
Choose to drop any row containing missing entries, which keeps the math clean, or fill missing entries with the column mean, which retains more rows. If missing values remain after cleaning, the tool flags them.
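Both strategies can be sketched in a few lines of NumPy:

```python
import numpy as np

X = np.array([[170.0, 68, 29, 52],
              [165, np.nan, 34, 48],
              [180, 75, np.nan, 60]])

# Option 1: drop any row containing a NaN.
dropped = X[~np.isnan(X).any(axis=1)]

# Option 2: fill each NaN with its column mean, computed over observed values.
col_means = np.nanmean(X, axis=0)
filled = np.where(np.isnan(X), col_means, X)
```

Mean filling shrinks variance along the imputed feature, so dropping rows is usually preferable when only a few rows are affected.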
6) What exactly is exported in the CSV file?
The CSV contains the reduced component scores for every cleaned row, labeled PC1..PCk. This is ready for modeling, plotting, or storing alongside your original identifiers in a separate table.