Upload data, choose scaling, and pick components fast. View eigenvalues, scree insights, and variable influence. Download results, reuse templates, and compare scenarios with confidence.
Paste CSV data or upload a file, then choose preprocessing and output settings.
| Height | Weight | Age | Income |
|---|---|---|---|
| 170 | 65 | 29 | 48000 |
| 182 | 82 | 35 | 62000 |
| 165 | 55 | 22 | 39000 |
| 176 | 74 | 31 | 54000 |
| 158 | 50 | 26 | 41000 |
| 190 | 90 | 40 | 72000 |
| 172 | 68 | 28 | 50000 |
| 168 | 60 | 24 | 43000 |
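If you prefer to prepare the input programmatically, a minimal sketch of loading a CSV like the sample above with pandas follows; the file name `sample.csv` is a hypothetical placeholder, not part of the tool.

```python
import pandas as pd

# Load a CSV with a header row; each row is an observation,
# each column a numeric variable (file name is hypothetical).
df = pd.read_csv("sample.csv")

# Keep only numeric columns and convert to a NumPy matrix X (n x p).
X = df.select_dtypes("number").to_numpy(dtype=float)
print(X.shape)  # (n observations, p variables)
```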
Principal Component Analysis transforms correlated variables into orthogonal components that explain variance efficiently.
Given data matrix X (n×p)
1) Center (and optionally scale):
Xc = (X - μ) or Xz = (X - μ) / σ
2) Covariance matrix:
C = (1/(n-1)) · Xcᵀ · Xc
3) Eigen-decomposition:
C · vᵢ = λᵢ · vᵢ
4) Scores (projected data) for k components:
T = Xc · Vₖ
Explained variance ratio:
rᵢ = λᵢ / Σⱼ λⱼ
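The steps above map directly onto a few lines of NumPy. The sketch below is illustrative only (it assumes z-score scaling by default and reuses the sample table); the tool's own implementation may differ.

```python
import numpy as np

def pca(X, k, scale=True):
    """Minimal PCA: eigenvalues, eigenvectors (loadings), scores, explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu = X.mean(axis=0)
    Xc = X - mu                           # 1) center
    if scale:
        Xc = Xc / X.std(axis=0, ddof=1)   # optional z-score scaling
    C = (Xc.T @ Xc) / (n - 1)             # 2) covariance matrix
    lam, V = np.linalg.eigh(C)            # 3) eigen-decomposition (C is symmetric)
    order = np.argsort(lam)[::-1]         # sort eigenvalues in descending order
    lam, V = lam[order], V[:, order]
    T = Xc @ V[:, :k]                     # 4) scores for the first k components
    ratio = lam / lam.sum()               # explained variance ratio
    return lam, V, T, ratio

# Example with the sample table above (Height, Weight, Age, Income):
X = np.array([[170, 65, 29, 48000],
              [182, 82, 35, 62000],
              [165, 55, 22, 39000],
              [176, 74, 31, 54000],
              [158, 50, 26, 41000],
              [190, 90, 40, 72000],
              [172, 68, 28, 50000],
              [168, 60, 24, 43000]])
lam, V, T, ratio = pca(X, k=2)
print(ratio.round(3))  # share of variance explained by each component
```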
PCA summarizes many correlated variables into a few uncorrelated components. This tool reports eigenvalues, explained variance, loadings, and projected scores so you can reduce dimensionality without guessing. In many business datasets, the first 2–3 components often capture 60–85% of total variance after scaling, enabling faster modeling and clearer plots.
Mean-centering is essential because PCA is variance-driven. Z-score scaling is recommended when variables use different units (for example, income and age), because it prevents large-scale columns from dominating the covariance matrix. As a rule of thumb, aim for n larger than p (often 5–10×p) to stabilize covariance estimates. For missing data, mean imputation keeps sample size stable, while row dropping preserves original values but can reduce n and stability.
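The two preprocessing choices can be sketched in NumPy as below; this is one plausible implementation, with column means and standard deviations computed from the data itself.

```python
import numpy as np

def zscore(X):
    """Standardize each column to mean 0 and unit variance (NaN-aware)."""
    X = np.asarray(X, dtype=float)
    return (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)

def mean_impute(X):
    """Replace missing cells (NaN) with the column mean, keeping all rows."""
    X = np.asarray(X, dtype=float)
    means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), means, X)

def drop_rows(X):
    """Drop any row that contains a missing value (stricter, smaller dataset)."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]
```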
Each eigenvalue λ indicates how much variance its component explains. The table shows explained % and cumulative % so you can pick k objectively. A common target is 70–90% cumulative variance for compact representations, depending on the cost of information loss. The scree “elbow” (a sharp flattening of eigenvalues) is another useful cue. For standardized inputs, the Kaiser rule (λ > 1) is a quick screening heuristic, but the cumulative curve is usually a better decision signal.
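For example, picking the smallest k that reaches a target cumulative variance can be done in a couple of lines; the 0.80 target below is an assumption, not a fixed rule.

```python
import numpy as np

# lam: eigenvalues sorted in descending order (as returned by the PCA sketch above)
def choose_k(lam, target=0.80):
    """Smallest number of components whose cumulative explained variance meets the target."""
    cumulative = np.cumsum(lam / lam.sum())
    return int(np.searchsorted(cumulative, target) + 1)

def kaiser_k(lam):
    """Kaiser rule as a quick screen (standardized inputs): components with eigenvalue > 1."""
    return int((lam > 1).sum())
```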
Loadings are the eigenvector weights for each variable in a component. Larger absolute loadings mean stronger influence. As a practical threshold, |loading| ≥ 0.40 is often considered meaningful, while values near 0 suggest weak contribution. Squared loadings approximate how much of a variable’s variance is associated with a component, helping you label components with interpretable themes. Opposite signs indicate variables move in different directions along that component.
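A small sketch for flagging influential variables by loading magnitude; the 0.40 cutoff simply follows the rule of thumb above, and `V` is the eigenvector matrix from the earlier PCA sketch.

```python
import numpy as np

def influential_variables(V, names, component=0, threshold=0.40):
    """Return (name, loading) pairs whose absolute loading meets the threshold for one component."""
    weights = V[:, component]
    return [(name, round(float(w), 3))
            for name, w in zip(names, weights)
            if abs(w) >= threshold]

# e.g. influential_variables(V, ["Height", "Weight", "Age", "Income"], component=0)
```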
Scores are the transformed coordinates of each row: T = Xc · Vₖ, computed from the centered (or standardized) data as above. Use PC1 vs PC2 scatterplots to spot clusters, trends, and outliers, or feed the first k scores into regression and classification models. Because components are orthogonal, multicollinearity is reduced and coefficient estimates are typically more stable. You can also approximate the centered data using X̂c ≈ T · Vₖᵀ (add the column means back to return to original units) and assess reconstruction error when comparing k values.
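Comparing reconstruction error across candidate k values can be sketched as follows; note this reconstructs the centered (or standardized) data, so add the means back if you need original units.

```python
import numpy as np

def reconstruction_error(Xc, V, k):
    """Mean squared error of reconstructing the centered data from the first k components."""
    Vk = V[:, :k]
    T = Xc @ Vk            # scores
    Xc_hat = T @ Vk.T      # approximate centered data
    return float(np.mean((Xc - Xc_hat) ** 2))

# Compare candidate k values, e.g.:
# for k in range(1, Xc.shape[1] + 1):
#     print(k, reconstruction_error(Xc, V, k))
```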
Provide a CSV table where each row is an observation and each column is a numeric variable. Use the correct delimiter, and optionally include a header row for variable names.
Choose Z-score scaling when variables have different units or ranges, such as income, age, and measurements together. Scaling prevents one high-variance column from dominating the components.
Select mean imputation to replace missing cells with the column mean, keeping more rows. Choose row dropping to remove any record with missing data for a stricter, but smaller, dataset.
Use the explained variance table and pick the smallest k that reaches your target cumulative percentage, commonly 70–90%. Also look for a scree elbow where eigenvalues begin to flatten.
Loadings are weights that define each component direction. Variables with the same sign move together along that component, while opposite signs indicate trade-offs. Larger absolute values signal stronger influence.
Use scores as compact features for visualization, clustering, or predictive models. Because scores are orthogonal, they often reduce multicollinearity and improve stability compared with using many correlated original variables.
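As one illustration of using scores as features, a minimal least-squares fit on the first k scores could look like this; the target vector `y` is hypothetical and not produced by the tool.

```python
import numpy as np

# T: (n x k) score matrix from PCA; y: a numeric target of length n (hypothetical).
def fit_on_scores(T, y):
    """Ordinary least squares on PCA scores plus an intercept column."""
    A = np.column_stack([np.ones(len(T)), T])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # [intercept, b1, ..., bk]
```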
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.