Turn raw variables into clear, comparable component insights. Choose scaling, imputation, and component targets. Download tables, plots, and summaries for use in reports.
| Height | Weight | Age | Income |
|---|---|---|---|
| 170 | 72 | 29 | 42000 |
| 165 | 65 | 35 | 52000 |
| 180 | 80 | 31 | 61000 |
| 175 | 77 | 28 | 48000 |
| 160 | 59 | 40 | 53000 |
Before running PCA, check row counts, outliers, and missingness. With limited samples, covariance estimates become noisy and components swing between runs. Mean imputation works when gaps are sparse and random, while dropping rows is safer when entire records are unreliable. If one variable contains many blanks, consider removing it or collecting more observations to protect interpretability. Aim for five to ten observations per variable when possible.
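The two missing-data options can be sketched with NumPy. This is a minimal illustration, not the calculator's own code: the values come from the sample table, but the two gaps (`np.nan`) are hypothetical and were added only to show the effect.

```python
import numpy as np

# Sample table values (Height, Weight, Age, Income).
# The two NaN gaps are hypothetical, inserted for illustration only.
X = np.array([
    [170.0, 72.0, 29.0, 42000.0],
    [165.0, np.nan, 35.0, 52000.0],
    [180.0, 80.0, 31.0, 61000.0],
    [175.0, 77.0, np.nan, 48000.0],
    [160.0, 59.0, 40.0, 53000.0],
])

# Share of missing values per column; a column with many blanks
# is a candidate for removal.
missing_share = np.isnan(X).mean(axis=0)

# Option 1: mean imputation, filling each gap with its column mean.
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)

# Option 2: drop every row that contains at least one gap.
complete_rows = ~np.isnan(X).any(axis=1)
X_dropped = X[complete_rows]

# Observations per variable after dropping rows (3 rows / 4 columns here).
obs_per_variable = X_dropped.shape[0] / X_dropped.shape[1]

print("missing share per column:", missing_share)
print("rows after imputation:", X_imputed.shape[0])
print("rows after dropping:", X_dropped.shape[0])
print("observations per variable:", obs_per_variable)
```

On a table this small, dropping rows leaves fewer observations than variables, which is far below the five-to-ten guideline, so imputation or more data collection is the more realistic choice.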
PCA maximizes variance, so units matter. Using z-scores gives every variable unit variance, letting structure reflect relationships rather than magnitude. Mean-centering keeps original scales, which is useful when all variables share a unit and variance itself is meaningful. Running PCA on raw, unscaled data is rarely recommended, because a single large-range feature can dominate the first component and hide subtler signals. Datasets that mix percentages, counts, and currency almost always require z-scoring.
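A short sketch of why units matter, using the sample table. It assumes the sample standard deviation (`ddof=1`) for z-scoring; the calculator may use a different convention.

```python
import numpy as np

# Sample table values (Height, Weight, Age, Income).
X = np.array([
    [170, 72, 29, 42000],
    [165, 65, 35, 52000],
    [180, 80, 31, 61000],
    [175, 77, 28, 48000],
    [160, 59, 40, 53000],
], dtype=float)

# Mean-centering: subtract each column mean, keep original scales.
X_centered = X - X.mean(axis=0)

# Z-scoring: center, then divide by the sample standard deviation.
X_z = X_centered / X.std(axis=0, ddof=1)

var_centered = X_centered.var(axis=0, ddof=1)
var_z = X_z.var(axis=0, ddof=1)

# Fraction of total variance carried by Income when only centering.
income_share = var_centered[3] / var_centered.sum()

print("variances after centering:", var_centered)
print("variances after z-scoring:", var_z)
print("Income share of total variance (centered only):", income_share)
```

With centering alone, Income accounts for well over 99% of the total variance, so the first component would essentially be the Income axis. After z-scoring, every column has variance 1 and no single unit dominates.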
Each eigenvalue estimates the variance captured by a component. Divide each eigenvalue by the trace of the covariance matrix (which equals the sum of all eigenvalues) to get explained variance ratios, then sum them for a cumulative view. Practical workflows target 80–95% cumulative variance, balancing compression and fidelity. If cumulative variance rises slowly, the variables may be only weakly correlated, and dimensionality reduction will deliver limited simplification. A scree elbow can confirm the cutoff, alongside a variance threshold rule.
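The ratio and threshold calculation can be sketched as follows. The 90% target and the variable name `target` are assumptions chosen for illustration; the data is z-scored first, so the covariance matrix here is the correlation matrix.

```python
import numpy as np

# Sample table values (Height, Weight, Age, Income).
X = np.array([
    [170, 72, 29, 42000],
    [165, 65, 35, 52000],
    [180, 80, 31, 61000],
    [175, 77, 28, 48000],
    [160, 59, 40, 53000],
], dtype=float)

# Z-score, then form the covariance matrix of the processed data.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov = np.cov(Z, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in
# ascending order, so reorder them from largest to smallest.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Explained variance ratio: eigenvalue divided by the covariance trace.
ratios = eigvals / np.trace(cov)
cumulative = np.cumsum(ratios)

# Variance-threshold rule: smallest k whose cumulative ratio reaches the target.
target = 0.90  # assumed target, within the 80-95% range
k = int(np.searchsorted(cumulative, target)) + 1
k = min(k, len(eigvals))  # guard against floating-point shortfall

print("explained variance ratios:", np.round(ratios, 4))
print("cumulative:", np.round(cumulative, 4))
print("components kept:", k)
```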
Loadings indicate how strongly each variable contributes to a component direction. Large absolute loadings highlight drivers, while near-zero values indicate minimal influence. Signs can flip without changing interpretation, so focus on relative patterns. When two variables share the same sign on a component, they tend to move together in that direction; opposite signs suggest trade-offs. Squared loadings summed across kept components approximate each variable’s contribution to the retained space.
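A minimal sketch of reading loadings, under one stated assumption: loadings are taken here as the entries of the unit-length eigenvectors. Some tools scale them by the square root of the eigenvalue instead, and the calculator's convention is not stated. Keeping `k = 2` components is also an assumption for illustration.

```python
import numpy as np

names = ["Height", "Weight", "Age", "Income"]

# Sample table values.
X = np.array([
    [170, 72, 29, 42000],
    [165, 65, 35, 52000],
    [180, 80, 31, 61000],
    [175, 77, 28, 48000],
    [160, 59, 40, 53000],
], dtype=float)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]

k = 2  # assumed number of kept components
loadings = eigvecs[:, order[:k]]  # rows: variables, columns: components

# Squared loadings summed across kept components: each variable's
# share in the retained space (between 0 and 1 for unit eigenvectors).
contribution = (loadings ** 2).sum(axis=1)

for name, row, share in zip(names, loadings, contribution):
    print(f"{name:>7}: loadings={np.round(row, 3)}  contribution={share:.3f}")
```

Because each eigenvector has unit length, the contributions add up to `k` across variables, which makes them easy to compare as shares.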
Scores are the coordinates of each observation in component space. Plotting scores or comparing their ranges can reveal clusters, trends, and anomalies across time, products, or cohorts. Because components are orthogonal, scores reduce multicollinearity in downstream models. For reporting, include the explained variance table, key loadings, and a small score preview to keep results actionable. Use reconstruction RMSE to quantify information loss from the chosen components.
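Scores and reconstruction RMSE can be sketched in a few lines. This assumes z-scored input, `k = 2` kept components, and RMSE averaged over every cell of the processed matrix; the calculator may average differently.

```python
import numpy as np

# Sample table values (Height, Weight, Age, Income).
X = np.array([
    [170, 72, 29, 42000],
    [165, 65, 35, 52000],
    [180, 80, 31, 61000],
    [175, 77, 28, 48000],
    [160, 59, 40, 53000],
], dtype=float)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

k = 2  # assumed number of kept components
W = eigvecs[:, :k]

# Scores: coordinates of each observation in component space.
scores = Z @ W

# Project back into the processed (z-scored) space and measure the loss.
Z_hat = scores @ W.T
rmse = np.sqrt(np.mean((Z - Z_hat) ** 2))

# Keeping every component reconstructs the processed data exactly.
Z_full = (Z @ eigvecs) @ eigvecs.T
rmse_full = np.sqrt(np.mean((Z - Z_full) ** 2))

print("score preview:\n", np.round(scores, 3))
print("reconstruction RMSE with k=2:", rmse)
print("reconstruction RMSE with all components:", rmse_full)
```

The score columns are uncorrelated by construction, which is why they can replace collinear inputs in a downstream model.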
Use numeric columns with one observation per row. Comma, tab, semicolon, or whitespace delimiters work. If you include a header row, tick the header option so variables are labeled correctly.
Z-score is best when variables use different units or ranges. Mean-centering is suitable when all variables share a unit and raw variance magnitudes are meaningful for your analysis.
Start with a cumulative explained variance target, such as 80–95%. If the curve has an elbow, keep components up to that point. The variance-threshold mode automates this by stopping once your target is reached.
The sign of an eigenvector is arbitrary: multiplying a component by −1 spans the same subspace, and the scores simply change sign. Interpret components by relative magnitudes and variable groupings, not by the sign alone.
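The sign ambiguity is easy to check numerically. In the sketch below, `orient` is a hypothetical helper, not part of any library; it applies one common convention (make the largest-magnitude loading positive) so that repeated runs report the same sign.

```python
import numpy as np

# Sample table values (Height, Weight, Age, Income).
X = np.array([
    [170, 72, 29, 42000],
    [165, 65, 35, 52000],
    [180, 80, 31, 61000],
    [175, 77, 28, 48000],
    [160, 59, 40, 53000],
], dtype=float)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))

# eigh sorts eigenvalues in ascending order, so the last column
# is the first principal component.
v = eigvecs[:, -1]
flipped = -v

scores = Z @ v
scores_flipped = Z @ flipped

# Rank-one reconstructions from either sign are identical.
recon = np.outer(scores, v)
recon_flipped = np.outer(scores_flipped, flipped)


def orient(vec):
    """Hypothetical helper: flip so the largest-magnitude entry is positive."""
    idx = np.argmax(np.abs(vec))
    return vec if vec[idx] > 0 else -vec


print("scores flip sign:", np.allclose(scores_flipped, -scores))
print("reconstruction unchanged:", np.allclose(recon, recon_flipped))
print("oriented component:", np.round(orient(v), 3))
```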
PCA results are directly comparable across datasets only if the preprocessing and the learned component vectors are consistent. Different scaling, missing-value handling, or data distributions change the covariance matrix, so component directions shift. For a true comparison, fit on a reference set and project new rows using the same settings.
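Fitting on a reference set and projecting new rows can be sketched as below. The functions `fit_pca` and `project` and the new row are hypothetical, written for illustration; they are not the calculator's API. The key point is that the reference mean, standard deviation, and component vectors are stored and reused unchanged.

```python
import numpy as np


def fit_pca(X_ref, k):
    """Hypothetical helper: learn z-score parameters and top-k components."""
    mean = X_ref.mean(axis=0)
    std = X_ref.std(axis=0, ddof=1)
    Z = (X_ref - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:k]
    return {"mean": mean, "std": std, "components": eigvecs[:, order]}


def project(model, X_new):
    """Hypothetical helper: apply the stored preprocessing, then project."""
    Z_new = (np.atleast_2d(X_new) - model["mean"]) / model["std"]
    return Z_new @ model["components"]


# Reference set: the sample table (Height, Weight, Age, Income).
X_ref = np.array([
    [170, 72, 29, 42000],
    [165, 65, 35, 52000],
    [180, 80, 31, 61000],
    [175, 77, 28, 48000],
    [160, 59, 40, 53000],
], dtype=float)

model = fit_pca(X_ref, k=2)
ref_scores = project(model, X_ref)

# A made-up new observation, scored with the reference settings.
new_row = np.array([172.0, 70.0, 33.0, 50000.0])
new_scores = project(model, new_row)

print("reference scores:\n", np.round(ref_scores, 3))
print("new row scores:", np.round(new_scores, 3))
```

Refitting on the new data instead would produce its own mean, scale, and directions, and the resulting scores would no longer share axes with the reference scores.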
Reconstruction RMSE measures the average reconstruction error after projecting onto the kept components and transforming back, computed in the processed (scaled) space. Lower RMSE means the retained components preserve more structure from the original variables.
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.