Calculator
Example Data Table
You can paste similar data into the calculator. Each column is a variable, each row is an observation.
| Height | Weight | Waist | Hip |
|---|---|---|---|
| 170 | 70 | 82 | 98 |
| 165 | 62 | 75 | 92 |
| 180 | 85 | 90 | 105 |
| 175 | 78 | 86 | 101 |
| 160 | 55 | 70 | 88 |
Formula Used
1) Standardization (optional)
Each value is centered, and optionally scaled:
zij = (xij − μj) / sj
μj is the mean of column j and sj is its sample standard deviation. Without scaling, zij = xij − μj.
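As a rough illustration of this step, the sketch below centers and scales the example table with NumPy. The function name `standardize` and its `scale` flag are hypothetical, and the n − 1 denominator is assumed from the phrase "sample standard deviation".

```python
import numpy as np

# Example table from this page: Height, Weight, Waist, Hip
data = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

def standardize(x, scale=True):
    """Center each column; optionally divide by the sample standard deviation."""
    centered = x - x.mean(axis=0)
    if not scale:
        return centered
    # ddof=1 selects the sample (n - 1) standard deviation
    return centered / x.std(axis=0, ddof=1)

z = standardize(data)
print(z.round(3))
```

After scaling, every column has mean zero and sample standard deviation one, so no variable dominates purely because of its units.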
2) Covariance or correlation matrix
With centered data X (n rows), the sample covariance matrix is:
S = XᵀX / (n − 1)
If you choose correlation, variables are standardized first.
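A minimal sketch of both matrix choices, assuming the usual sample estimate with an n − 1 denominator. The variable names are illustrative only; the results can be cross-checked against NumPy's `np.cov` and `np.corrcoef`.

```python
import numpy as np

# Example table from this page: Height, Weight, Waist, Hip
data = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

n = data.shape[0]
xc = data - data.mean(axis=0)            # centered data

# Covariance: cross-products of the centered columns
cov = xc.T @ xc / (n - 1)

# Correlation: the same formula applied to z-scores
z = xc / data.std(axis=0, ddof=1)
corr = z.T @ z / (n - 1)

print(cov.round(2))
print(corr.round(4))
```

The correlation matrix always has ones on its diagonal, which is why it suits variables measured in different units.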
3) Eigen decomposition
PCA solves:
S·vk = λk·vk
λk are eigenvalues, vk are eigenvectors.
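One way to carry out this step is with `np.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order. The sketch below uses the correlation matrix of the example table and reorders the result so the largest eigenvalue comes first; that ordering is the common convention, assumed here rather than confirmed for this calculator.

```python
import numpy as np

# Example table from this page: Height, Weight, Waist, Hip
data = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

corr = np.corrcoef(data, rowvar=False)

# eigh returns ascending eigenvalues; column k of eigvecs pairs with eigvals[k]
eigvals, eigvecs = np.linalg.eigh(corr)

# Reorder so component 1 explains the most variance
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

print(eigvals.round(4))
```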
4) Loadings and factor scores
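Loadings for component k are:
ℓk = vk·√λk
Component scores are:
T = X·V
Regression scores are:
T = X·S⁻¹·L
Here V holds the retained eigenvectors as columns and L holds the corresponding loadings.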
Rotation (Varimax) multiplies loadings and scores by an orthogonal matrix.
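The sketch below computes loadings and component scores for the example table. It assumes the common convention that a loading is the eigenvector scaled by the square root of its eigenvalue, and that X is the standardized data; `k = 2` is an arbitrary choice for illustration.

```python
import numpy as np

# Example table from this page: Height, Weight, Waist, Hip
data = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

n = data.shape[0]
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
S = z.T @ z / (n - 1)                     # correlation matrix

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                     # components to keep (illustrative)
V = eigvecs[:, :k]
loadings = V * np.sqrt(eigvals[:k])       # assumed convention: v_k * sqrt(lambda_k)
scores = z @ V                            # component scores T = X V

print(loadings.round(3))
print(scores.round(3))
```

Under these assumptions, each column of `scores` has sample variance equal to its eigenvalue, and the columns are uncorrelated with one another.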
How to Use This Calculator
- Paste your dataset as CSV, or upload a file.
- Choose whether the first row contains variable names.
- Pick how to handle missing or non-numeric entries.
- Select correlation for mixed units, covariance for same units.
- Set the number of components you want to keep.
- Choose a scoring method, then enable rotation if needed.
- Press Submit, then download CSV or PDF outputs.
When to Use PCA Scores
PCA factor scores summarize many correlated variables into a few uncorrelated dimensions. Use them when your dataset has multicollinearity, you need compact predictors, or you want clearer clusters. Each score is a weighted combination of the original variables, so you can rank observations, compare groups, and feed the scores into regression, classification, or segmentation workflows. For example, replacing ten variables with two scores can reduce model noise and speed up cross-validation.
Preparing Data for Stable Components
Reliable components start with consistent measurement and clean inputs. This calculator accepts a numeric table where rows are observations and columns are variables. If units differ, select the correlation option so variables are standardized to z-scores. If units match, covariance preserves real scale. Handle missing entries by dropping rows for strict integrity, or imputing column means for continuity. Aim for at least five to ten observations per variable, and check for extreme outliers that can dominate covariance.
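The two missing-data strategies can be sketched as follows. The gaps in this copy of the example table are hypothetical, inserted only to show the difference between dropping rows and imputing column means.

```python
import numpy as np

# Example table with two hypothetical missing cells (NaN)
raw = np.array([
    [170, 70, 82, 98],
    [165, np.nan, 75, 92],
    [180, 85, 90, 105],
    [175, 78, np.nan, 101],
    [160, 55, 70, 88],
])

# Option 1: drop every row that contains a missing value
complete = raw[~np.isnan(raw).any(axis=1)]

# Option 2: replace each missing value with the mean of its column
col_means = np.nanmean(raw, axis=0)
imputed = np.where(np.isnan(raw), col_means, raw)

print(complete.shape, imputed.shape)
```

Dropping rows keeps only fully observed data but shrinks the sample from five rows to three here; mean imputation keeps every row at the cost of slightly understating variance in the imputed columns.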
Interpreting Eigenvalues and Loadings
Eigenvalues quantify how much variance each component explains, and their sum equals the total variance (the trace) of the selected matrix. The explained-variance table reports percent and cumulative percent, helping you decide how many components to keep. Loadings show how strongly each variable contributes to each component. Larger absolute loadings indicate stronger influence, while the sign shows the direction of association within the component. A common rule keeps components with eigenvalues above one, then checks for a break in the scree plot. The above-one cutoff is meaningful for the correlation matrix, where each standardized variable contributes a variance of exactly one.
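A small sketch of how an explained-variance table and the two retention rules could be computed from the correlation matrix of the example table. The 90% target and the variable names are illustrative assumptions, not settings taken from the calculator.

```python
import numpy as np

# Example table from this page: Height, Weight, Waist, Hip
data = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

corr = np.corrcoef(data, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]      # largest first

percent = 100 * eigvals / eigvals.sum()       # share of total variance
cumulative = np.cumsum(percent)

# Rule 1: keep components with eigenvalue above one
kaiser_k = int((eigvals > 1.0).sum())

# Rule 2: smallest number of components reaching a cumulative target
target = 90.0
threshold_k = int(np.searchsorted(cumulative, target)) + 1

for i, (ev, p, c) in enumerate(zip(eigvals, percent, cumulative), start=1):
    print(f"PC{i}: eigenvalue={ev:.4f}  percent={p:.2f}  cumulative={c:.2f}")
print("eigenvalue > 1 rule keeps:", kaiser_k)
print("cumulative target keeps:", threshold_k)
```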
Choosing a Scoring Method
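Component scores use the simple projection T = X·V, where X is the centered (or standardized) data and the columns of V are the retained eigenvectors. Regression scores use T = X·S⁻¹·L, where S is the covariance or correlation matrix and L holds the loadings; this can better approximate common factor scores but requires inverting S. If variables are highly redundant, S is close to singular and the inversion may be unstable. The ridge option adds a small value to the diagonal of S before inversion, improving numerical stability without materially changing well-conditioned results.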
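The sketch below compares the two scoring methods on the example table. It assumes loadings of the form eigenvector times the square root of the eigenvalue, and it applies the ridge as `S + ridge·I`; the function name `regression_scores` and the value `1e-6` are hypothetical. `np.linalg.solve` is used instead of forming the inverse explicitly, which is generally the more stable choice.

```python
import numpy as np

# Example table from this page: Height, Weight, Waist, Hip
data = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

n = data.shape[0]
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
S = z.T @ z / (n - 1)

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
V = eigvecs[:, :k]
L = V * np.sqrt(eigvals[:k])              # assumed loading convention

component_scores = z @ V                  # T = X V

def regression_scores(x, s, loadings, ridge=0.0):
    """T = X (S + ridge*I)^-1 L, solved without an explicit inverse."""
    s_reg = s + ridge * np.eye(s.shape[0])
    return x @ np.linalg.solve(s_reg, loadings)

reg_scores = regression_scores(z, S, L, ridge=1e-6)
print(np.linalg.cond(S))                  # large values signal near-redundant variables
print(reg_scores.round(3))
```

With these conventions and no ridge, the regression scores are the component scores rescaled to unit variance, so the two methods rank observations identically on each unrotated component.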
Rotation and Reporting Outputs
Rotation improves interpretability by redistributing variance across retained components. With Varimax rotation, the solution stays orthogonal, but loadings tend to become more “simple,” with variables loading strongly on fewer components. Rotated scores are produced by the same rotation matrix applied to the unrotated scores. Export CSV for full row-level results, and export PDF for a concise, shareable summary.
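For illustration, here is a compact version of the widely used SVD-based Varimax iteration, applied to two components of the example table. It is a sketch without Kaiser row normalization, and the function name, tolerance, and iteration cap are assumptions; the calculator's own implementation may differ.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated loadings, orthogonal rotation matrix R)."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = loadings @ R
        # Gradient-like target of the raw varimax criterion
        target = Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        R = u @ vt                         # nearest orthogonal matrix
        d_new = s.sum()
        if d_new < d * (1 + tol):          # stop when the criterion stalls
            break
        d = d_new
    return loadings @ R, R

# Example table from this page: Height, Weight, Waist, Hip
data = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(data, rowvar=False))
top = np.argsort(eigvals)[::-1][:2]
V = eigvecs[:, top]
L = V * np.sqrt(eigvals[top])              # assumed loading convention
scores = z @ V

rotated_L, R = varimax(L)
rotated_scores = scores @ R                # same rotation applied to the scores
print(rotated_L.round(3))
```

Because R is orthogonal, each variable's communality (its row sum of squared loadings) and the total variance of the retained components are unchanged; only how that variance is split between components shifts.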
FAQs
What data format should I paste?
Paste a comma-separated table where each column is a variable and each row is an observation. Use a header row for names, or disable headers to auto-name variables. Non-numeric cells are treated as missing.
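A parser along these lines would match the behavior described above. This is only a sketch: the function name `parse_table` and the auto-generated names `V1`, `V2`, and so on are hypothetical, and the calculator's actual naming scheme is not documented here.

```python
import csv
import io
import math

def parse_table(text, has_header=True):
    """Parse pasted CSV text into (variable names, rows of floats)."""
    rows = [r for r in csv.reader(io.StringIO(text.strip())) if r]
    if has_header:
        names, rows = [c.strip() for c in rows[0]], rows[1:]
    else:
        # Hypothetical auto-naming scheme
        names = [f"V{i + 1}" for i in range(len(rows[0]))]

    def to_number(cell):
        try:
            return float(cell)
        except ValueError:
            return math.nan            # non-numeric cells become missing

    values = [[to_number(c) for c in row] for row in rows]
    return names, values

names, values = parse_table("Height,Weight\n170,70\n165,n/a\n")
print(names, values)
```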
Should I choose correlation or covariance?
Choose correlation when variables use different units or scales, because standardization equalizes influence. Choose covariance when variables share the same unit and scale is meaningful. The calculator can standardize automatically for correlation.
How many components should I keep?
Use the explained-variance table and keep components until cumulative variance is adequate for your goal, often 80% to 95%. You can also apply an eigenvalue-above-one rule and confirm with the scree pattern.
Why do regression scores need a ridge value?
Regression scoring inverts the covariance or correlation matrix. If variables are nearly redundant, the matrix can be ill-conditioned. A small ridge adds stability by increasing diagonal values slightly, reducing numerical errors during inversion.
What does Varimax rotation change?
Varimax rotates the retained components without changing total explained variance within that subspace. Loadings often become easier to interpret, with clearer variable-to-component relationships. Scores are rotated by the same orthogonal matrix.
Do downloads include my original data?
Yes. The CSV export appends the computed scores to each original row. The PDF provides a compact report with variance and a preview of scores. If rotation is enabled, rotated scores are exported.