PCA Input
Example Data Table
Three variables across five observations. Paste this into the input box.
| Obs | X1 | X2 | X3 |
|---|---|---|---|
| 1 | 2.5 | 2.4 | 0.1 |
| 2 | 0.5 | 0.7 | -1.2 |
| 3 | 2.2 | 2.9 | 0.3 |
| 4 | 1.9 | 2.2 | 0.0 |
| 5 | 3.1 | 3.0 | 0.8 |
Formulas Used
- Standardize: Zij = (xij − μj) / sj, where centering (subtracting μj) and scaling (dividing by sj) are each optional.
- Covariance: C = (1/(n−1)) · ZᵀZ.
- Eigendecomposition: C·vk = λk·vk; the eigenvectors vk are the loadings.
- Scores: T = Z·Vk, where Vk stacks the top k eigenvectors as columns, gives each observation's coordinates on the components.
- Explained variance ratio: EVRk = λk / tr(C), since tr(C) equals the total variance.
This tool estimates the top components using power iteration with deflation.
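The pipeline above can be sketched in NumPy. This is a hypothetical illustration of power iteration with deflation, not the calculator's actual implementation, using the example table from the top of this page:

```python
import numpy as np

# Example table from above: 5 observations (rows) x 3 variables (columns).
X = np.array([
    [2.5, 2.4,  0.1],
    [0.5, 0.7, -1.2],
    [2.2, 2.9,  0.3],
    [1.9, 2.2,  0.0],
    [3.1, 3.0,  0.8],
])

def pca_power_iteration(X, k, center=True, scale=False, iters=1000, tol=1e-12):
    """Top-k PCA via power iteration with deflation on the covariance matrix."""
    Z = X - X.mean(axis=0) if center else X.astype(float)
    if scale:
        Z = Z / Z.std(axis=0, ddof=1)
    n = len(Z)
    C = Z.T @ Z / (n - 1)                 # covariance matrix
    total_var = np.trace(C)
    rng = np.random.default_rng(0)
    eigvals, eigvecs = [], []
    for _ in range(k):
        v = rng.normal(size=C.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(iters):
            w = C @ v
            norm = np.linalg.norm(w)
            if norm < tol:                # C has been fully deflated
                break
            w /= norm
            if np.linalg.norm(w - v) < tol:
                v = w
                break
            v = w
        lam = v @ C @ v                   # Rayleigh quotient = eigenvalue
        eigvals.append(lam)
        eigvecs.append(v)
        C = C - lam * np.outer(v, v)      # deflation: remove found component
    V = np.column_stack(eigvecs)          # loadings (one column per component)
    T = Z @ V                             # scores
    evr = np.array(eigvals) / total_var   # explained variance ratio
    return np.array(eigvals), V, T, evr
```

For the table above with centering on, the three variables are strongly correlated, so the first component alone explains over 95% of the total variance.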
How to Use This Calculator
- Paste your numeric table into the input area.
- Enable centering to subtract each variable's mean before analysis.
- Enable scaling when variables are measured in different units or ranges.
- Choose the number of components to extract.
- Run PCA and review variance, loadings, and scores.
- Use CSV or PDF exports for reports and pipelines.
FAQs
1) What does PCA do?
PCA transforms correlated variables into orthogonal components. Each successive component captures a decreasing share of the total variance, so the first few components often summarize most of the structure in far fewer dimensions.
2) Should I center my data?
Yes in most cases. Centering removes the mean so covariance reflects variation, not offsets. Without centering, the first component can be dominated by mean shifts.
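To see the effect, consider a small sketch on hypothetical data (the leading direction is computed via SVD here purely for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: large mean offset (100, 100), but almost all of the
# actual variation lies along the first axis.
X = rng.normal(loc=[100.0, 100.0], scale=[1.0, 0.1], size=(200, 2))

def top_direction(Z):
    # Leading right singular vector of Z = leading eigenvector of Z'Z.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[0]

v_raw = top_direction(X)                        # no centering
v_centered = top_direction(X - X.mean(axis=0))  # centered

# Without centering, the top direction points at the mean (~45 degrees);
# with centering, it aligns with the axis of largest spread.
```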
3) When should I scale variables?
Scale when variables use different units or ranges. Scaling to unit variance prevents large-scale variables from dominating components. If variables share a common scale, scaling may be unnecessary.
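A hypothetical illustration of why unit scaling matters when the units differ wildly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
# Two independent variables in very different units:
# height in metres (~0.1 spread) and income in dollars (~10000 spread).
height = rng.normal(1.75, 0.1, n)
income = rng.normal(50_000, 10_000, n)
X = np.column_stack([height, income])

Z = X - X.mean(axis=0)
C = Z.T @ Z / (n - 1)
evr_unscaled = np.linalg.eigvalsh(C)[::-1] / np.trace(C)

Zs = Z / Z.std(axis=0, ddof=1)
Cs = Zs.T @ Zs / (n - 1)
evr_scaled = np.linalg.eigvalsh(Cs)[::-1] / np.trace(Cs)

# Unscaled: income's variance (~1e8) swamps height's (~0.01), so the first
# component explains essentially everything. Scaled: both variables are on
# an equal footing and each component explains roughly half.
```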
4) How many components should I keep?
Common rules include keeping enough components to reach 80–95% cumulative explained variance. You can also inspect a scree plot of the eigenvalues and stop at the elbow, where each additional component adds little variance.
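The cumulative-variance rule can be implemented directly. A sketch with a hypothetical eigenvalue spectrum:

```python
import numpy as np

def components_for_threshold(eigvals, threshold=0.90):
    """Smallest k whose cumulative explained-variance ratio reaches threshold."""
    evr = np.sort(np.asarray(eigvals, dtype=float))[::-1] / np.sum(eigvals)
    cumulative = np.cumsum(evr)
    return int(np.searchsorted(cumulative, threshold) + 1)

# Hypothetical eigenvalues from a 5-variable PCA:
eigvals = [5.0, 2.0, 0.6, 0.3, 0.1]
```

With these values the cumulative ratios are 62.5%, 87.5%, 95%, 98.75%, and 100%, so a 90% threshold keeps three components.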
5) What are loadings and scores?
Loadings are the component directions in variable space, i.e. the eigenvectors. Scores are the projected coordinates for each observation. A high absolute loading indicates that a variable strongly influences that component.
6) Why do my results differ from other tools?
Differences usually come from scaling choices, centering choices, or numerical methods. Some tools use SVD on the data matrix, while this tool uses covariance eigenpairs and iterative estimation.
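The two routes agree up to numerical precision and sign, which a quick check on hypothetical random data makes concrete:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
Z = X - X.mean(axis=0)
n = len(Z)

# Route 1: eigendecomposition of the covariance matrix.
C = Z.T @ Z / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
svd_eigvals = s ** 2 / (n - 1)   # squared singular values / (n-1) = eigenvalues

# The eigenvalues agree; the eigenvectors agree only up to sign, which is
# why different tools can report sign-flipped loadings for the same data.
```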
7) Is this suitable for very large datasets?
It works well for moderate variable counts. For thousands of variables, iterative methods may be slower and memory-heavy. Consider dimensionality reduction with randomized methods in specialized analytics stacks.
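One common randomized approach is the range-finder sketch behind randomized SVD. A minimal sketch in the Halko–Martinsson–Tropp style, not something this calculator implements:

```python
import numpy as np

def randomized_top_components(Z, k, oversample=10, seed=0):
    """Approximate the top-k principal eigenpairs of cov(Z) with a
    randomized range finder."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    # Sketch the column space of Z with a small random projection.
    Omega = rng.normal(size=(p, k + oversample))
    Q, _ = np.linalg.qr(Z @ Omega)       # orthonormal basis, n x (k+oversample)
    # Exact SVD of the much smaller projected matrix.
    _, s, Vt = np.linalg.svd(Q.T @ Z, full_matrices=False)
    return s[:k] ** 2 / (n - 1), Vt[:k]  # approx eigenvalues and loadings
```

The full p × p covariance matrix is never formed, and the SVD runs on a small (k + oversample) × p matrix, which is what makes this approach practical at large variable counts.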