Calculator
Paste your data (CSV or similar delimited text). PCA needs numeric columns and at least two rows, because the covariance step divides by n−1.
Example data table
You can paste this sample to test the calculator.
| VarA | VarB | VarC |
|---|---|---|
| 2.5 | 2.4 | 1.2 |
| 0.5 | 0.7 | 0.3 |
| 2.2 | 2.9 | 1.4 |
| 1.9 | 2.2 | 1.0 |
| 3.1 | 3.0 | 1.8 |
Formula used
1) Center or standardize
For each variable: z = (x − μ) / σ, where μ and σ are that column's mean and standard deviation. If standardization is off, we use σ = 1 (center only).
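As a rough illustration of this step, the sketch below centers and optionally standardizes the sample table using only the Python standard library. The function name `center_or_standardize` is hypothetical, and it assumes σ is the sample standard deviation (n−1 denominator), which the text above does not specify. A constant column has σ = 0 and would need separate handling.

```python
from statistics import mean, stdev

def center_or_standardize(rows, standardize=True):
    """Return Z: each column centered, and optionally divided by its sample SD."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    # Assumption: sigma is the sample standard deviation (n - 1 denominator).
    # With standardization off, sigma = 1, so columns are only centered.
    sigmas = [stdev(col) if standardize else 1.0 for col in columns]
    return [[(x - mu) / sigma for x, mu, sigma in zip(row, mus, sigmas)]
            for row in rows]

# The sample table from this page (VarA, VarB, VarC).
data = [
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.4],
    [1.9, 2.2, 1.0],
    [3.1, 3.0, 1.8],
]
Z = center_or_standardize(data)                            # standardized
Z_centered = center_or_standardize(data, standardize=False)  # center only
```

After this step every column of `Z` has mean 0, and with standardization on, sample standard deviation 1.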
2) Covariance matrix
With the centered or standardized matrix Z (n×p: n observations, p variables), the covariance matrix is C = (1/(n−1)) · ZᵀZ.
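The formula can be checked by hand on a small case. This sketch, with the hypothetical helper name `covariance_matrix`, centers the VarA and VarB columns of the sample table (center only, σ = 1) and forms C directly from the definition.

```python
def covariance_matrix(Z):
    """C = Z^T Z / (n - 1) for an already centered n x p matrix Z."""
    n, p = len(Z), len(Z[0])
    return [[sum(row[i] * row[j] for row in Z) / (n - 1) for j in range(p)]
            for i in range(p)]

# VarA and VarB from the sample table.
raw = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]]
means = [sum(col) / len(raw) for col in zip(*raw)]
Z = [[x - m for x, m in zip(row, means)] for row in raw]

C = covariance_matrix(Z)
# C is approximately [[0.938, 0.8405], [0.8405, 0.853]]
```

The diagonal holds each variable's variance and the off-diagonal entries hold the covariance between the two variables, so C is symmetric.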
3) Eigen decomposition
Solve C v = λ v. The eigenvectors v are the component loadings, and each eigenvalue λ is the variance explained by its component. Components are ranked by decreasing λ.
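To make the equation C v = λ v concrete, here is a minimal sketch that finds only the largest eigenvalue and its eigenvector by power iteration. This is a simpler stand-in technique chosen for illustration, not the solver the calculator uses. The matrix is the covariance of the centered VarA and VarB columns of the sample table, and the name `dominant_eigenpair` is hypothetical.

```python
import math

def dominant_eigenpair(C, iterations=500):
    """Power iteration: approximate the largest eigenvalue of C and its unit eigenvector."""
    p = len(C)
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit starting vector
    for _ in range(iterations):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # For a unit vector v, the Rayleigh quotient v^T C v estimates lambda.
    Cv = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
    lam = sum(a * b for a, b in zip(v, Cv))
    return lam, v

C = [[0.938, 0.8405], [0.8405, 0.853]]
lam, v = dominant_eigenpair(C)

# C v - lambda v should be close to the zero vector.
residual = [sum(C[i][j] * v[j] for j in range(2)) - lam * v[i] for i in range(2)]
```

Power iteration assumes the starting vector is not orthogonal to the dominant eigenvector and that the largest eigenvalue is well separated from the rest, which holds for this example.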
4) Component scores
Scores are S = Z · Vₖ, where Vₖ (p×k) holds the eigenvectors of the k largest eigenvalues as its columns. This projects each observation onto the top k components.
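The projection is an ordinary matrix product. The sketch below uses hypothetical inputs (two centered observations and a single unit-length component) so the result is easy to verify by hand; `project` is an illustrative name.

```python
def project(Z, V_k):
    """S = Z . V_k, where V_k is p x k with one eigenvector per column."""
    k = len(V_k[0])
    return [[sum(z * V_k[j][c] for j, z in enumerate(row)) for c in range(k)]
            for row in Z]

# Hypothetical inputs: n = 2 observations, p = 2 variables, k = 1 component.
Z = [[1.0, 1.0], [-1.0, -1.0]]
V_k = [[2 ** -0.5], [2 ** -0.5]]  # unit vector along the diagonal

S = project(Z, V_k)  # n x k matrix of scores
```

Here both observations lie exactly on the component axis, so their scores are equal in size and opposite in sign.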
How to use this calculator
- Paste your numeric dataset in the data box.
- Choose delimiter and whether headers exist.
- Pick missing-value handling and standardization.
- Set k, the number of components to keep.
- Press Submit and review summary, loadings, and scores.
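For readers who want to reproduce the input step outside the calculator, this sketch shows one way the delimiter and header options might be interpreted. It is an assumption about reasonable behavior, not the calculator's actual parser, and `parse_table` is a hypothetical name. Blank or non-numeric cells become `None` so a later step can treat them as missing.

```python
import csv
import io

def parse_table(text, delimiter=",", has_header=True):
    """Split pasted text into optional column names and rows of floats."""
    reader = csv.reader(io.StringIO(text.strip()), delimiter=delimiter)
    rows = [r for r in reader if r]  # skip blank lines
    header = [h.strip() for h in rows.pop(0)] if has_header else None

    def to_number(cell):
        try:
            return float(cell)
        except ValueError:
            return None  # blank or non-numeric: treated as missing

    return header, [[to_number(c.strip()) for c in row] for row in rows]

pasted = """VarA,VarB,VarC
2.5,2.4,1.2
0.5,0.7,0.3
2.2,,1.4
"""
header, rows = parse_table(pasted)
```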
FAQs
1) What does “standardize” change?
Standardizing divides each centered column by its standard deviation. This prevents large-scale variables from dominating the covariance matrix and usually produces more interpretable components.
2) How many components should I keep?
Common choices are the smallest k that reaches a target cumulative explained variance, or the k at a clear “elbow” in the eigenvalues. Consider interpretability and your downstream modeling needs too.
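The cumulative-variance rule is simple to automate. In this sketch the function name `smallest_k`, the default target, and the eigenvalues are all hypothetical values picked for illustration; the right target depends on your data and purpose.

```python
def smallest_k(eigenvalues, target=0.90):
    """Smallest k whose top-k eigenvalues reach `target` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, lam in enumerate(ordered, start=1):
        running += lam
        if running / total >= target:
            return k
    return len(ordered)

# Hypothetical eigenvalues, for illustration only.
k = smallest_k([2.0, 0.5, 1.0, 0.5], target=0.75)
# The two largest eigenvalues (2.0 and 1.0) cover 3.0 of 4.0, so k is 2.
```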
3) What are loadings?
Loadings are the eigenvector values for each component. Large absolute loadings indicate variables that most influence the component’s direction.
4) What are scores?
Scores are new coordinates for each observation after projection onto the component axes. They are useful for visualization, clustering, and regression with reduced dimensions.
5) How are missing values handled?
You can replace missing entries with the column mean or drop any row containing missing values. Mean replacement keeps more rows but shrinks that column's variance, because the filled values sit exactly at the mean.
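Both options can be sketched in a few lines. The helper names below are hypothetical, and missing entries are assumed to be represented as `None`.

```python
def fill_with_column_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    means = []
    for col in zip(*rows):
        observed = [x for x in col if x is not None]
        means.append(sum(observed) / len(observed))
    return [[m if x is None else x for x, m in zip(row, means)] for row in rows]

def drop_incomplete_rows(rows):
    """Keep only rows with no missing entries."""
    return [row for row in rows if None not in row]

rows = [[1.0, 10.0], [3.0, None], [5.0, 30.0]]
filled = fill_with_column_mean(rows)   # the gap becomes 20.0
dropped = drop_incomplete_rows(rows)   # the middle row is removed
```

In this small case the second column's sample variance is 100 after mean replacement and 200 after dropping the row, which shows the shrinking effect of filling with the mean.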
6) Why do eigenvalues matter?
Eigenvalues measure how much variance each component explains. Dividing each eigenvalue by the sum of all eigenvalues gives the explained variance ratios, and their running total gives the cumulative percentages.
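As a short worked example of that arithmetic, the sketch below computes both quantities for a set of hypothetical eigenvalues; `explained_variance` is an illustrative name.

```python
from itertools import accumulate

def explained_variance(eigenvalues):
    """Return (ratios, cumulative ratios) for eigenvalues sorted largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    ratios = [lam / total for lam in ordered]
    return ratios, list(accumulate(ratios))

# Hypothetical eigenvalues, for illustration only.
ratios, cumulative = explained_variance([3.0, 0.75, 0.25])
# ratios     -> [0.75, 0.1875, 0.0625]
# cumulative -> [0.75, 0.9375, 1.0]
```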
7) Can I use non-numeric columns?
PCA requires numeric features. Convert categories to numeric encodings thoughtfully, or remove them. Also consider scaling choices before interpreting components.
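One common encoding is one-hot (indicator) columns, sketched below with a hypothetical helper name. Whether it suits your data is a judgment call: indicator columns have a different scale and distribution from measured variables, which affects how components should be read.

```python
def one_hot(values):
    """Map a categorical column to 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, encoded

categories, encoded = one_hot(["red", "blue", "red"])
# categories -> ['blue', 'red']
# encoded    -> [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```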
8) Are the results identical to scientific libraries?
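Results should agree closely but are not guaranteed to match digit for digit. The calculator uses a Jacobi eigen-solver, which is accurate for symmetric covariance matrices at typical calculator sizes. Very large or ill-conditioned datasets may show small numerical differences, and an eigenvector may appear with its sign flipped relative to another tool, because v and −v are both valid eigenvectors.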
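For reference, a textbook cyclic Jacobi method looks roughly like the sketch below. It is a generic version written for illustration, not necessarily the calculator's own code; the function name, tolerance, and sweep limit are assumptions.

```python
import math

def jacobi_eigen(A, tol=1e-12, max_sweeps=50):
    """Cyclic Jacobi rotations for a symmetric matrix A.

    Returns (eigenvalues, V) sorted largest first; column j of V is the
    unit eigenvector for eigenvalues[j].
    """
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = math.sqrt(sum(A[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off < tol:  # off-diagonal part is negligible: A is diagonal
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Smaller-angle rotation that zeroes A[p][q].
                tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.hypot(1.0, tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: A[i][i], reverse=True)
    eigenvalues = [A[i][i] for i in order]
    vectors = [[V[r][i] for i in order] for r in range(n)]
    return eigenvalues, vectors

# Symmetric test matrix whose exact eigenvalues are 2 + sqrt(2), 2, 2 - sqrt(2).
M = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
eigenvalues, vectors = jacobi_eigen(M)
```

Comparing against a matrix with known eigenvalues, as here, is a quick way to judge how closely any solver agrees with a reference result.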