Upload data, pick options, and reveal hidden structure. Compare components, loadings, and explained variance in seconds. Download CSV and PDF exports, then share results with confidence.
Paste numeric columns separated by commas, semicolons, or tabs. Optionally include a label column for row names. For wide datasets, select just the columns you need with a simple index list.
This sample includes a label column and four numeric measurements.
| Label | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| Iris-setosa-1 | 5.1 | 3.5 | 1.4 | 0.2 |
| Iris-setosa-2 | 4.9 | 3.0 | 1.4 | 0.2 |
| Iris-setosa-3 | 4.7 | 3.2 | 1.3 | 0.2 |
| Iris-versicolor-1 | 7.0 | 3.2 | 4.7 | 1.4 |
| Iris-virginica-1 | 6.3 | 3.3 | 6.0 | 2.5 |
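As an illustration of the expected shape of pasted input (a sketch, not the calculator's own parser), pandas can sniff the delimiter and split the label column from the numeric features; the variable names below are hypothetical.

```python
import io
import pandas as pd

# Pasted text: comma, semicolon, or tab can separate the columns.
pasted = """Label;SepalLength;SepalWidth;PetalLength;PetalWidth
Iris-setosa-1;5.1;3.5;1.4;0.2
Iris-setosa-2;4.9;3.0;1.4;0.2
Iris-versicolor-1;7.0;3.2;4.7;1.4"""

# sep=None with the python engine lets pandas detect the delimiter automatically.
df = pd.read_csv(io.StringIO(pasted), sep=None, engine="python")

labels = df.iloc[:, 0]                                           # label column for row names
features = df.iloc[:, 1:].apply(pd.to_numeric, errors="coerce")  # numeric measurements
print(features.describe())
```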
PCA is sensitive to measurement units, missing values, and outliers. Use correlation when variables differ in scale, or enable z-score standardization so each feature has mean 0 and standard deviation 1. For mixed scales, standardization reduces dominance by high-variance variables. Remove constant columns, confirm numeric-only fields, and label rows with stable identifiers. If you must impute, mean imputation preserves row count but can shrink variance. Precheck per-feature standard deviation; near-zero values create unstable directions.
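A minimal precheck sketch in Python (pandas assumed; the function name and the 1e-8 threshold are illustrative, not part of the calculator):

```python
import pandas as pd

def precheck_and_standardize(features: pd.DataFrame, eps: float = 1e-8):
    """Drop near-constant columns, then z-score the rest (mean 0, std 1)."""
    std = features.std(ddof=0)
    keep = std[std > eps].index                      # near-zero std -> unstable directions
    dropped = [c for c in features.columns if c not in keep]
    z = (features[keep] - features[keep].mean()) / std[keep]
    return z, dropped

# Optional mean imputation before the precheck keeps every row but shrinks variance:
# features = features.fillna(features.mean())
```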
The scree curve summarizes diminishing returns across eigenvalues. Many applied projects retain components that explain 70–95% cumulative variance, depending on noise tolerance and interpretability needs. One common heuristic (the Kaiser rule) keeps components with eigenvalues above 1 for standardized inputs, but validate it against your goals. Compare cumulative ratios, then test whether adding a component materially changes separation, cluster tightness, or anomaly visibility. If PC3 barely moves points, focus on PC1–PC2 for reporting.
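The same quantities can be checked numerically; a rough sketch with NumPy (the function name is hypothetical, and the eigenvalue count is only the heuristic mentioned above):

```python
import numpy as np

def scree_summary(z):
    """Eigenvalues of the correlation matrix of standardized data, with cumulative ratios."""
    corr = np.corrcoef(z, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]      # descending order
    ratio = eigvals / eigvals.sum()
    cumulative = np.cumsum(ratio)
    kaiser_keep = int((eigvals > 1.0).sum())      # eigenvalue-above-1 heuristic
    return eigvals, cumulative, kaiser_keep
```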
Loadings are weights that define each component vector. Large absolute loadings indicate variables driving separation, while opposite signs imply tradeoffs between feature groups. In standardized mode, loadings reflect relative influence and are comparable across variables. Look for coherent sets of variables that move together, and confirm the direction using raw means. Unexpected dominant loadings can indicate leakage, duplicated columns, or preprocessing errors that inflate variance.
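One common convention scales eigenvectors by the square root of their eigenvalues to form loadings; here is a sketch with scikit-learn on standardized data, with illustrative names only.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def loadings_table(z: pd.DataFrame, n_components: int = 2) -> pd.DataFrame:
    """Loadings as eigenvectors scaled by sqrt(eigenvalue), indexed by feature name."""
    pca = PCA(n_components=n_components).fit(z)
    load = pca.components_.T * np.sqrt(pca.explained_variance_)
    return pd.DataFrame(load, index=z.columns,
                        columns=[f"PC{i+1}" for i in range(n_components)])
```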
Score scatter plots reveal similarity among samples, potential clusters, and anomalies. A dense core with a few far-flung points suggests outliers, mixed populations, or data-entry issues. The loadings compass adds explanation: samples moving toward an arrow usually have higher values for that feature. Use labels to validate patterns, compare runs with and without standardization, and verify that separation is not caused by a single extreme column.
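A biplot-style sketch with matplotlib shows the same idea: scores as points, loadings as arrows. It assumes a NumPy `scores` array and the loadings table from the previous sketch; the arrow scale is arbitrary.

```python
import matplotlib.pyplot as plt

def score_plot(scores, loadings, labels=None, scale=3.0):
    """PC1/PC2 scatter with loading arrows pointing toward higher feature values."""
    fig, ax = plt.subplots()
    ax.scatter(scores[:, 0], scores[:, 1], s=15, alpha=0.7)
    for name, (x, y) in zip(loadings.index, loadings.iloc[:, :2].to_numpy()):
        ax.arrow(0, 0, scale * x, scale * y, head_width=0.05, color="tab:red")
        ax.annotate(name, (scale * x, scale * y))
    if labels is not None:
        for lab, (x, y) in zip(labels, scores[:, :2]):
            ax.annotate(str(lab), (x, y), fontsize=7)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    return ax
```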
Use score tables as compact inputs for segmentation, monitoring, or downstream models. Export loadings to document feature contributions and support stakeholder explanations. The PDF summary consolidates variance, loadings, and score previews for audits and handoffs. Re-run the analysis after data updates to track drift; changes in eigenvalues or loadings signal distribution shifts. Store the chosen settings alongside results for reproducibility.
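A simple drift check between two runs could compare explained-variance ratios component by component; the function name and the 0.05 tolerance are illustrative only.

```python
import numpy as np

def drift_report(ratio_old, ratio_new, tol=0.05):
    """Flag components whose explained-variance ratio moved by more than tol between runs."""
    ratio_old, ratio_new = np.asarray(ratio_old), np.asarray(ratio_new)
    k = min(len(ratio_old), len(ratio_new))
    delta = ratio_new[:k] - ratio_old[:k]
    return {f"PC{i+1}": float(d) for i, d in enumerate(delta) if abs(d) > tol}
```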
Use numeric columns with consistent units per feature. Include at least two rows and one feature. Remove text fields, constant columns, and extreme outliers when possible. Add a label column for IDs if you want readable plots and exports.
Choose covariance when features share comparable units and variances matter. Choose correlation when units differ or one feature’s scale would dominate. Correlation mode standardizes first, so components reflect relative patterns rather than raw magnitude.
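The effect is easy to reproduce outside the calculator. In this synthetic sketch the second feature has one hundred times the scale of the first, so covariance mode lets it dominate while correlation mode (standardize first) balances the two.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 200),        # unit-scale feature
                     rng.normal(0, 100, 200)])     # large-scale feature

cov_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_
corr_ratio = PCA(n_components=2).fit(StandardScaler().fit_transform(X)).explained_variance_ratio_

print("covariance mode :", cov_ratio)    # dominated by the high-variance column
print("correlation mode:", corr_ratio)   # roughly balanced after standardization
```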
Eigenvectors can be multiplied by −1 without changing the solution. That sign flip negates the corresponding loadings and scores but preserves distances and explained variance. Interpret components by relationships and magnitudes, not by the sign alone.
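A quick NumPy check of that invariance (the example scores are random, purely for illustration):

```python
import numpy as np

scores = np.random.default_rng(1).normal(size=(10, 2))
flipped = scores * np.array([1.0, -1.0])          # multiply the PC2 scores by -1

def pairwise(a):
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

print(np.allclose(pairwise(scores), pairwise(flipped)))      # True: distances unchanged
print(np.allclose(scores.var(axis=0), flipped.var(axis=0)))  # True: variance unchanged
```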
Start with two for visualization. Then use cumulative explained variance to meet your target, such as 80% or 90%. If additional components barely change separation or interpretation, keep the smaller set for clarity.
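A small helper can automate the cumulative-variance target; the function name and default threshold are illustrative.

```python
import numpy as np

def pick_k(explained_variance_ratio, target=0.80):
    """Smallest number of components whose cumulative explained variance reaches the target."""
    cumulative = np.cumsum(explained_variance_ratio)
    return int(np.searchsorted(cumulative, target) + 1)

# pick_k([0.55, 0.25, 0.12, 0.08], target=0.80) -> 2
```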
You can drop rows with missing entries or apply mean imputation per feature. Dropping is safest when you have enough data. Mean imputation keeps row count but may reduce variance and soften separation in the score plot.
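Both options are one-liners in pandas; the sketch below shows the idea, not the calculator's implementation.

```python
import pandas as pd

def handle_missing(features: pd.DataFrame, strategy: str = "drop") -> pd.DataFrame:
    """'drop' removes incomplete rows; 'mean' fills gaps but shrinks per-feature variance."""
    if strategy == "drop":
        return features.dropna()
    return features.fillna(features.mean())

# Comparing features.var() before and after mean imputation shows the variance shrinkage.
```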
Scores CSV includes each row’s PC coordinates. Loadings CSV lists feature weights per component. Combined CSV merges original numeric inputs with scores. The PDF summary includes variance, loadings, and a preview of scores for reporting.
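For reference, the same three tables could be assembled with pandas; the file names and data frames here are hypothetical, mirroring the exports described above.

```python
import pandas as pd

def export_results(scores: pd.DataFrame, loadings: pd.DataFrame, features: pd.DataFrame):
    scores.to_csv("scores.csv")                   # each row's PC coordinates
    loadings.to_csv("loadings.csv")               # feature weights per component
    features.join(scores).to_csv("combined.csv")  # numeric inputs merged with scores, aligned by index
```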
Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.