PCA Factor Scores Calculator

Calculator

Data source

Upload a CSV, or paste data in the box.

Header row

Missing values

Matrix type

Correlation is typical for mixed scales.

Standardize variables (z-score)

If you choose correlation, standardization is enforced.

Number of components

Must be ≤ number of variables.

Score method

Regression may be sensitive to near-singular matrices.

Varimax rotation

Ridge for inversion

Adds a small value to the diagonal when inverting.

Paste CSV data


Example Data Table

You can paste similar data into the calculator. Each column is a variable, each row is an observation.

Height	Weight	Waist	Hip
170	70	82	98
165	62	75	92
180	85	90	105
175	78	86	101
160	55	70	88

Formula Used

1) Standardization (optional)

Each value is centered on its column mean and, if standardization is enabled, divided by the column's standard deviation:

z_ij = (x_ij − μ_j) / s_j

μ_j is the column mean, s_j is the sample standard deviation.
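The standardization step can be sketched in a few lines of NumPy. This is a minimal illustration, not the calculator's own code; the function name `standardize` and its `scale` flag are hypothetical, and the data is the example table from this page.

```python
import numpy as np

def standardize(X, scale=True):
    """Center each column; optionally divide by the sample standard deviation."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    if not scale:
        return centered
    s = X.std(axis=0, ddof=1)  # ddof=1 gives the sample (n - 1) standard deviation
    return centered / s

# Example table from this page: Height, Weight, Waist, Hip
X = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

Z = standardize(X)
```

After this step every column of `Z` has mean 0 and sample standard deviation 1. A constant column has s_j = 0 and would cause a division by zero, so such columns need to be removed beforehand.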

2) Covariance or correlation matrix

With centered data X, the sample covariance matrix is:

S = (1 / (n − 1)) · XᵀX

If you choose correlation, variables are standardized first.
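As a rough sketch, both matrices follow directly from the formula above. The helper names `covariance_matrix` and `correlation_matrix` are assumptions made for illustration; NumPy's built-in `np.cov` and `np.corrcoef` give the same results.

```python
import numpy as np

def covariance_matrix(X):
    """Sample covariance: S = XcᵀXc / (n - 1), with Xc the column-centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)

def correlation_matrix(X):
    """Correlation matrix: the covariance matrix of the z-scored data."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return Z.T @ Z / (X.shape[0] - 1)

# Example table from this page: Height, Weight, Waist, Hip
X = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

S = covariance_matrix(X)
R = correlation_matrix(X)
```

The diagonal of `S` holds the column variances (62.5 for Height in this table), while the diagonal of `R` is all ones.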

3) Eigen decomposition

PCA solves:

S v_k = λ_k v_k

λ_k are eigenvalues, v_k are eigenvectors.
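One way to sketch this step is with `np.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order, so they are re-sorted here. The helper name `eigen_sorted` is hypothetical.

```python
import numpy as np

def eigen_sorted(S):
    """Eigen decomposition of a symmetric matrix, largest eigenvalue first."""
    eigvals, eigvecs = np.linalg.eigh(S)   # ascending order
    order = np.argsort(eigvals)[::-1]      # indices for descending order
    return eigvals[order], eigvecs[:, order]

# Example table from this page: Height, Weight, Waist, Hip
X = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

S = np.cov(X, rowvar=False)
lam, V = eigen_sorted(S)   # column k of V is the eigenvector v_k
```

The sign of each eigenvector is arbitrary, so two tools can report loadings and scores that differ only by a sign flip per component.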

4) Loadings and factor scores

The loading matrix, with one column per component, is:

L = V · diag(√λ)

Component scores are:

T = X · V

Regression scores are:

T = X · S⁻¹ · L

Rotation (Varimax) multiplies loadings and scores by an orthogonal matrix.
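The loading and component-score formulas can be sketched as follows, assuming the covariance option and k retained components. The function name `pca_loadings_scores` is an assumption made for this example.

```python
import numpy as np

def pca_loadings_scores(X, k):
    """Return eigenvalues, loadings L = V·diag(√λ), and component scores T = Xc·V."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centered data
    S = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance
    lam, V = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1][:k]        # keep the k largest eigenvalues
    lam, V = lam[order], V[:, order]
    L = V * np.sqrt(lam)                     # same as V @ diag(sqrt(lam))
    T = Xc @ V                               # component scores
    return lam, L, T

# Example table from this page: Height, Weight, Waist, Hip
X = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

lam, L, T = pca_loadings_scores(X, k=2)
```

Each column of `T` has sample variance equal to its eigenvalue, and different columns are uncorrelated.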

How to Use This Calculator

Paste your dataset as CSV, or upload a file.
Choose whether the first row contains variable names.
Pick how to handle missing or non-numeric entries.
Select correlation for mixed units, covariance for same units.
Set the number of components you want to keep.
Choose a scoring method, then enable rotation if needed.
Press Submit, then download CSV or PDF outputs.

When to Use PCA Scores

PCA factor scores summarize many correlated variables into a few uncorrelated dimensions. Use them when your dataset has multicollinearity, you need compact predictors, or you want clearer clusters. Each score is a weighted linear combination of the original variables, so you can rank observations, compare groups, and feed the scores into regression, classification, or segmentation workflows. For example, replacing ten variables with two scores can reduce model noise and speed up cross-validation.

Preparing Data for Stable Components

Reliable components start with consistent measurement and clean inputs. This calculator accepts a numeric table where rows are observations and columns are variables. If units differ, select the correlation option so variables are standardized to z-scores. If units match, covariance preserves the original scale. Handle missing entries either by dropping incomplete rows, which keeps only observed values, or by imputing column means, which keeps every row. As a rule of thumb, aim for at least five to ten observations per variable, and check for extreme outliers that can dominate the covariance matrix.
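The two missing-value strategies can be sketched like this, assuming missing cells are represented as NaN. The function names are hypothetical, and the missing Weight value in the second row was blanked out here purely for illustration.

```python
import numpy as np

def drop_incomplete_rows(X):
    """Keep only rows with no missing (NaN) cells."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]

def impute_column_means(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)        # means that ignore NaN
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

# Example table with one Weight value removed for illustration
X = np.array([
    [170, 70, 82, 98],
    [165, np.nan, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
])

dropped = drop_incomplete_rows(X)   # 4 rows remain
imputed = impute_column_means(X)    # missing Weight becomes 72.0
```

Mean imputation keeps the sample size but shrinks the variance of the imputed column, so it is best reserved for tables with few gaps.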

Interpreting Eigenvalues and Loadings

Eigenvalues quantify how much variance each component explains, and their sum equals the total variance of the selected matrix (for a correlation matrix, the number of variables). The explained-variance table reports percent and cumulative percent, helping you decide how many components to keep. Loadings show how strongly each variable contributes to each component. Larger absolute loadings indicate stronger influence, while the sign shows the direction of association within the component. A common rule, meaningful for correlation matrices where each variable contributes one unit of variance, keeps components with eigenvalues above one and then checks for a break in the scree plot.
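An explained-variance table of this kind can be sketched as below. The layout (component number, eigenvalue, percent, cumulative percent) and the function name are assumptions; the eigenvalues come from the correlation matrix of the example table on this page.

```python
import numpy as np

def explained_variance_table(eigvals):
    """Rows of (component, eigenvalue, percent, cumulative percent), largest first."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    pct = 100.0 * lam / lam.sum()
    cum = np.cumsum(pct)
    return [(k + 1, float(lam[k]), float(pct[k]), float(cum[k]))
            for k in range(len(lam))]

# Example table from this page: Height, Weight, Waist, Hip
X = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

R = np.corrcoef(X, rowvar=False)
table = explained_variance_table(np.linalg.eigvalsh(R))

# Eigenvalue-above-one rule, applied to correlation-matrix eigenvalues
kaiser_k = sum(1 for _, lam, _, _ in table if lam > 1.0)

for comp, lam, pct, cum in table:
    print(f"PC{comp}: eigenvalue={lam:.4f}  percent={pct:.2f}  cumulative={cum:.2f}")
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the last cumulative value is always 100.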

Choosing a Scoring Method

Component scores use the simple projection T = X·V, where V contains the retained eigenvectors. Regression scores use T = X·S⁻¹·L, which requires inverting S. When S is computed from the same X and no ridge is added, S⁻¹·L equals V·diag(1/√λ), so the regression scores are the component scores rescaled to unit variance. If variables are highly redundant, S is close to singular and the inversion may be unstable. The ridge option adds a small value to the diagonal, improving numerical stability without materially changing well-conditioned results.
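A minimal sketch of regression scoring, assuming the covariance option and two retained components. The function name `regression_scores` and its `ridge` parameter are hypothetical, and a linear solve stands in for an explicit matrix inverse, which is a substitution made here for numerical reasons rather than something the page specifies.

```python
import numpy as np

def regression_scores(Xc, S, L, ridge=0.0):
    """T = Xc · (S + ridge·I)⁻¹ · L, computed with a linear solve."""
    p = S.shape[0]
    W = np.linalg.solve(S + ridge * np.eye(p), L)   # score weights, p × k
    return Xc @ W

# Example table from this page: Height, Weight, Waist, Hip
X = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (X.shape[0] - 1)
lam, V = np.linalg.eigh(S)
order = np.argsort(lam)[::-1][:2]        # two largest components
lam, V = lam[order], V[:, order]
L = V * np.sqrt(lam)                     # loadings

T_component = Xc @ V
T_regression = regression_scores(Xc, S, L)
T_ridged = regression_scores(Xc, S, L, ridge=1e-6)
```

Without a ridge, `T_regression` matches `T_component / np.sqrt(lam)`, so each column has unit variance. A small ridge changes the result only slightly when S is well conditioned.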

Rotation and Reporting Outputs

Rotation improves interpretability by redistributing variance across retained components. With Varimax rotation, the solution stays orthogonal, but loadings tend to become more “simple,” with variables loading strongly on fewer components. Rotated scores are produced by the same rotation matrix applied to the unrotated scores. Export CSV for full row-level results, and export PDF for a concise, shareable summary.
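Varimax is often implemented with an iterative SVD update, sketched below. This is the raw variant without Kaiser normalization and is an assumption about one reasonable implementation, not a description of how this calculator computes it. The function name and the `max_iter` and `tol` parameters are hypothetical.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Raw varimax rotation. Returns rotated loadings and the orthogonal matrix R."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient-like target matrix for the varimax criterion
        G = L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p)
        u, s, vh = np.linalg.svd(G)
        R = u @ vh                       # nearest orthogonal matrix to G
        d_new = s.sum()
        if d != 0 and d_new / d < 1 + tol:
            break                        # criterion has stopped improving
        d = d_new
    return L @ R, R

# Example table from this page, correlation option, two components
X = np.array([
    [170, 70, 82, 98],
    [165, 62, 75, 92],
    [180, 85, 90, 105],
    [175, 78, 86, 101],
    [160, 55, 70, 88],
], dtype=float)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
lam, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(lam)[::-1][:2]
lam, V = lam[order], V[:, order]

L = V * np.sqrt(lam)            # unrotated loadings
T = Z @ V / np.sqrt(lam)        # unrotated unit-variance scores

L_rot, R = varimax(L)
T_rot = T @ R                   # scores rotated by the same matrix
```

Because R is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged, and the rotated scores remain uncorrelated with unit variance.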

FAQs

What data format should I paste?

Paste a comma-separated table where each column is a variable and each row is an observation. Use a header row for names, or disable headers to auto-name variables. Non-numeric cells are treated as missing.
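The parsing behaviour described here can be approximated with Python's standard `csv` module. This is a sketch under stated assumptions: the function name `parse_csv`, the auto-generated names `V1`, `V2`, ... and the use of NaN to mark missing cells are all hypothetical choices, not the calculator's documented behaviour.

```python
import csv
import io
import math

def parse_csv(text, header=True):
    """Parse pasted CSV text into (names, rows); non-numeric cells become NaN."""
    rows = [r for r in csv.reader(io.StringIO(text.strip())) if r]
    if header:
        names, rows = [c.strip() for c in rows[0]], rows[1:]
    else:
        names = [f"V{j + 1}" for j in range(len(rows[0]))]  # assumed naming scheme

    def to_number(cell):
        try:
            return float(cell)
        except ValueError:
            return math.nan  # treat anything non-numeric as missing

    return names, [[to_number(c) for c in r] for r in rows]

text = """Height,Weight,Waist,Hip
170,70,82,98
165,n/a,75,92
180,85,90,105
"""

names, data = parse_csv(text)
```

Here the `n/a` cell in the second data row is read as missing, and the remaining cells are converted to floats.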

Should I choose correlation or covariance?

Choose correlation when variables use different units or scales, because standardization equalizes their influence. Choose covariance when variables share the same unit and the scale is meaningful. When correlation is selected, the calculator standardizes the variables automatically.

How many components should I keep?

Use the explained-variance table and keep components until cumulative variance is adequate for your goal, often 80% to 95%. With a correlation matrix, you can also apply the eigenvalue-above-one rule and confirm the choice against the scree pattern.

Why do regression scores need a ridge value?

Regression scoring inverts the covariance or correlation matrix. If variables are nearly redundant, the matrix can be ill-conditioned. A small ridge adds stability by increasing diagonal values slightly, reducing numerical errors during inversion.
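The effect of a ridge on conditioning can be illustrated with a made-up 2 × 2 correlation matrix for two nearly redundant variables. The matrix and the ridge value of 0.001 are hypothetical and chosen only to make the effect visible.

```python
import numpy as np

# Hypothetical correlation matrix: two variables correlated at 0.999999
S = np.array([[1.0, 0.999999],
              [0.999999, 1.0]])

ridge = 1e-3
S_ridged = S + ridge * np.eye(2)          # add the ridge to the diagonal

cond_plain = np.linalg.cond(S)            # roughly 2 million
cond_ridged = np.linalg.cond(S_ridged)    # roughly 2 thousand

# The ridge shifts every eigenvalue up by exactly the ridge value
eig_plain = np.linalg.eigvalsh(S)
eig_ridged = np.linalg.eigvalsh(S_ridged)

print(f"condition number without ridge: {cond_plain:.3g}")
print(f"condition number with ridge:    {cond_ridged:.3g}")
```

The smallest eigenvalue moves from about 0.000001 to about 0.001001, which is what lowers the condition number. A larger ridge gives more stability but moves the scores further from the unregularized result.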

What does Varimax rotation change?

Varimax rotates the retained components without changing total explained variance within that subspace. Loadings often become easier to interpret, with clearer variable-to-component relationships. Scores are rotated by the same orthogonal matrix.

Do downloads include my original data?

Yes. The CSV export appends the computed scores to each original row. The PDF provides a compact report with variance and a preview of scores. If rotation is enabled, rotated scores are exported.