Calculator Inputs
Example data table
| FeatureA | FeatureB | FeatureC | FeatureD |
|---|---|---|---|
| 12 | 7 | 3 | 11 |
| 9 | 6 | 2 | 10 |
| 11 | 8 | 4 | 12 |
| 10 | 7 | 3 | 9 |
| 13 | 9 | 5 | 13 |
| 8 | 5 | 2 | 8 |
How to use this calculator
- Select Raw data or Matrix input.
- For raw data, choose whether to standardize.
- Set K to the number of PCs to summarize.
- Pick an importance method, then press Submit.
- Review ranking, chart, and explained variance table above.
- Use Download CSV/PDF for sharing and records.
Formula used
- Build covariance matrix S (or correlation matrix when standardized).
- Compute eigenpairs S v_k = λ_k v_k with λ_1 ≥ λ_2 ≥ ….
- Explained variance ratio: EVR_k = λ_k / Σ_i λ_i.
- Loadings (covariance PCA): L_{j,k} = v_{j,k} √λ_k.
- Weighted squared loadings: I_j = Σ_{k=1..K} (L_{j,k}² · EVR_k).
- Weighted absolute loadings: I_j = Σ_{k=1..K} (|L_{j,k}| · EVR_k).
- Communality: I_j = Σ_{k=1..K} L_{j,k}².
- Normalized percent: P_j = 100 · I_j / Σ_i I_i.
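The formulas above can be sketched in NumPy as follows. This is a minimal illustration (not the calculator's actual implementation), applied to the example data table:

```python
import numpy as np

# Example data from the table above (rows = observations, columns = features).
X = np.array([
    [12, 7, 3, 11],
    [9, 6, 2, 10],
    [11, 8, 4, 12],
    [10, 7, 3, 9],
    [13, 9, 5, 13],
    [8, 5, 2, 8],
], dtype=float)

S = np.cov(X, rowvar=False)               # covariance matrix S
eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]         # reorder so lambda_1 >= lambda_2 >= ...
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

evr = eigvals / eigvals.sum()             # EVR_k = lambda_k / sum(lambda)
loadings = eigvecs * np.sqrt(eigvals)     # L_{j,k} = v_{j,k} * sqrt(lambda_k)

K = 2                                     # number of PCs to summarize
importance = (loadings[:, :K] ** 2 * evr[:K]).sum(axis=1)  # weighted squared loadings
percent = 100 * importance / importance.sum()              # normalized percent
```

The weighted-absolute and communality variants follow the same pattern, swapping the squared term for `np.abs(loadings[:, :K])` or dropping the `evr` weight.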
Why variance-based ranking matters
PCA feature importance summarizes how strongly each variable contributes to the variance captured by leading components. In practice, teams often reduce 30–200 columns to a shortlist. If the first two components explain 65% of total variance, features dominating those components usually drive the most visible structure in the data. Often, 5–10 features account for about half of normalized importance. Use the ranking as a screening metric, then confirm with domain logic and downstream models.
Choosing standardization for mixed units
When variables use different scales (for example, revenue in millions and defects in units), unscaled covariance can let large-magnitude features dominate. Z-score standardization converts each feature to mean 0 and standard deviation 1, turning the analysis into a correlation-based PCA. For operational dashboards, this frequently changes the top-three features by 20–40 percentage points of normalized importance. If your pipeline already scales inputs, keep standardization off to avoid double-scaling.
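A minimal sketch of the standardization step (variable names and data are illustrative):

```python
import numpy as np

def standardize(X):
    """Z-score each column to mean 0, std 1 (sample std, ddof=1)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Mixed scales: revenue in thousands vs. defect counts.
X = np.array([[1200.0, 3.0], [900.0, 5.0], [1500.0, 2.0], [1100.0, 4.0]])
Z = standardize(X)
# The covariance matrix of Z equals the correlation matrix of X,
# so running PCA on Z is correlation-based PCA.
```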
Selecting K with explained variance targets
K controls how much structure you summarize. Many analytics teams start with K where cumulative explained variance reaches 70%–90%. If eigenvalues drop sharply after PC3, setting K=3 may preserve signal while avoiding noise. This calculator reports eigenvalues, explained variance, and cumulative totals so you can justify K in a review memo with clear numbers.
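A threshold-based choice of K can be sketched as follows (the eigenvalues here are hypothetical, chosen to show a sharp drop after PC3):

```python
import numpy as np

def choose_k(eigvals, target=0.90):
    """Smallest K whose cumulative explained variance meets the target."""
    evr = np.sort(eigvals)[::-1] / np.sum(eigvals)
    cum = np.cumsum(evr)
    return int(np.searchsorted(cum, target) + 1)

# Hypothetical eigenvalue spectrum with a sharp drop after PC3.
eigvals = np.array([5.0, 2.5, 1.5, 0.3, 0.2])
k = choose_k(eigvals, target=0.90)
# cumulative EVR: ~0.53, ~0.79, ~0.95, ... -> k == 3 at a 90% target
```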
Interpreting loadings and stability checks
Loadings indicate how features combine to form components; squared loadings relate to variance contribution. The weighted squared loading method multiplies each squared loading by the component’s variance share, which stabilizes rankings when higher PCs explain little variance. Component signs can flip across runs, but squared terms keep importance consistent. For reliability, rerun PCA after removing obvious outliers and compare the top ten features; a Spearman rank correlation above 0.8 suggests stable ordering.
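The rank-correlation check can be sketched with a small helper (importance scores below are hypothetical, and the implementation assumes no tied scores):

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation via Pearson correlation of ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

# Hypothetical importance scores before and after removing outliers.
before = np.array([0.40, 0.25, 0.15, 0.12, 0.08])
after = np.array([0.38, 0.27, 0.13, 0.14, 0.08])
rho = spearman(before, after)  # rho = 0.9: only one adjacent pair swaps rank
# rho > 0.8 suggests the ordering is stable
```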
Export-ready outputs for governance
Analytics decisions often require traceability. The CSV export preserves feature names, raw scores, normalized percentages, and the top contributing component, enabling quick replication in spreadsheets. The PDF export produces a compact record suitable for approvals. Pair the exported ranking with a short note on the input mode, standardization choice, and the K threshold used; together these support audit-ready documentation across the company.
FAQs
**What does PCA feature importance measure?**
It measures how much each variable contributes to the variance captured by the first K components, based on component loadings and explained variance. It is a variance-based relevance score, not a causal effect size.
**When should I standardize?**
Standardize when features have different units or scales; this makes PCA depend on correlations rather than raw covariances. If your data pipeline already applies z-scores or similar scaling, leave standardization off here.
**How should I choose K?**
Pick K using the cumulative explained variance table. Common targets are 70%–90%. Use the elbow where eigenvalues drop quickly; a smaller K keeps the ranking focused on dominant structure.
**What does "Top PC" mean?**
Top PC indicates the component that contributes most to a feature's importance within the selected K. It helps you see whether a feature mainly loads on early, high-variance components or on later ones.
**Can importance scores be negative?**
No. The default method uses squared loadings and variance weights, which are non-negative. Even if loading signs flip between runs, the squared metrics keep importance comparable.
**How do I check that the ranking is stable?**
Re-run the analysis after removing outliers or using a different time window, then compare top features and compute a rank correlation. Also confirm with domain expectations and performance in downstream models.