Calculator
Tip: For best denoising, start with 90–98% variance and compare RMSE.
Example Data Table
This sample has three variables (x1, x2, x3) and five observations.
| x1 | x2 | x3 |
|---|---|---|
| 2.5 | 2.4 | 1.2 |
| 0.5 | 0.7 | 0.3 |
| 2.2 | 2.9 | 1.1 |
| 1.9 | 2.2 | 0.9 |
| 3.1 | 3.0 | 1.4 |
Formula Used
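- Center/scale: Z = (X − μ) / σ, applied column by column, where X is the m × n data matrix (m observations, n variables), μ holds the column means, and σ the column standard deviations (or σ = 1 if not standardizing).
- Covariance: C = (1/(m−1)) · ZᵀZ, an n × n matrix.
- PCA: eigenpairs C vᵢ = λᵢ vᵢ, sorted so that λ₁ ≥ λ₂ ≥ … ≥ λₙ.
- Projection: scores T = Z Vₖ, where Vₖ = [v₁ … vₖ] holds the top k eigenvectors as columns.
- Reconstruction: Ẑ = T Vₖᵀ, then X̂ = Ẑ ⊙ σ + μ, with σ and μ applied column by column.
- Explained variance: (λ₁ + … + λₖ) / (λ₁ + … + λₙ).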
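The centering, scaling, and covariance steps above can be sketched in plain Python as follows. This is a minimal illustration, not the page's own code: `standardize` and `covariance` are hypothetical names, σ is assumed to be the sample standard deviation (dividing by m − 1), which the formulas do not pin down, and the handling of constant columns is also our own assumption.

```python
from statistics import mean, stdev


def standardize(X, scale=True):
    """Return Z = (X - mu) / sigma column by column, plus mu and sigma."""
    n = len(X[0])
    cols = [[row[j] for row in X] for j in range(n)]
    mu = [mean(c) for c in cols]
    # Assumption: sigma is the sample std (m - 1); sigma = 1 when not standardizing
    sigma = [stdev(c) if scale else 1.0 for c in cols]
    # Assumption: a constant column (zero spread) is left unscaled
    sigma = [s if s > 0 else 1.0 for s in sigma]
    Z = [[(row[j] - mu[j]) / sigma[j] for j in range(n)] for row in X]
    return Z, mu, sigma


def covariance(Z):
    """C = (1 / (m - 1)) * Z^T Z for an m x n matrix Z."""
    m, n = len(Z), len(Z[0])
    return [
        [sum(Z[r][i] * Z[r][j] for r in range(m)) / (m - 1) for j in range(n)]
        for i in range(n)
    ]


# The five-row example table from this page
X = [
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 1.4],
]
Z, mu, sigma = standardize(X)
C = covariance(Z)
```

With this choice of σ, every diagonal entry of C equals 1 for standardized data, which makes a quick sanity check.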
This page uses power iteration with deflation to estimate top components.
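The page names power iteration with deflation but does not show its code, so the following is only a sketch of that general technique under assumptions of our own: a fixed random seed, a stopping test on the Rayleigh-quotient eigenvalue estimate, and Hotelling-style deflation C ← C − λ v vᵀ. The names `power_iteration` and `top_k_eigenpairs` are hypothetical.

```python
import math
import random


def power_iteration(C, iters=500, tol=1e-12, seed=0):
    """Estimate the dominant eigenpair (lambda, v) of a symmetric matrix C."""
    n = len(C)
    rng = random.Random(seed)  # assumption: fixed seed for repeatable results
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]  # w = C v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v  # nothing left in this matrix (fully deflated)
        w = [x / norm for x in w]
        # Rayleigh quotient: lambda ~= w^T C w for unit-length w
        Cw = [sum(C[i][j] * w[j] for j in range(n)) for i in range(n)]
        new_lam = sum(w[i] * Cw[i] for i in range(n))
        converged = abs(new_lam - lam) < tol
        v, lam = w, new_lam
        if converged:
            break
    return lam, v


def top_k_eigenpairs(C, k):
    """Top-k eigenpairs via power iteration plus deflation C <- C - lambda v v^T."""
    A = [row[:] for row in C]  # work on a copy
    n = len(A)
    pairs = []
    for _ in range(k):
        lam, v = power_iteration(A)
        pairs.append((lam, v))
        # Deflate: remove the component just found so the next run finds the next one
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs
```

Because each deflation step reuses an estimated pair, small errors carry forward, so later eigenvalues tend to be slightly less accurate than the first.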
How to Use This Calculator
- Paste your dataset as rows and columns of numbers.
- Select the correct delimiter and whether a header exists.
- Enable standardization when features use different units.
- Choose “Fixed k” or “Target variance %” for components.
- Click submit, then compare the explained variance and the residual RMSE.
- Download the denoised output as CSV or PDF.
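The first two steps (pasting rows and choosing a delimiter and header option) can be sketched with Python's standard `csv` module. `parse_table` is a hypothetical helper, not the page's parser; it assumes rectangular input and turns any non-numeric or non-finite cell into `None` so a later step can fill it.

```python
import csv
import io
import math


def parse_table(text, delimiter=",", has_header=True):
    """Parse pasted text into (header, rows); non-numeric cells become None.

    Hypothetical helper: assumes every row has the same number of cells.
    """
    reader = csv.reader(io.StringIO(text.strip()), delimiter=delimiter)
    rows = [r for r in reader if any(cell.strip() for cell in r)]  # skip blank lines
    header = rows.pop(0) if has_header and rows else None
    data = []
    for r in rows:
        parsed = []
        for cell in r:
            try:
                value = float(cell)
            except ValueError:
                value = None  # non-numeric text, e.g. "n/a"
            if value is not None and not math.isfinite(value):
                value = None  # treat "nan"/"inf" as missing too
            parsed.append(value)
        data.append(parsed)
    return header, data
```

For tab-separated input without a header row, call `parse_table(text, delimiter="\t", has_header=False)`.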
FAQs
1) What does PCA noise reduction mean?
It rebuilds your dataset using only the strongest principal components. Weak components often represent noise, so removing them can smooth measurements while keeping structure.
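A minimal sketch of that rebuild step, assuming the kept eigenvectors are already available as the columns of an orthonormal n × k matrix `Vk` and that `mu` and `sigma` come from the centering step. `denoise` is a hypothetical name, not part of the page.

```python
def denoise(X, mu, sigma, Vk):
    """Rebuild X from the kept components only.

    X: m x n data; mu, sigma: per-column mean and scale (sigma = 1 if not
    standardizing); Vk: n x k matrix whose columns are orthonormal eigenvectors.
    """
    m, n, k = len(X), len(X[0]), len(Vk[0])
    Z = [[(row[j] - mu[j]) / sigma[j] for j in range(n)] for row in X]
    # Scores T = Z Vk (m x k)
    T = [[sum(Z[r][j] * Vk[j][c] for j in range(n)) for c in range(k)] for r in range(m)]
    # Z_hat = T Vk^T (m x n): drops every direction outside the kept components
    Z_hat = [[sum(T[r][c] * Vk[j][c] for c in range(k)) for j in range(n)] for r in range(m)]
    # X_hat = Z_hat * sigma + mu, column by column
    return [[Z_hat[r][j] * sigma[j] + mu[j] for j in range(n)] for r in range(m)]
```

Keeping every component (Vk equal to the identity) returns the input unchanged; dropping components is what removes the weak directions.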
2) Should I standardize my data?
Yes, when columns have different units or ranges. Standardization prevents large-scale variables from dominating the components. If all variables share a common scale, centering alone can be enough.
3) How do I choose the number of components?
Start with a variance target like 95%. Then try slightly lower or higher targets and compare residual RMSE and downstream model performance. The best k depends on signal strength and goals.
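A target-variance rule can be sketched as picking the smallest k whose cumulative explained variance reaches the target. This is an assumption about how such a mode typically works, not the page's documented rule, and `choose_k` is a hypothetical name.

```python
def choose_k(eigenvalues, target=0.95):
    """Smallest k whose cumulative explained variance reaches the target fraction."""
    lams = sorted(eigenvalues, reverse=True)
    total = sum(lams)
    if total <= 0:
        return 0  # no variance to explain
    running = 0.0
    for k, lam in enumerate(lams, start=1):
        running += lam
        if running / total >= target:
            return k
    return len(lams)
```

Running it for a few nearby targets, such as 0.90, 0.95, and 0.98, gives candidate k values whose residual RMSE you can then compare.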
4) What is explained variance here?
It is the fraction of total variance captured by the kept components, that is, the sum of their eigenvalues divided by the sum of all eigenvalues. Higher values preserve more information, but may retain more noise if the dataset is very noisy.
5) What does residual RMSE tell me?
Residual RMSE (root-mean-square error) measures the typical difference between the original and denoised values. Smaller RMSE indicates a closer reconstruction. Larger RMSE can mean stronger denoising or a weaker signal.
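RMSE is the square root of the mean squared difference. The sketch below pools every cell of the m × n matrix and works in whatever units X and X̂ are given; whether the page computes it over all cells, per column, or in standardized units is not stated, so treat that choice as an assumption. `residual_rmse` is a hypothetical name.

```python
import math


def residual_rmse(X, X_hat):
    """Root-mean-square of (X - X_hat) pooled over every cell (assumption)."""
    sq = [(a - b) ** 2 for row, row_hat in zip(X, X_hat) for a, b in zip(row, row_hat)]
    return math.sqrt(sum(sq) / len(sq))
```

If columns use different units, computing it on standardized values keeps one large-scale column from dominating the result.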
6) Can I use this for outlier removal?
PCA denoising is not a direct outlier detector. Strong outliers can skew components. If you expect outliers, consider cleaning them first or using robust preprocessing before PCA.
7) What happens with missing or non-numeric cells?
This tool fills those cells with the column mean before computing PCA. That keeps the matrix usable, but you should prefer proper imputation for serious analysis.
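Column-mean filling can be sketched like this, with missing cells represented as `None`. `impute_column_means` is a hypothetical helper, and the 0.0 fallback for a column with no numeric values is our own assumption; the page does not say how it handles that case.

```python
def impute_column_means(rows):
    """Replace None cells with the mean of the numeric cells in the same column."""
    n = len(rows[0])
    means = []
    for j in range(n):
        vals = [r[j] for r in rows if r[j] is not None]
        # Assumption: an all-missing column falls back to 0.0
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n)] for r in rows]
```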
8) Is PCA denoising always safe?
Not always. If important information lives in low-variance directions, aggressive compression can remove it. Validate by checking domain signals, plots, or model metrics after denoising.