Example data table
| Matrix | Entries |
|---|---|
| A | 1 2 3 4 5 6 |
| B | 1 2 3 4 5 7 |
| What changes | Expected effect | Why it matters |
|---|---|---|
| One entry differs (6 → 7) | Match percent drops (tolerance 0) | Element-wise check is strict by default. |
| Overall scale stays similar | Cosine remains high | Cosine compares direction, not magnitude. |
| Distance increases slightly | Frobenius similarity decreases | Distance-based score reflects absolute differences. |
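The effects in the table can be checked with a short Python sketch. It assumes the two example rows are Matrix A and Matrix B flattened row by row, and it uses tolerance 0. The variable names are illustrative and are not the calculator's own code.

```python
import math

# Example data, assumed to be Matrix A and Matrix B flattened row by row.
a = [1, 2, 3, 4, 5, 6]
b = [1, 2, 3, 4, 5, 7]

# Element match with tolerance 0: only exactly equal entries count.
matches = sum(1 for x, y in zip(a, b) if abs(x - y) <= 0)
match_pct = matches / len(a) * 100

# Cosine: dot product divided by the product of the Euclidean norms.
dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(y * y for y in b))
cosine = dot / (norm_a * norm_b)

# Frobenius similarity: 1 - ||A - B||_F / (||A||_F + ||B||_F).
diff_norm = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
frob_sim = 1 - diff_norm / (norm_a + norm_b)

print(f"match%={match_pct:.2f} cosine={cosine:.4f} frobenius={frob_sim:.4f}")
```

With these inputs, 5 of 6 entries match (83.33%), cosine stays near 0.997, and Frobenius similarity is about 0.949.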
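Formulas used
- Element match: Match% = (matches / total elements) × 100, where an entry matches when |aᵢ − bᵢ| ≤ tolerance.
- Cosine similarity: cos(a,b) = (a·b) / (‖a‖₂ ‖b‖₂), where ‖a‖₂ = √(Σ aᵢ²) and a, b are the flattened matrices.
- Pearson correlation: r = Σ(aᵢ − ā)(bᵢ − b̄) / √(Σ(aᵢ − ā)² · Σ(bᵢ − b̄)²), where ā and b̄ are the means of a and b. It measures linear alignment after centering by the mean.
- Frobenius similarity: Similarity = 1 − ‖A − B‖_F / (‖A‖_F + ‖B‖_F), where ‖A‖_F = √(Σ Aᵢⱼ²).
- MAE similarity: Similarity = 1 / (1 + MAE), where MAE = (1/n) Σ |aᵢ − bᵢ| over all n entries. This keeps scores in (0, 1] and lowers the score as the average absolute error grows.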
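The formulas translate directly into plain Python. This is a minimal sketch that assumes both matrices are already flattened into equal-length lists in row-major order. Returning `None` where a formula would divide by zero is an assumption, since the calculator's own "safe defaults" are not specified.

```python
import math
from typing import Optional, Sequence

Vector = Sequence[float]


def _norm(v: Vector) -> float:
    """Euclidean (L2) norm; for a flattened matrix this equals the Frobenius norm."""
    return math.sqrt(sum(x * x for x in v))


def match_percent(a: Vector, b: Vector, tol: float = 0.0) -> float:
    """Percentage of positions where |a_i - b_i| <= tol."""
    matches = sum(1 for x, y in zip(a, b) if abs(x - y) <= tol)
    return matches / len(a) * 100


def cosine(a: Vector, b: Vector) -> Optional[float]:
    """(a . b) / (||a|| ||b||); None if either vector has zero norm."""
    na, nb = _norm(a), _norm(b)
    if na == 0 or nb == 0:
        return None
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def pearson(a: Vector, b: Vector) -> Optional[float]:
    """Pearson r, computed as the cosine similarity of the mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


def frobenius_similarity(a: Vector, b: Vector) -> Optional[float]:
    """1 - ||A - B||_F / (||A||_F + ||B||_F); None if both matrices are all zeros."""
    denom = _norm(a) + _norm(b)
    if denom == 0:
        return None
    return 1 - _norm([x - y for x, y in zip(a, b)]) / denom


def mae_similarity(a: Vector, b: Vector) -> float:
    """1 / (1 + MAE), where MAE is the mean absolute difference."""
    mae = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1 / (1 + mae)
```

For example, `mae_similarity([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 7])` gives 1 / (1 + 1/6) = 6/7 ≈ 0.857. Writing Pearson r as the cosine of centered vectors reuses the zero-norm check, so a constant matrix returns `None`. With non-integer data, rounding can leave tiny nonzero residues after centering, so a small epsilon check may be safer than `== 0`.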
How to use this calculator
- Enter Matrix A and Matrix B with the same dimensions.
- Separate values with spaces or commas, and enter one row per line.
- Set a tolerance if you want near-equal entries to match.
- Press Check similarity to compute multiple scores.
- Download your results as CSV or PDF for sharing.
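The input rules above can be sketched as a small parser with a shape check. The function names `parse_matrix` and `same_shape` are hypothetical, and any separators the calculator accepts beyond spaces and commas are not documented here.

```python
import re
from typing import List

Matrix = List[List[float]]


def parse_matrix(text: str) -> Matrix:
    """Parse one row per line; values may be separated by spaces, commas, or both."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
        rows.append([float(t) for t in tokens])
    if not rows:
        raise ValueError("matrix is empty")
    if len({len(r) for r in rows}) != 1:
        raise ValueError("every row must have the same number of values")
    return rows


def same_shape(a: Matrix, b: Matrix) -> bool:
    """True when both matrices have the same number of rows and columns."""
    return len(a) == len(b) and len(a[0]) == len(b[0])


a = parse_matrix("1 2 3\n4 5 6")
b = parse_matrix("1, 2, 3\n4, 5, 7")
print(same_shape(a, b))  # True
```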
Data quality and preprocessing
Similarity checks start with clean matrices. Ensure consistent units, scaling, and ordering before comparing. Small input issues can dominate metrics, especially when values are sparse or near zero. Normalize rows or columns when magnitude differences are irrelevant, and document any transformation you apply.
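One way to normalize rows is to scale each row to unit Euclidean length. This sketch assumes L2 scaling suits your data; min-max or z-score scaling may fit other cases better. The helper `normalize_rows` is hypothetical and would run before you paste values into the calculator.

```python
import math


def normalize_rows(m):
    """Scale each row to unit Euclidean (L2) length; all-zero rows are left unchanged."""
    out = []
    for row in m:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row] if norm > 0 else list(row))
    return out


a = [[3, 4], [1, 0]]
b = [[6, 8], [2, 0]]  # every row of b is twice the matching row of a
print(normalize_rows(a))  # [[0.6, 0.8], [1.0, 0.0]]
print(normalize_rows(a) == normalize_rows(b))  # True
```

After normalization, rows that differ only by a scale factor become identical, so element match can reach 100% even at tolerance 0.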
Check dimensions early; similarity requires identical shapes. If matrices represent time steps or categories, align their labels before pasting so each position refers to the same item in both matrices. Replace missing entries with explicit zeros only when zero is a meaningful value. Otherwise, impute them or remove incomplete rows from both matrices, so the remaining entries stay aligned and the comparison is not distorted.
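Removing incomplete rows only keeps the matrices comparable if the same rows are dropped from both. A minimal sketch, assuming missing values arrive as `None` or NaN; `drop_incomplete_rows` is a hypothetical helper.

```python
import math


def _missing(v) -> bool:
    """Treat None and float NaN as missing values (an assumption about your data)."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def drop_incomplete_rows(a, b):
    """Remove row i from BOTH matrices if either has a missing entry in that row."""
    if len(a) != len(b):
        raise ValueError("matrices must have the same number of rows")
    kept_a, kept_b = [], []
    for row_a, row_b in zip(a, b):
        if any(_missing(v) for v in row_a) or any(_missing(v) for v in row_b):
            continue  # dropping from both keeps the remaining rows aligned
        kept_a.append(row_a)
        kept_b.append(row_b)
    return kept_a, kept_b


a = [[1.0, 2.0], [None, 4.0], [5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, float("nan")]]
clean_a, clean_b = drop_incomplete_rows(a, b)
print(clean_a, clean_b)  # [[1.0, 2.0]] [[1.0, 2.0]]
```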
Interpreting element match with tolerance
Element match reports the share of entries whose absolute difference is within a user-chosen tolerance. In measurement or rounding workflows, the tolerance reflects instrument precision or acceptable numeric drift. A 98% match with a tolerance of 0.01 often indicates a stable pipeline, while a lower match means more cells changed beyond the tolerance; checking which cells fail shows whether those changes are localized or widespread.
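To see whether mismatches are localized, it helps to list the failing cells rather than only the percentage. A minimal sketch; `mismatched_cells` is a hypothetical helper and the sample values are made up for illustration.

```python
def mismatched_cells(a, b, tol):
    """List (row, col, abs_diff) for every cell where |a - b| exceeds tol."""
    cells = []
    for i, (row_a, row_b) in enumerate(zip(a, b)):
        for j, (x, y) in enumerate(zip(row_a, row_b)):
            diff = abs(x - y)
            if diff > tol:
                cells.append((i, j, diff))
    return cells


A = [[1.00, 2.00], [3.00, 4.00]]
B = [[1.004, 2.00], [3.00, 4.50]]  # small drift in (0, 0), real change in (1, 1)

cells = mismatched_cells(A, B, tol=0.01)
total = sum(len(row) for row in A)
match_pct = (total - len(cells)) / total * 100
print(cells, match_pct)  # [(1, 1, 0.5)] 75.0
```

Because floating-point values are inexact, a difference that should equal the tolerance exactly can land just above or below it, so leave a little headroom when choosing the tolerance.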
Directional similarity using cosine
Cosine similarity evaluates whether two flattened matrices point in the same direction in a high-dimensional space. It is robust when scale differs but patterns align: multiplying every entry of one matrix by a positive constant, such as doubling it, leaves the cosine unchanged. Use cosine to validate trend consistency in feature matrices, embeddings, or standardized score grids.
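A quick check of the scale claim, as a self-contained sketch: scaling one matrix keeps the cosine at 1, while reordering the same values lowers it even though both vectors have the same norm.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


a = [1, 2, 3, 4, 5, 6]
doubled = [2 * x for x in a]   # same direction, twice the magnitude
reversed_pattern = a[::-1]     # same magnitude, different pattern

print(round(cosine(a, doubled), 12))           # 1.0
print(round(cosine(a, reversed_pattern), 4))   # 0.6154
```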
Linear relationship using correlation
Pearson correlation measures how strongly entries co-move after subtracting their means. Because centering removes offsets and dividing by the standard deviations removes scale, it captures linear relationships even when one matrix is shifted or rescaled. A correlation near 1 suggests the matrices vary together, while a negative value indicates inverse movement. When one matrix is constant, its variance is zero and correlation is undefined, so interpret any reported value cautiously.
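The offset and constant cases can be sketched as follows. Returning `None` for a constant input is an assumption of this sketch; in Python 3.10+, `statistics.correlation` raises `StatisticsError` in that case instead.

```python
import math
from typing import Optional


def pearson(a, b) -> Optional[float]:
    """Pearson r of two equal-length sequences; None if either one is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined: zero variance
    return cov / math.sqrt(var_a * var_b)


a = [1, 2, 3, 4, 5, 6]
shifted = [3 * x + 10 for x in a]  # rescaled and offset
inverse = [-x for x in a]          # moves in the opposite direction
constant = [5] * 6                 # no variation at all

print(pearson(a, shifted), pearson(a, inverse), pearson(a, constant))
```

The shifted copy scores 1, the inverted copy scores −1, and the constant matrix has no defined correlation.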
Distance and error summaries for decisions
Frobenius distance, ‖A − B‖_F = √(Σ (Aᵢⱼ − Bᵢⱼ)²), aggregates squared differences across all cells into a single magnitude of change. The calculator converts it into a bounded similarity to aid comparison across sizes. MAE and RMSE add error summaries; because RMSE squares each difference before averaging, it penalizes large deviations more heavily than MAE. Combine these metrics: use match percent for strict compliance, cosine for shape, correlation for co-movement, and distance for absolute drift.
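The difference between MAE and RMSE shows up when the same total error is spread out or concentrated in one cell. A minimal sketch with made-up flattened matrices; `error_summary` is a hypothetical helper.

```python
import math


def error_summary(a, b):
    """Return (MAE, RMSE, Frobenius distance) for two flattened matrices."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    frob = math.sqrt(sum(d * d for d in diffs))
    return mae, rmse, frob


base = [0.0, 0.0, 0.0, 0.0]
spread = [1.0, 1.0, 1.0, 1.0]  # error spread evenly over every cell
spike = [4.0, 0.0, 0.0, 0.0]   # same total error in a single cell

print(error_summary(base, spread))  # (1.0, 1.0, 2.0)
print(error_summary(base, spike))   # (1.0, 2.0, 4.0)
```

Both cases have MAE 1.0, but the single spike doubles RMSE. The Frobenius distance equals RMSE × √n, so it grows with matrix size, which is why a size-normalized similarity is easier to compare.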
For monitoring, set a threshold per metric. A common pattern is to alert when Frobenius similarity falls below a baseline, then inspect a heatmap of cell-wise differences offline. Record the inputs, tolerance, and chosen metric in reports so results stay reproducible across runs, reviewers, and environments.
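A sketch of per-metric thresholds plus a reproducible report record. The threshold values, metric keys, and record layout are hypothetical. The scores are the rounded results for the example data at the top of the page.

```python
import json

# Hypothetical per-metric alert thresholds; derive real ones from your own baseline runs.
THRESHOLDS = {"match_pct": 95.0, "cosine": 0.99, "frobenius_sim": 0.95}


def breaches(scores: dict, thresholds: dict) -> list:
    """Names of metrics whose score falls below their threshold, sorted for stable output."""
    return sorted(
        name for name, limit in thresholds.items()
        if name in scores and scores[name] < limit
    )


scores = {"match_pct": 83.33, "cosine": 0.9971, "frobenius_sim": 0.9493}
alerts = breaches(scores, THRESHOLDS)

# Keep inputs and settings next to the scores so another reviewer can rerun the check.
record = {
    "matrix_a": [[1, 2, 3, 4, 5, 6]],
    "matrix_b": [[1, 2, 3, 4, 5, 7]],
    "tolerance": 0.0,
    "scores": scores,
    "alerts": alerts,
}
print(json.dumps(record, sort_keys=True))
```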
FAQs
Which similarity score should I trust most?
Use match percent for strict equality, cosine for pattern direction, correlation for linear co-movement, and Frobenius similarity for absolute drift. Tracking two metrics together, such as cosine for shape and Frobenius similarity for scale, balances both views.
How do I choose a good tolerance?
Start with expected rounding or measurement error. If inputs are integers, tolerance 0 is typical. For sensor or computed data, set tolerance to a small value that reflects acceptable deviation per cell in your domain.
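Sweeping a few tolerances makes the choice concrete. A minimal sketch; the readings are hypothetical sensor values compared against a reference of 20.00.

```python
def match_percent(a, b, tol):
    """Share of positions with |a_i - b_i| <= tol, as a percentage."""
    hits = sum(1 for x, y in zip(a, b) if abs(x - y) <= tol)
    return hits / len(a) * 100


# Hypothetical sensor readings against a reference value of 20.00.
measured = [20.01, 19.98, 20.30, 20.00]
reference = [20.00, 20.00, 20.00, 20.00]

results = {tol: match_percent(measured, reference, tol) for tol in (0.0, 0.05, 0.5)}
print(results)  # {0.0: 25.0, 0.05: 75.0, 0.5: 100.0}
```

A tolerance just above the expected measurement error (here about 0.02) absorbs the noise while still flagging the genuine 0.3 deviation.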
Why do the matrices need the same size?
These metrics compare entries position by position after flattening. Different dimensions mix unrelated values and break alignment. If your data differs in size, resample or pad intentionally, then document the mapping before comparing.
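Flattening shows why a shape check is needed even when the element counts agree. This sketch assumes row-major flattening, which the calculator may or may not use internally. A 2 × 3 and a 3 × 2 matrix can flatten to the same list, so every metric would report a perfect score although the value 3 sits at row 0, column 2 in one matrix and at row 1, column 0 in the other.

```python
def flatten(m):
    """Row-major flattening: row 0 left to right, then row 1, and so on."""
    return [v for row in m for v in row]


def shape(m):
    """(rows, columns) of a list-of-lists matrix."""
    return (len(m), len(m[0]) if m else 0)


a = [[1, 2, 3], [4, 5, 6]]    # 2 x 3
b = [[1, 2], [3, 4], [5, 6]]  # 3 x 2

print(flatten(a) == flatten(b))  # True: the flat lists coincide
print(shape(a), shape(b))        # (2, 3) (3, 2)
```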
What happens if one matrix is all zeros?
Cosine is undefined when either matrix has zero norm, and correlation is undefined when either matrix has zero variance; an all-zeros matrix has both. The calculator handles this by returning safe defaults, but those values carry little meaning. Prefer distance and match percent when one matrix lacks variation.
Can similarity be negative?
Yes. Cosine and correlation range from −1 to 1 and turn negative when patterns oppose each other; values near −1 indicate strong disagreement in direction or linear relationship. Match percent, Frobenius similarity, and MAE similarity never go below zero. The percent column maps these scores onto a 0–100 scale for readability.
What do the CSV and PDF exports contain?
Exports include each metric, its numeric score, the mapped percent, and notes. They also include dimensions, tolerance, MAE, RMSE, and Frobenius difference so you can reproduce the same results later.