Calculator inputs
Example data table
| Category | User A | User B | Weight |
|---|---|---|---|
| Action | 4.5 | 4.0 | 1.0 |
| Comedy | 3.0 | 2.5 | 1.0 |
| Drama | 5.0 | 4.5 | 1.2 |
| Sci-Fi | 2.0 | 2.0 | 0.8 |
| Documentary | 4.0 | 4.5 | 1.1 |
| Thriller | 3.5 | 3.0 | 1.0 |
Formulas used
1) Weighted cosine similarity
sim = Σ(wᵢaᵢbᵢ) / ( √Σ(wᵢaᵢ²) × √Σ(wᵢbᵢ²) )
2) Weighted Pearson correlation
sim = Σ(wᵢ(aᵢ-ā)(bᵢ-b̄)) / ( √Σ(wᵢ(aᵢ-ā)²) × √Σ(wᵢ(bᵢ-b̄)²) )
Here ā and b̄ are the weighted means of each user's values.
3) Euclidean-based similarity
sim = 1 / ( 1 + √Σ(wᵢ(aᵢ-bᵢ)²) )
4) Manhattan-based similarity
sim = 1 / ( 1 + Σ(wᵢ|aᵢ-bᵢ|) )
5) Weighted Jaccard similarity
sim = Σ(wᵢmin(xᵢ,yᵢ)) / Σ(wᵢmax(xᵢ,yᵢ))
Here xᵢ and yᵢ are binary interaction flags produced from the raw values using the chosen threshold.
6) Weighted Dice coefficient
sim = 2Σ(wᵢmin(xᵢ,yᵢ)) / (Σ(wᵢxᵢ) + Σ(wᵢyᵢ))
7) Weighted Tanimoto coefficient
sim = Σ(wᵢaᵢbᵢ) / ( Σ(wᵢaᵢ²) + Σ(wᵢbᵢ²) - Σ(wᵢaᵢbᵢ) )
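The formulas above can be sketched in Python. This is a minimal illustration of the math, not the calculator's internal code; function names are ours:

```python
import math

def weighted_cosine(a, b, w):
    # sim = Σ(w·a·b) / (√Σ(w·a²) · √Σ(w·b²))
    num = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    den = (math.sqrt(sum(wi * ai * ai for wi, ai in zip(w, a)))
           * math.sqrt(sum(wi * bi * bi for wi, bi in zip(w, b))))
    return num / den

def weighted_pearson(a, b, w):
    # Center each vector on its weighted mean, then apply the cosine form.
    sw = sum(w)
    am = sum(wi * ai for wi, ai in zip(w, a)) / sw
    bm = sum(wi * bi for wi, bi in zip(w, b)) / sw
    return weighted_cosine([ai - am for ai in a], [bi - bm for bi in b], w)

def euclidean_similarity(a, b, w):
    # sim = 1 / (1 + weighted Euclidean distance)
    d = math.sqrt(sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b)))
    return 1.0 / (1.0 + d)

def manhattan_similarity(a, b, w):
    # sim = 1 / (1 + weighted Manhattan distance)
    d = sum(wi * abs(ai - bi) for wi, ai, bi in zip(w, a, b))
    return 1.0 / (1.0 + d)

def weighted_tanimoto(a, b, w):
    # sim = Σ(w·a·b) / (Σ(w·a²) + Σ(w·b²) − Σ(w·a·b))
    ab = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    aa = sum(wi * ai * ai for wi, ai in zip(w, a))
    bb = sum(wi * bi * bi for wi, bi in zip(w, b))
    return ab / (aa + bb - ab)
```

With the example table, User A is `[4.5, 3.0, 5.0, 2.0, 4.0, 3.5]`, User B is `[4.0, 2.5, 4.5, 2.0, 4.5, 3.0]`, and the weights are `[1.0, 1.0, 1.2, 0.8, 1.1, 1.0]`; pass those three lists to any of the functions above.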
Preprocessing note
The calculator first aligns the two vectors, applies the selected missing-value rule, optionally normalizes each vector, then computes every metric on the processed data. Jaccard and Dice use thresholded binary activity signals from the raw aligned values.
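The preprocessing pipeline can be sketched as follows. This is an assumption about one reasonable implementation (the rule names and min-max choice are ours), not the calculator's exact code:

```python
def preprocess(a, b, weights, missing="pairwise", normalize=False):
    # Align the two vectors and apply the missing-value rule:
    # "pairwise": drop any position missing (None) in either vector.
    # "zero":     replace missing entries with 0.
    aligned = []
    for ai, bi, wi in zip(a, b, weights):
        if ai is None or bi is None:
            if missing == "pairwise":
                continue
            ai = ai if ai is not None else 0.0
            bi = bi if bi is not None else 0.0
        aligned.append((ai, bi, wi))
    a2 = [t[0] for t in aligned]
    b2 = [t[1] for t in aligned]
    w2 = [t[2] for t in aligned]
    if normalize:
        # Min-max scale each vector to [0, 1] (one common normalization).
        def minmax(v):
            lo, hi = min(v), max(v)
            return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in v]
        a2, b2 = minmax(a2), minmax(b2)
    return a2, b2, w2
```

Every metric is then computed on the returned `a2`, `b2`, `w2`, except Jaccard and Dice, which binarize the raw aligned values instead.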
How to use this calculator
Step 1
Enter User A and User B values in the same order. Each position should refer to the same item, feature, genre, product, or behavior signal.
Step 2
Add optional labels and weights. Labels make the output readable. Weights increase the influence of high-priority or more reliable observations.
Step 3
Choose a headline similarity method, a preprocessing mode, and a missing-value rule. Adjust the binary threshold when using interaction-based metrics like Jaccard or Dice.
Step 4
Press Calculate Similarity. The result appears above the form, followed by a metric comparison table, a Plotly graph, and an item-level breakdown.
Step 5
Use the export buttons to download a CSV for auditing or a PDF snapshot for reporting, recommendation reviews, or model documentation.
FAQs
1) What does user similarity measure?
User similarity estimates how closely two users' ratings or behaviors align across the same items. Higher scores usually imply stronger preference alignment, which supports recommendation engines, clustering, and neighbor-based collaborative filtering.
2) Which metric should I choose?
Cosine works well for directional comparisons, Pearson focuses on rating patterns after centering, Euclidean and Manhattan reward smaller gaps, and Jaccard or Dice suit binary interaction data.
3) Why use normalization?
Normalization reduces scale bias. It helps when one user consistently scores higher than another, letting the comparison emphasize patterns instead of raw magnitudes.
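A quick sketch of why this matters, using z-score standardization (one of several possible normalizations; the example data is made up):

```python
import statistics

def zscore(v):
    # Standardize: subtract the mean, divide by the standard deviation.
    mu = statistics.fmean(v)
    sd = statistics.pstdev(v)
    return [(x - mu) / sd for x in v]

# User B rates everything exactly 1 point below User A:
a = [4.0, 2.0, 5.0, 3.0]
b = [3.0, 1.0, 4.0, 2.0]
# After z-scoring, the two vectors become identical, so distance-based
# similarities reach their maximum despite the constant 1-point gap.
```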
4) What do weights do?
Weights let important items influence the score more strongly. Increase them for recent activity, premium products, reliable labels, or high-confidence observations.
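To see the effect, consider the Euclidean-based similarity from the formulas above with a single disputed item (illustrative numbers, not calculator internals):

```python
import math

def euclid_sim(a, b, w):
    # sim = 1 / (1 + weighted Euclidean distance)
    d = math.sqrt(sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b)))
    return 1.0 / (1.0 + d)

a, b = [5.0, 3.0], [5.0, 1.0]  # users disagree only on the second item
low  = euclid_sim(a, b, [1.0, 1.0])  # equal weights
high = euclid_sim(a, b, [1.0, 2.0])  # doubled weight on the disputed item
# Raising the weight widens the weighted distance, so `high` < `low`:
# the disagreement counts more, and the similarity score drops.
```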
5) How are missing values handled?
Pairwise ignore removes incomplete pairs before scoring. Zero fill treats missing values as zeros, which fits sparse implicit-feedback data but can reduce similarity.
6) Can similarity be negative?
Yes. Pearson can turn negative when users move in opposite directions around their own averages. A negative result suggests disagreement rather than closeness.
7) When are Jaccard and Dice best?
They are best for yes-or-no activity such as clicks, purchases, watched items, or likes. The threshold converts numeric inputs into active interaction flags.
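Thresholding and the two binary metrics can be sketched like this, using the example table's ratings with a threshold of 3.5 (the threshold value is our choice for illustration):

```python
def binarize(v, threshold):
    # Flag an item as "active" when its raw value meets the threshold.
    return [1.0 if x >= threshold else 0.0 for x in v]

def weighted_jaccard(x, y, w):
    # sim = Σ(w·min(x,y)) / Σ(w·max(x,y))
    num = sum(wi * min(xi, yi) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * max(xi, yi) for wi, xi, yi in zip(w, x, y))
    return num / den

def weighted_dice(x, y, w):
    # sim = 2·Σ(w·min(x,y)) / (Σ(w·x) + Σ(w·y))
    num = 2 * sum(wi * min(xi, yi) for wi, xi, yi in zip(w, x, y))
    den = (sum(wi * xi for wi, xi in zip(w, x))
           + sum(wi * yi for wi, yi in zip(w, y)))
    return num / den

# Ratings from the example table, thresholded at 3.5:
a = binarize([4.5, 3.0, 5.0, 2.0, 4.0, 3.5], 3.5)  # [1, 0, 1, 0, 1, 1]
b = binarize([4.0, 2.5, 4.5, 2.0, 4.5, 3.0], 3.5)  # [1, 0, 1, 0, 1, 0]
w = [1.0, 1.0, 1.2, 0.8, 1.1, 1.0]
```

The users differ only on Thriller, where User A clears the threshold and User B does not, so both metrics land below 1.0.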
8) What do the exports include?
CSV exports metrics and item-level details for auditing. PDF captures the visible results section, making sharing easier with teammates, analysts, or clients.