Advanced User Similarity Calculator

Measure overlap, distance, and alignment across user vectors. Compare multiple metrics with weighted preprocessing controls. See clearer matches for smarter recommendations and collaborative filtering.

Calculator inputs

Enter ratings, embeddings, engagement scores, or feature values separated by commas or new lines.
Use matching positions for the same items or features. You can write NA for missing values.
Optional labels improve result interpretation and chart readability.
Weights are optional and must be positive. Use larger weights for trusted or more valuable signals.
The selected method becomes the headline score. The table still shows every metric.
Normalization is applied independently to each vector before most metrics are computed.
Pairwise ignore removes incomplete pairs. Zero fill is common for sparse implicit feedback.
The binary threshold is used by Jaccard and Dice. Values equal to or above the threshold become active interactions.
Choose how many decimal places to display in tables, cards, and exports.
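
As a rough illustration of the input format described above, the sketch below parses one text field into a list of numbers. The function name `parse_values` is hypothetical, and treating empty entries as missing is an assumption; the calculator's own parser may behave differently.

```python
import re

def parse_values(text):
    """Parse comma- or newline-separated values; 'NA' marks a missing value."""
    if not text.strip():
        return []
    values = []
    for token in re.split(r",|\r?\n", text.strip()):
        token = token.strip()
        if token == "" or token.upper() == "NA":
            values.append(None)  # assumption: blank entries count as missing too
        else:
            values.append(float(token))
    return values

user_a = parse_values("4.5, 3.0, NA\n2.0")
print(user_a)  # [4.5, 3.0, None, 2.0]
```

Keeping missing entries as `None` preserves positions, so the same index still refers to the same item in both users' lists.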

Example data table

Category      User A   User B   Weight
Action        4.5      4.0      1.0
Comedy        3.0      2.5      1.0
Drama         5.0      4.5      1.2
Sci-Fi        2.0      2.0      0.8
Documentary   4.0      4.5      1.1
Thriller      3.5      3.0      1.0

Formula used

1) Weighted cosine similarity

sim = Σ(wᵢaᵢbᵢ) / ( √Σ(wᵢaᵢ²) × √Σ(wᵢbᵢ²) )
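
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name `weighted_cosine` is hypothetical, and returning 0 when either vector has zero norm is an assumption.

```python
import math

def weighted_cosine(a, b, w=None):
    """Weighted cosine similarity of two equal-length numeric lists."""
    if w is None:
        w = [1.0] * len(a)
    dot = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    norm_a = math.sqrt(sum(wi * ai * ai for wi, ai in zip(w, a)))
    norm_b = math.sqrt(sum(wi * bi * bi for wi, bi in zip(w, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: the undefined case is reported as 0
    return dot / (norm_a * norm_b)

# Values taken from the example data table
user_a = [4.5, 3.0, 5.0, 2.0, 4.0, 3.5]
user_b = [4.0, 2.5, 4.5, 2.0, 4.5, 3.0]
weights = [1.0, 1.0, 1.2, 0.8, 1.1, 1.0]
print(round(weighted_cosine(user_a, user_b, weights), 4))
```

Because cosine compares direction only, two users whose ratings are proportional to each other score 1 regardless of scale.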

2) Weighted Pearson correlation

sim = Σ(wᵢ(aᵢ-ā)(bᵢ-b̄)) / ( √Σ(wᵢ(aᵢ-ā)²) × √Σ(wᵢ(bᵢ-b̄)²) )
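
A minimal Python sketch of this formula follows. The page does not say how ā and b̄ are computed, so the code assumes they are weighted means (Σwᵢaᵢ / Σwᵢ); the function name `weighted_pearson` and the choice to return 0 for a constant vector are also assumptions.

```python
import math

def weighted_pearson(a, b, w=None):
    """Weighted Pearson correlation of two equal-length numeric lists."""
    if w is None:
        w = [1.0] * len(a)
    total = sum(w)
    # Assumption: the means are weighted by the same weights as the sums.
    mean_a = sum(wi * ai for wi, ai in zip(w, a)) / total
    mean_b = sum(wi * bi for wi, bi in zip(w, b)) / total
    cov = sum(wi * (ai - mean_a) * (bi - mean_b) for wi, ai, bi in zip(w, a, b))
    var_a = sum(wi * (ai - mean_a) ** 2 for wi, ai in zip(w, a))
    var_b = sum(wi * (bi - mean_b) ** 2 for wi, bi in zip(w, b))
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant vector yields 0 instead of an error
    return cov / math.sqrt(var_a * var_b)

# Two users who rank the same items in exactly opposite order
print(weighted_pearson([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```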

3) Euclidean-based similarity

sim = 1 / ( 1 + √Σ(wᵢ(aᵢ-bᵢ)²) )

4) Manhattan-based similarity

sim = 1 / ( 1 + Σ(wᵢ|aᵢ-bᵢ|) )
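
Both distance-based scores can be sketched together, since they differ only in how the gap between users is measured. The function names below are hypothetical and the code is an illustration of formulas 3 and 4, not the calculator's implementation.

```python
import math

def euclidean_similarity(a, b, w=None):
    """1 / (1 + weighted Euclidean distance)."""
    if w is None:
        w = [1.0] * len(a)
    dist = math.sqrt(sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b)))
    return 1.0 / (1.0 + dist)

def manhattan_similarity(a, b, w=None):
    """1 / (1 + weighted Manhattan distance)."""
    if w is None:
        w = [1.0] * len(a)
    dist = sum(wi * abs(ai - bi) for wi, ai, bi in zip(w, a, b))
    return 1.0 / (1.0 + dist)

# Values taken from the example data table
user_a = [4.5, 3.0, 5.0, 2.0, 4.0, 3.5]
user_b = [4.0, 2.5, 4.5, 2.0, 4.5, 3.0]
weights = [1.0, 1.0, 1.2, 0.8, 1.1, 1.0]
print(round(euclidean_similarity(user_a, user_b, weights), 3))  # about 0.465
print(round(manhattan_similarity(user_a, user_b, weights), 3))  # about 0.274
```

Both scores equal 1 only when the two vectors are identical, and they shrink toward 0 as the distance grows, so they are sensitive to the rating scale in a way cosine and Pearson are not.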

5) Weighted Jaccard similarity

sim = Σ(wᵢmin(xᵢ,yᵢ)) / Σ(wᵢmax(xᵢ,yᵢ))

Here xᵢ and yᵢ are binary interaction flags produced from the raw values using the chosen threshold.

6) Weighted Dice coefficient

sim = 2Σ(wᵢmin(xᵢ,yᵢ)) / (Σ(wᵢxᵢ) + Σ(wᵢyᵢ))
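
The Jaccard and Dice formulas share the same thresholding step, so the sketch below covers both. The helper names (`binarize`, `weighted_jaccard`, `weighted_dice`) are hypothetical, and returning 0 when neither user has any active interaction is an assumption.

```python
def binarize(values, threshold):
    """Flag values equal to or above the threshold as active (1), else 0."""
    return [1 if v >= threshold else 0 for v in values]

def weighted_jaccard(x, y, w=None):
    if w is None:
        w = [1.0] * len(x)
    inter = sum(wi * min(xi, yi) for wi, xi, yi in zip(w, x, y))
    union = sum(wi * max(xi, yi) for wi, xi, yi in zip(w, x, y))
    return inter / union if union else 0.0  # assumption: empty union gives 0

def weighted_dice(x, y, w=None):
    if w is None:
        w = [1.0] * len(x)
    inter = sum(wi * min(xi, yi) for wi, xi, yi in zip(w, x, y))
    total = sum(wi * xi for wi, xi in zip(w, x)) + sum(wi * yi for wi, yi in zip(w, y))
    return 2.0 * inter / total if total else 0.0  # assumption: no activity gives 0

# Illustrative raw values, thresholded at 1
x = binarize([5, 0, 3, 0], threshold=1)  # [1, 0, 1, 0]
y = binarize([4, 2, 0, 0], threshold=1)  # [1, 1, 0, 0]
print(round(weighted_jaccard(x, y), 4))  # 0.3333
print(weighted_dice(x, y))               # 0.5
```

On the same data Dice is never lower than Jaccard, because it counts the shared interactions twice in the numerator.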

7) Weighted Tanimoto coefficient

sim = Σ(wᵢaᵢbᵢ) / ( Σ(wᵢaᵢ²) + Σ(wᵢbᵢ²) - Σ(wᵢaᵢbᵢ) )
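
A short Python sketch of the Tanimoto formula, under the same caveats as the other examples: the function name is hypothetical, and returning 0 for two all-zero vectors is an assumption.

```python
def weighted_tanimoto(a, b, w=None):
    """Weighted Tanimoto coefficient on real-valued vectors."""
    if w is None:
        w = [1.0] * len(a)
    dot = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    sq_a = sum(wi * ai * ai for wi, ai in zip(w, a))
    sq_b = sum(wi * bi * bi for wi, bi in zip(w, b))
    denom = sq_a + sq_b - dot
    return dot / denom if denom else 0.0  # assumption: all-zero input gives 0

print(weighted_tanimoto([1, 2], [1, 2]))            # 1.0
print(round(weighted_tanimoto([1, 2], [2, 4]), 4))  # 0.6667
```

The second call shows how Tanimoto differs from cosine: proportional vectors score 1 under cosine, but Tanimoto also penalizes the difference in magnitude.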

Preprocessing note

The calculator first aligns the two vectors, applies the selected missing-value rule, and optionally normalizes each vector. Most metrics are then computed on this processed data. Jaccard and Dice are the exception: they use thresholded binary activity signals derived from the raw aligned values rather than the normalized ones.
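
The order of these steps can be sketched as follows. This is only an outline under stated assumptions: the names `preprocess` and `zscore`, the rule labels "pairwise" and "zero", and the use of z-score scaling as the normalization mode are all hypothetical, since the page does not list the calculator's actual options.

```python
import math

def zscore(v):
    """Scale a vector to zero mean and unit standard deviation (assumed mode)."""
    if not v:
        return []
    mean = sum(v) / len(v)
    sd = math.sqrt(sum((x - mean) ** 2 for x in v) / len(v))
    return [(x - mean) / sd if sd else 0.0 for x in v]

def preprocess(a, b, rule="pairwise", normalize=False):
    """Align two vectors, apply a missing-value rule, optionally normalize.

    Missing values are represented as None. Returns the kept positions
    (useful for filtering labels and weights) and the processed vectors.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    idx, a2, b2 = [], [], []
    for i, (x, y) in enumerate(zip(a, b)):
        if rule == "pairwise":
            if x is None or y is None:
                continue  # drop the incomplete pair
        elif rule == "zero":
            x = 0.0 if x is None else x
            y = 0.0 if y is None else y
        else:
            raise ValueError("unknown missing-value rule: " + rule)
        idx.append(i)
        a2.append(x)
        b2.append(y)
    if normalize:
        # Each vector is normalized independently of the other.
        a2, b2 = zscore(a2), zscore(b2)
    return idx, a2, b2

print(preprocess([4.5, None, 5.0], [4.0, 2.5, None]))  # ([0], [4.5], [4.0])
```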

How to use this calculator

Step 1

Enter User A and User B values in the same order. Each position should refer to the same item, feature, genre, product, or behavior signal.

Step 2

Add optional labels and weights. Labels make the output readable. Weights increase the influence of high-priority or more reliable observations.

Step 3

Choose a headline similarity method, a preprocessing mode, and a missing-value rule. Adjust the binary threshold when using interaction-based metrics like Jaccard or Dice.

Step 4

Press Calculate Similarity. The result appears above the form, followed by a metric comparison table, a Plotly graph, and an item-level breakdown.

Step 5

Use the export buttons to download a CSV for auditing or a PDF snapshot for reporting, recommendation reviews, or model documentation.

FAQs

1) What does user similarity measure?

User similarity estimates how closely two users behave or rate the same items. Higher scores usually imply stronger preference alignment, which supports recommendation engines, clustering, and neighbor-based collaborative filtering.

2) Which metric should I choose?

Cosine works well for directional comparisons, Pearson focuses on rating patterns after centering, Euclidean and Manhattan reward smaller gaps, and Jaccard or Dice suit binary interaction data.

3) Why use normalization?

Normalization reduces scale bias. It helps when one user consistently scores higher than another, letting the comparison emphasize patterns instead of raw magnitudes.
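
A small worked example makes the effect concrete. It assumes mean-centering as the normalization mode, which may not match the calculator's own options, and the helper names are hypothetical. User B rates every item exactly one point higher than User A.

```python
import math

def mean_center(v):
    """Subtract the vector's own mean from every value."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def euclidean_similarity(a, b):
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)

user_a = [2.0, 3.0, 4.0]
user_b = [3.0, 4.0, 5.0]  # same pattern, one point higher

raw = euclidean_similarity(user_a, user_b)
centered = euclidean_similarity(mean_center(user_a), mean_center(user_b))
print(round(raw, 3), centered)  # 0.366 1.0
```

Without centering, the constant one-point offset looks like disagreement. After centering, both users have the identical pattern and the score rises to 1.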

4) What do weights do?

Weights let important items influence the score more strongly. Increase them for recent activity, premium products, reliable labels, or high-confidence observations.

5) How are missing values handled?

Pairwise ignore removes incomplete pairs before scoring. Zero fill treats missing values as zeros, which fits sparse implicit-feedback data but can reduce similarity.
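
The difference between the two rules shows up clearly on a tiny example. The sketch below uses plain unweighted cosine and represents a missing rating as `None`; both choices are assumptions made for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

user_a = [5.0, None, 4.0]  # User A has not rated the second item
user_b = [5.0, 4.0, 4.0]

# Pairwise ignore: keep only positions where both users have a value.
pairs = [(x, y) for x, y in zip(user_a, user_b) if x is not None and y is not None]
ignore = cosine([x for x, _ in pairs], [y for _, y in pairs])

# Zero fill: treat the missing rating as 0.
filled_a = [0.0 if x is None else x for x in user_a]
filled_b = [0.0 if y is None else y for y in user_b]
zero = cosine(filled_a, filled_b)

print(round(ignore, 3), round(zero, 3))  # 1.0 0.848
```

With pairwise ignore the users agree perfectly on the items both have rated. With zero fill, the unrated item is read as a rating of 0 against User B's 4, which lowers the score.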

6) Can similarity be negative?

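Yes. Pearson can turn negative when users move in opposite directions around their own averages. Cosine can also be negative when the vectors contain negative values, such as embeddings or centered data. A negative result suggests disagreement rather than closeness.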

7) When are Jaccard and Dice best?

They are best for yes-or-no activity such as clicks, purchases, watched items, or likes. The threshold converts numeric inputs into active interaction flags.

8) What do the exports include?

CSV exports metrics and item-level details for auditing. PDF captures the visible results section, making sharing easier with teammates, analysts, or clients.

Related Calculators

cosine similarity, contextual bandit, pairwise ranking, ndcg score, novelty score, als factorization, churn reduction, bandit regret, serendipity score, exploration rate

Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.