Calculator inputs
Example data table
| Category | User A | User B | Weight |
|---|---|---|---|
| Action | 4.5 | 4.0 | 1.0 |
| Comedy | 3.0 | 2.5 | 1.0 |
| Drama | 5.0 | 4.5 | 1.2 |
| Sci-Fi | 2.0 | 2.0 | 0.8 |
| Documentary | 4.0 | 4.5 | 1.1 |
| Thriller | 3.5 | 3.0 | 1.0 |
Formulas used
1) Weighted cosine similarity
sim = Σ(wᵢaᵢbᵢ) / ( √Σ(wᵢaᵢ²) × √Σ(wᵢbᵢ²) )
2) Weighted Pearson correlation
sim = Σ(wᵢ(aᵢ-ā)(bᵢ-b̄)) / ( √Σ(wᵢ(aᵢ-ā)²) × √Σ(wᵢ(bᵢ-b̄)²) )
Here ā and b̄ are the weighted means of each user's values.
3) Euclidean-based similarity
sim = 1 / ( 1 + √Σ(wᵢ(aᵢ-bᵢ)²) )
4) Manhattan-based similarity
sim = 1 / ( 1 + Σ(wᵢ|aᵢ-bᵢ|) )
5) Weighted Jaccard similarity
sim = Σ(wᵢmin(xᵢ,yᵢ)) / Σ(wᵢmax(xᵢ,yᵢ))
Here xᵢ and yᵢ are binary interaction flags produced from the raw values using the chosen threshold.
6) Weighted Dice coefficient
sim = 2Σ(wᵢmin(xᵢ,yᵢ)) / (Σ(wᵢxᵢ) + Σ(wᵢyᵢ))
7) Weighted Tanimoto coefficient
sim = Σ(wᵢaᵢbᵢ) / ( Σ(wᵢaᵢ²) + Σ(wᵢbᵢ²) - Σ(wᵢaᵢbᵢ) )
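The formulas above can be sketched in Python. This is a minimal illustration of the math, not the calculator's internal code; function names are ours:

```python
import math

def weighted_cosine(a, b, w):
    # sim = Σ(w·a·b) / (√Σ(w·a²) · √Σ(w·b²))
    num = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    den = (math.sqrt(sum(wi * ai * ai for wi, ai in zip(w, a)))
           * math.sqrt(sum(wi * bi * bi for wi, bi in zip(w, b))))
    return num / den

def weighted_pearson(a, b, w):
    # Center each vector on its weighted mean, then apply the cosine form.
    sw = sum(w)
    am = sum(wi * ai for wi, ai in zip(w, a)) / sw
    bm = sum(wi * bi for wi, bi in zip(w, b)) / sw
    return weighted_cosine([ai - am for ai in a], [bi - bm for bi in b], w)

def euclidean_similarity(a, b, w):
    # sim = 1 / (1 + weighted Euclidean distance)
    d = math.sqrt(sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b)))
    return 1.0 / (1.0 + d)

def manhattan_similarity(a, b, w):
    # sim = 1 / (1 + weighted Manhattan distance)
    d = sum(wi * abs(ai - bi) for wi, ai, bi in zip(w, a, b))
    return 1.0 / (1.0 + d)

def weighted_tanimoto(a, b, w):
    # sim = Σ(w·a·b) / (Σ(w·a²) + Σ(w·b²) − Σ(w·a·b))
    ab = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    aa = sum(wi * ai * ai for wi, ai in zip(w, a))
    bb = sum(wi * bi * bi for wi, bi in zip(w, b))
    return ab / (aa + bb - ab)
```

With the example table, User A is `[4.5, 3.0, 5.0, 2.0, 4.0, 3.5]`, User B is `[4.0, 2.5, 4.5, 2.0, 4.5, 3.0]`, and the weights are `[1.0, 1.0, 1.2, 0.8, 1.1, 1.0]`; pass those three lists to any of the functions above.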
Preprocessing note
The calculator first aligns the two vectors, applies the selected missing-value rule, optionally normalizes each vector, then computes every metric on the processed data. Jaccard and Dice use thresholded binary activity signals from the raw aligned values.
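The preprocessing pipeline can be sketched as follows. This is an assumption about one reasonable implementation (the rule names and min-max choice are ours), not the calculator's exact code:

```python
def preprocess(a, b, weights, missing="pairwise", normalize=False):
    # Align the two vectors and apply the missing-value rule:
    # "pairwise": drop any position missing (None) in either vector.
    # "zero":     replace missing entries with 0.
    aligned = []
    for ai, bi, wi in zip(a, b, weights):
        if ai is None or bi is None:
            if missing == "pairwise":
                continue
            ai = ai if ai is not None else 0.0
            bi = bi if bi is not None else 0.0
        aligned.append((ai, bi, wi))
    a2 = [t[0] for t in aligned]
    b2 = [t[1] for t in aligned]
    w2 = [t[2] for t in aligned]
    if normalize:
        # Min-max scale each vector to [0, 1] (one common normalization).
        def minmax(v):
            lo, hi = min(v), max(v)
            return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in v]
        a2, b2 = minmax(a2), minmax(b2)
    return a2, b2, w2
```

Every metric is then computed on the returned `a2`, `b2`, `w2`, except Jaccard and Dice, which binarize the raw aligned values instead.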
How to use this calculator
Step 1
Enter User A and User B values in the same order. Each position should refer to the same item, feature, genre, product, or behavior signal.
Step 2
Add optional labels and weights. Labels make the output readable. Weights increase the influence of high-priority or more reliable observations.
Step 3
Choose a headline similarity method, a preprocessing mode, and a missing-value rule. Adjust the binary threshold when using interaction-based metrics like Jaccard or Dice.
Step 4
Press Calculate Similarity. The result appears above the form, followed by a metric comparison table, a Plotly graph, and an item-level breakdown.
Step 5
Use the export buttons to download a CSV for auditing or a PDF snapshot for reporting, recommendation reviews, or model documentation.
FAQs
1) What does user similarity measure?
User similarity estimates how closely two users' ratings or behaviors align across the same items. Higher scores usually imply stronger preference alignment, which supports recommendation engines, clustering, and neighbor-based collaborative filtering.
2) Which metric should I choose?
Cosine works well for directional comparisons, Pearson focuses on rating patterns after centering, Euclidean and Manhattan reward smaller gaps, and Jaccard or Dice suit binary interaction data.
3) Why use normalization?
Normalization reduces scale bias. It helps when one user consistently scores higher than another, letting the comparison emphasize patterns instead of raw magnitudes.
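A quick sketch of why this matters, using z-score standardization (one of several possible normalizations; the example data is made up):

```python
import statistics

def zscore(v):
    # Standardize: subtract the mean, divide by the standard deviation.
    mu = statistics.fmean(v)
    sd = statistics.pstdev(v)
    return [(x - mu) / sd for x in v]

# User B rates everything exactly 1 point below User A:
a = [4.0, 2.0, 5.0, 3.0]
b = [3.0, 1.0, 4.0, 2.0]
# After z-scoring, the two vectors become identical, so distance-based
# similarities reach their maximum despite the constant 1-point gap.
```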
4) What do weights do?
Weights let important items influence the score more strongly. Increase them for recent activity, premium products, reliable labels, or high-confidence observations.
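To see the effect, consider the Euclidean-based similarity from the formulas above with a single disputed item (illustrative numbers, not calculator internals):

```python
import math

def euclid_sim(a, b, w):
    # sim = 1 / (1 + weighted Euclidean distance)
    d = math.sqrt(sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b)))
    return 1.0 / (1.0 + d)

a, b = [5.0, 3.0], [5.0, 1.0]  # users disagree only on the second item
low  = euclid_sim(a, b, [1.0, 1.0])  # equal weights
high = euclid_sim(a, b, [1.0, 2.0])  # doubled weight on the disputed item
# Raising the weight widens the weighted distance, so `high` < `low`:
# the disagreement counts more, and the similarity score drops.
```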
5) How are missing values handled?
Pairwise ignore removes incomplete pairs before scoring. Zero fill treats missing values as zeros, which fits sparse implicit-feedback data but can reduce similarity.
6) Can similarity be negative?
Yes. Pearson can turn negative when users move in opposite directions around their own averages. A negative result suggests disagreement rather than closeness.
7) When are Jaccard and Dice best?
They are best for yes-or-no activity such as clicks, purchases, watched items, or likes. The threshold converts numeric inputs into active interaction flags.
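Thresholding and the two binary metrics can be sketched like this, using the example table's ratings with a threshold of 3.5 (the threshold value is our choice for illustration):

```python
def binarize(v, threshold):
    # Flag an item as "active" when its raw value meets the threshold.
    return [1.0 if x >= threshold else 0.0 for x in v]

def weighted_jaccard(x, y, w):
    # sim = Σ(w·min(x,y)) / Σ(w·max(x,y))
    num = sum(wi * min(xi, yi) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * max(xi, yi) for wi, xi, yi in zip(w, x, y))
    return num / den

def weighted_dice(x, y, w):
    # sim = 2·Σ(w·min(x,y)) / (Σ(w·x) + Σ(w·y))
    num = 2 * sum(wi * min(xi, yi) for wi, xi, yi in zip(w, x, y))
    den = (sum(wi * xi for wi, xi in zip(w, x))
           + sum(wi * yi for wi, yi in zip(w, y)))
    return num / den

# Ratings from the example table, thresholded at 3.5:
a = binarize([4.5, 3.0, 5.0, 2.0, 4.0, 3.5], 3.5)  # [1, 0, 1, 0, 1, 1]
b = binarize([4.0, 2.5, 4.5, 2.0, 4.5, 3.0], 3.5)  # [1, 0, 1, 0, 1, 0]
w = [1.0, 1.0, 1.2, 0.8, 1.1, 1.0]
```

The users differ only on Thriller, where User A clears the threshold and User B does not, so both metrics land below 1.0.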
8) What do the exports include?
CSV exports metrics and item-level details for auditing. PDF captures the visible results section, making sharing easier with teammates, analysts, or clients.