Semantic Distance Calculator

Analyze meaning gaps using flexible vector comparisons. Switch metrics, normalize inputs, and export reports instantly. Built for embeddings, taxonomy checks, search tuning, and evaluation.

Calculator Inputs

Vector A / Vector B: Use comma or space separated numeric dimensions. Both vectors must have matching lengths.
Text A / Text B: Used only when comparing raw text.
Minkowski p: Common choices are 1, 2, or 3.
Jaccard threshold: Values at or above this threshold count as present.
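
The comma-or-space input format described above can be handled with a small parsing helper. This is a sketch of one reasonable approach, not the page's actual implementation:

```python
import re

def parse_vector(raw):
    """Split a raw input string on commas and/or whitespace and convert to floats."""
    return [float(t) for t in re.split(r"[,\s]+", raw.strip()) if t]
```

For example, `parse_vector("0.82, 0.12 0.44")` accepts the mixed comma-and-space style shown in the table below.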

Example Data Table

Scenario: Query and relevant document
  Mode: Vector
  Input A: 0.82, 0.12, 0.44, 0.71, 0.19, 0.63
  Input B: 0.79, 0.10, 0.47, 0.68, 0.22, 0.61
  Likely reading: Very small cosine distance and strong alignment

Scenario: Search intent comparison
  Mode: Text
  Input A: semantic retrieval improves ranking for support articles
  Input B: dense search ranking improves article retrieval quality
  Likely reading: Moderate to close semantic relationship

Scenario: Unrelated phrases
  Mode: Text
  Input A: image segmentation for medical scans
  Input B: quarterly revenue forecast for retail stores
  Likely reading: Larger distance and weaker alignment

Formula Used

Cosine Similarity: cos(θ) = (A · B) / (||A|| × ||B||)
Cosine Distance: 1 − cosine similarity
Angular Distance: arccos(cosine similarity) / π
Euclidean Distance: √(Σ(Ai − Bi)²)
Manhattan Distance: Σ|Ai − Bi|
Minkowski Distance: (Σ|Ai − Bi|^p)^(1/p)
Pearson Correlation: covariance(A, B) / (σA × σB)
Jaccard Similarity: intersection of active dimensions / union of active dimensions
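
The distance formulas above can be sketched in plain Python. This is an illustrative implementation of the standard definitions, not the calculator's own code:

```python
import math

def cosine_similarity(a, b):
    # cos(θ) = (A · B) / (||A|| × ||B||)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cosine_distance(a, b):
    return 1 - cosine_similarity(a, b)

def angular_distance(a, b):
    # Clamp to guard against floating-point drift outside [-1, 1]
    s = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.acos(s) / math.pi

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=3):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    sa = math.sqrt(sum((x - ma) ** 2 for x in a) / n)
    sb = math.sqrt(sum((y - mb) ** 2 for y in b) / n)
    return cov / (sa * sb)
```

Note that Minkowski with p = 1 reduces to Manhattan distance and p = 2 to Euclidean distance, which is a quick sanity check for any implementation.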

For raw text mode, the page first converts both texts into aligned token vectors. For production semantic analysis, embedding vectors usually provide better meaning coverage than bag-of-words counts.
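
One common way to build aligned token vectors, as raw text mode requires, is a bag-of-words count over the shared vocabulary. A minimal sketch (the page's exact tokenization may differ):

```python
from collections import Counter

def token_vectors(text_a, text_b):
    """Build aligned bag-of-words count vectors over the combined vocabulary."""
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    vocab = sorted(set(tokens_a) | set(tokens_b))
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    return [counts_a[w] for w in vocab], [counts_b[w] for w in vocab], vocab
```

Because both vectors share one vocabulary ordering, any of the metrics above can be applied to them directly.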

How to Use This Calculator

  1. Choose Embedding vectors when you already have model output dimensions.
  2. Choose Raw text comparison when you want a quick token-based estimate.
  3. Paste matching vectors or enter two text passages.
  4. Select a primary metric for the highlighted result panel.
  5. Turn on normalization when vector scale should not dominate distance.
  6. Adjust Minkowski p and Jaccard threshold only when those metrics matter.
  7. Press Calculate Semantic Distance to show results above the form.
  8. Download the current summary as CSV or PDF when needed.

Frequently Asked Questions

1. What is semantic distance?

Semantic distance measures how far two meanings are from each other. Smaller values usually suggest closer intent, topic, or representation, especially when using embedding vectors from language models.

2. Which metric should I trust first?

Cosine distance is often the first choice for embeddings because it focuses on direction instead of raw magnitude. It is common in retrieval, clustering, and recommendation workflows.

3. Why would I normalize vectors?

Normalization removes scale differences between vectors. That helps when one embedding has larger raw magnitude but similar direction, allowing distance metrics to reflect meaning more cleanly.
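
L2 normalization, the usual choice for this purpose, rescales each vector to unit length so only direction remains. A brief sketch:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length; leave the zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v
```

After normalization, [3, 4] and [30, 40] map to the same unit vector, so scale no longer affects any distance computed on them.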

4. Is text mode true embedding similarity?

No. Text mode uses token vectors built from the entered words. It is useful for quick checks, but real embedding vectors usually capture context and paraphrases more accurately.

5. What does a high cosine distance mean?

A higher cosine distance means the vectors point in more different directions. In many AI tasks, that suggests weaker semantic similarity or weaker topical alignment.

6. When is Euclidean distance helpful?

Euclidean distance is helpful when absolute dimensional gaps matter, not only direction. It is common in spatial analysis, anomaly detection, and feature-space comparison after scaling.

7. Why do Jaccard results look different?

Jaccard reduces the comparison to active versus inactive dimensions based on your threshold. It emphasizes overlap patterns rather than exact numeric closeness.
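
The thresholding step can be sketched as follows; each dimension at or above the threshold counts as active, and only the active sets are compared:

```python
def jaccard_similarity(a, b, threshold=0.5):
    """Binarize each dimension at the threshold, then compare active index sets."""
    active_a = {i for i, x in enumerate(a) if x >= threshold}
    active_b = {i for i, x in enumerate(b) if x >= threshold}
    if not active_a and not active_b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(active_a & active_b) / len(active_a | active_b)
```

For example, [0.9, 0.1, 0.6] and [0.8, 0.7, 0.2] at threshold 0.5 share one active dimension out of three, giving 1/3 regardless of how close the raw values are.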

8. Can I use this for search tuning?

Yes. You can compare query vectors, document vectors, label embeddings, or taxonomy terms. It is useful for debugging ranking quality, threshold rules, and cluster boundaries.

Related Calculators

similarity score calculator

Important Note: All calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.