Distribution Similarity Test Calculator

Analyze sample similarity through distances and divergence measures. View histograms, cumulative curves, and overlap statistics. Make stronger distribution comparisons with transparent math and visuals.

Enter datasets and test settings

Datasets A and B: use commas, spaces, new lines, or semicolons between numbers.
Bin count: enter 0 to use automatic bin selection.
Range minimum: leave blank to use the combined minimum.
Range maximum: leave blank to use the combined maximum.
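The input rules above can be sketched in Python. This is only an illustration of the described behavior, not the page's actual code; the helper names parse_numbers and common_range are hypothetical, and a blank range field is modeled as None.

```python
import re

def parse_numbers(text):
    """Split on commas, semicolons, or any whitespace (including new lines)."""
    tokens = re.split(r"[,\s;]+", text.strip())
    return [float(tok) for tok in tokens if tok]

def common_range(a, b, lower=None, upper=None):
    """Use the combined minimum/maximum when a bound is left blank (None)."""
    lo = min(min(a), min(b)) if lower is None else lower
    hi = max(max(a), max(b)) if upper is None else upper
    return lo, hi

a = parse_numbers("12, 13 13;14\n15")
b = parse_numbers("11 12 13")
print(a)                   # [12.0, 13.0, 13.0, 14.0, 15.0]
print(common_range(a, b))  # (11.0, 15.0)
```

The automatic bin rule is not documented on this page, so it is left out of the sketch.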

Example data table

This example shows two short samples with similar centers but small shape differences. Use it to test the tool quickly.

Observation   Dataset A   Dataset B
1             12          11
2             13          12
3             13          13
4             14          14
5             15          14
6             15          15
7             16          16
8             17          16
9             18          17
10            18          18
11            19          19
12            20          21

Formula used

1. Histogram probabilities: For each bin i, probability is pᵢ = cᵢ / n, where cᵢ is the bin count and n is sample size.

2. Overlap coefficient: OVL = Σ min(pᵢ, qᵢ). Higher values mean more shared probability mass.

3. Total variation distance: TVD = 0.5 × Σ |pᵢ − qᵢ|. Lower values show smaller differences.

4. Hellinger distance: H = (1/√2) × √[Σ(√pᵢ − √qᵢ)²]. Lower values indicate closer distributions.

5. Bhattacharyya coefficient: BC = Σ √(pᵢqᵢ). Higher values indicate stronger similarity.

6. Jensen-Shannon divergence: JSD = 0.5 KL(P‖M) + 0.5 KL(Q‖M), where M = (P + Q) / 2 and KL is the Kullback-Leibler divergence. Lower values indicate closer distributions.

7. Kolmogorov-Smirnov distance: D = max |F₁(x) − F₂(x)|. It tracks the largest gap between empirical cumulative distributions.

8. KS critical value: Dα = √[−0.5 ln(α / 2)] × √[(n₁ + n₂) / (n₁n₂)], where n₁ and n₂ are the two sample sizes. When D > Dα, the hypothesis that both samples come from the same distribution is rejected at significance level α.

9. Overall similarity score: This page averages six bounded similarity components: overlap, Bhattacharyya, 1 − H, 1 − TVD, Jensen-Shannon similarity, and 1 − D.
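Formulas 1 to 6 can be sketched in plain Python using the example data table. This is a minimal illustration, not the page's implementation. It assumes equal-width bins over a shared range with hi > lo, drops values outside that range, and uses base-2 logarithms for KL so that JSD stays between 0 and 1. The function names are hypothetical.

```python
import math

def histogram_probs(values, lo, hi, bins):
    """Return p_i = c_i / n over equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            # The maximum value is placed in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    n = sum(counts)
    return [c / n for c in counts]

def kl(p, m):
    """KL(P||M) in bits; bins with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)

def histogram_metrics(p, q):
    pairs = list(zip(p, q))
    m = [(pi + qi) / 2 for pi, qi in pairs]
    return {
        "overlap": sum(min(pi, qi) for pi, qi in pairs),
        "tvd": 0.5 * sum(abs(pi - qi) for pi, qi in pairs),
        "hellinger": math.sqrt(
            sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in pairs) / 2
        ),
        "bhattacharyya": sum(math.sqrt(pi * qi) for pi, qi in pairs),
        "jsd": 0.5 * kl(p, m) + 0.5 * kl(q, m),
    }

# Example data table from this page
A = [12, 13, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20]
B = [11, 12, 13, 14, 14, 15, 16, 16, 17, 18, 19, 21]

p = histogram_probs(A, 11, 21, 5)
q = histogram_probs(B, 11, 21, 5)
for name, value in histogram_metrics(p, q).items():
    print(f"{name}: {value:.4f}")
```

With five bins over the range 11 to 21, the overlap for the example data is 11/12, and the overlap and total variation distance sum to 1, as they do for any pair of probability vectors.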

How to use this calculator

  1. Paste numeric values for Dataset A and Dataset B into the two text boxes.
  2. Use commas, spaces, semicolons, or new lines to separate numbers.
  3. Choose a bin count, or enter zero for automatic selection.
  4. Set the alpha level for the KS decision rule.
  5. Set a similarity threshold to control the final similarity label.
  6. Optionally define a custom common range for histogram construction.
  7. Click Run Similarity Test to display results below the header.
  8. Review the summary cards, the Plotly graph, and the bin table.
  9. Export the result as CSV or PDF when needed.

Frequently asked questions

1) What does this calculator actually compare?

It compares two numeric datasets as full distributions, not just by averages. It measures overlap, divergence, cumulative separation, and shape similarity using several complementary metrics.

2) Why use several metrics instead of one?

No single metric captures every aspect of similarity. Some focus on shared mass, some on cumulative gaps, and others on divergence. Using several gives a more balanced reading.

3) What does a high overlap coefficient mean?

A high overlap coefficient means the two histogram probability patterns share a large amount of mass across bins. It suggests stronger distribution resemblance.

4) What does the KS distance tell me?

The KS distance is the largest vertical gap between empirical cumulative distributions. Smaller values mean the cumulative behavior of both datasets is more alike.
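As a rough sketch, the KS distance and the critical value from the formula section could be computed as follows. The function names are hypothetical, and the critical value is the large-sample approximation given in the formula section, so it is only approximate for small samples.

```python
import math
from bisect import bisect_right

def ks_distance(a, b):
    """Largest absolute gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # Both ECDFs are step functions, so the largest gap occurs at an observed value.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d

def ks_critical(alpha, n1, n2):
    """Critical value: sqrt(-0.5 ln(alpha/2)) * sqrt((n1 + n2) / (n1 n2))."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n1 + n2) / (n1 * n2))

# Example data table from this page
A = [12, 13, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20]
B = [11, 12, 13, 14, 14, 15, 16, 16, 17, 18, 19, 21]

d = ks_distance(A, B)
d_alpha = ks_critical(0.05, len(A), len(B))
print(f"D = {d:.4f}, critical value = {d_alpha:.4f}, reject = {d > d_alpha}")
```

For the example data, D is 1/12 and falls well below the critical value at α = 0.05, so the equal-distribution hypothesis is not rejected.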

5) Does the number of bins affect results?

Yes. Histogram-based metrics depend on binning. Very few bins can hide differences, while too many can exaggerate noise. Automatic or moderate bin counts usually work well.
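The effect of binning can be seen on the example data table. The sketch below recomputes the overlap coefficient with 2, 5, and 10 equal-width bins over the combined range. The helper overlap_at is hypothetical and assumes the maximum value is placed in the last bin.

```python
def overlap_at(a, b, bins):
    """Overlap coefficient using equal-width bins over the combined range."""
    lo, hi = min(a + b), max(a + b)
    width = (hi - lo) / bins

    def probs(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [c / len(values) for c in counts]

    return sum(min(pi, qi) for pi, qi in zip(probs(a), probs(b)))

# Example data table from this page
A = [12, 13, 13, 14, 15, 15, 16, 17, 18, 18, 19, 20]
B = [11, 12, 13, 14, 14, 15, 16, 16, 17, 18, 19, 21]

for bins in (2, 5, 10):
    print(bins, round(overlap_at(A, B, bins), 4))
```

With two bins the samples look identical (overlap 1.0), while ten bins bring the overlap down to 0.75, even though the data have not changed.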

6) Should I trust the overall similarity score alone?

Use the overall score as a summary, not a replacement for inspection. Always read it together with the KS result, divergence values, and the comparison graph.
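For illustration, the averaging described in the formula section might look like the sketch below. The page does not define "Jensen-Shannon similarity", so this assumes it is 1 − JSD with JSD measured in bits (which keeps it between 0 and 1). The function names and label strings are hypothetical as well.

```python
def overall_similarity(overlap, bc, hellinger, tvd, jsd_bits, ks_d):
    """Average of six components, each bounded between 0 and 1."""
    components = [
        overlap,
        bc,
        1 - hellinger,
        1 - tvd,
        1 - jsd_bits,  # assumed definition of Jensen-Shannon similarity
        1 - ks_d,
    ]
    return sum(components) / len(components)

def similarity_label(score, threshold):
    """Hypothetical labels; the page's actual wording may differ."""
    return "similar" if score >= threshold else "not similar"

# Identical distributions give the maximum score.
score = overall_similarity(overlap=1.0, bc=1.0, hellinger=0.0, tvd=0.0, jsd_bits=0.0, ks_d=0.0)
print(score, similarity_label(score, threshold=0.8))
```

Because every component is squeezed into one average, two quite different patterns of disagreement can produce the same score, which is why the individual metrics are still worth reading.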

7) Can this compare samples of different sizes?

Yes. The calculator normalizes histogram counts to probabilities, so different sample sizes can still be compared meaningfully, provided both samples are representative.

8) When should I use a custom range?

Use a custom range when you want consistent external boundaries, such as comparing repeated experiments against a fixed scale. Otherwise, leave the range blank for automatic limits.


Important Note: All the calculators listed on this site are for educational purposes only, and we do not guarantee the accuracy of results. Please consult other sources as well.