Formula Used
The Hamming distance counts the number of positions at which two equal-length sequences differ. It is widely used for binary vectors, coded responses, categorical sequences, and other symbol-by-symbol comparisons in statistics.
H(x, y) = Σ I(xᵢ ≠ yᵢ), for i = 1 to n
Normalized Distance = H / n
Similarity Index = 1 - (H / n)
Mismatch Rate (%) = (H / n) × 100
Here, I(xᵢ ≠ yᵢ) equals 1 when the items at position i differ and 0 when they match, and n is the common length of the two sequences.
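The four formulas above can be sketched in Python as follows. This is a minimal illustration, not the calculator's own code. The function name `hamming_summary` and the choice to reject empty inputs (where H / n is undefined) are assumptions.

```python
def hamming_summary(a, b):
    """Return the Hamming distance and derived metrics for two equal-length sequences.

    hamming_summary is a hypothetical helper, not the calculator's actual code.
    """
    if len(a) != len(b):
        raise ValueError(f"sequences differ in length: {len(a)} vs {len(b)}")
    n = len(a)
    if n == 0:
        # Assumption: empty input is rejected because H / n would divide by zero.
        raise ValueError("sequences must not be empty")
    # I(x_i != y_i) is 1 for a mismatch and 0 for a match; summing gives H.
    h = sum(1 for x, y in zip(a, b) if x != y)
    return {
        "distance": h,                      # H
        "normalized": h / n,                # H / n
        "similarity": 1 - h / n,            # 1 - (H / n)
        "mismatch_rate_pct": h / n * 100,   # (H / n) x 100
    }

print(hamming_summary("10110011", "10010001"))
# {'distance': 2, 'normalized': 0.25, 'similarity': 0.75, 'mismatch_rate_pct': 25.0}
```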
How to Use This Calculator
- Paste or type Sequence A and Sequence B.
- Select character mode for direct symbol comparison or token mode for delimiter-separated values.
- Set the delimiter when using token mode.
- Choose whether case should matter.
- Enable whitespace normalization if needed.
- Click Calculate Distance.
- Review the result summary, mismatch table, and Plotly graph.
- Download your findings as CSV or PDF.
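The mismatch table and CSV download from the steps above could be approximated like this. It is a sketch under assumptions: `mismatch_rows`, `mismatches_to_csv`, and the column names are hypothetical, positions are 1-based to match the i = 1 to n indexing, and PDF export is not covered.

```python
import csv
import io

def mismatch_rows(a, b):
    """List (position, symbol_a, symbol_b) for every position where a and b differ.

    Positions are 1-based to match the i = 1 to n indexing of the formula.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]

def mismatches_to_csv(rows):
    """Serialize mismatch rows to CSV text; the column names are illustrative."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["position", "sequence_a", "sequence_b"])
    writer.writerows(rows)
    return buf.getvalue()

rows = mismatch_rows("ACGTACGA", "ACCTTCGA")
print(rows)  # [(3, 'G', 'C'), (5, 'A', 'T')]
print(mismatches_to_csv(rows))
```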
Example Data Table
| Example Type | Sequence A | Sequence B | Length | Hamming Distance | Normalized Distance |
|---|---|---|---|---|---|
| Binary | 10110011 | 10010001 | 8 | 2 | 0.2500 |
| DNA Symbols | ACGTACGA | ACCTTCGA | 8 | 2 | 0.2500 |
| Token Sequence | A,B,C,D,E | A,B,X,D,Y | 5 | 2 | 0.4000 |
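As a worked check of the DNA row, the indicator values at each position and the resulting metrics are:

```latex
\begin{aligned}
x &= \texttt{ACGTACGA}, \quad y = \texttt{ACCTTCGA}, \quad n = 8 \\
I(x_i \neq y_i) &= (0, 0, 1, 0, 1, 0, 0, 0) \\
H(x, y) &= \sum_{i=1}^{8} I(x_i \neq y_i) = 2 \\
H / n &= 2 / 8 = 0.25 \\
1 - H / n &= 0.75 \\
(H / n) \times 100 &= 25\%
\end{aligned}
```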
Frequently Asked Questions
1) What is Hamming distance?
It counts the positions where two equal-length sequences differ. In statistics, it helps compare coded responses, binary vectors, categorical strings, and symbol-based observations.
2) Why must the two sequences have equal length?
Hamming distance compares values position by position. If one sequence is longer, its extra positions have no counterpart to be compared with, so the result would no longer be a true Hamming distance.
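A common implementation pitfall shows why the length check matters: Python's built-in `zip` stops at the shorter input, so a naive loop silently ignores the extra positions. The function names below are hypothetical.

```python
def naive_hamming(a, b):
    # zip() stops at the shorter input, so extra positions are silently ignored.
    return sum(x != y for x, y in zip(a, b))

def strict_hamming(a, b):
    # Refuse unequal lengths instead of returning a misleading count.
    if len(a) != len(b):
        raise ValueError(f"Hamming distance needs equal lengths, got {len(a)} and {len(b)}")
    return sum(x != y for x, y in zip(a, b))

print(naive_hamming("10110011", "1011"))  # 0, although the inputs are not identical
try:
    strict_hamming("10110011", "1011")
except ValueError as err:
    print(err)  # Hamming distance needs equal lengths, got 8 and 4
```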
3) What is the difference between raw distance and normalized distance?
Raw distance is the number of mismatched positions. Normalized distance divides that count by the sequence length n, which makes results comparable across pairs of sequences with different lengths.
4) Can I compare comma-separated categories instead of characters?
Yes. Switch to token mode, choose the delimiter, and the calculator will compare each token position rather than each character position.
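Token mode can be sketched as splitting each input on the delimiter and comparing the resulting lists. The helpers `to_tokens` and `token_hamming` are hypothetical, and case-insensitive matching is assumed here to use `str.casefold`; the calculator's exact rules may differ. A multi-character token counts as a single position.

```python
def to_tokens(text, delimiter=",", case_sensitive=True):
    """Split text into tokens on a delimiter, optionally folding case.

    Assumption: case-insensitive mode is modeled with str.casefold().
    """
    if not case_sensitive:
        text = text.casefold()
    return text.split(delimiter)

def token_hamming(a, b, delimiter=",", case_sensitive=True):
    """Count positions where the token lists of a and b differ."""
    ta = to_tokens(a, delimiter, case_sensitive)
    tb = to_tokens(b, delimiter, case_sensitive)
    if len(ta) != len(tb):
        raise ValueError(f"token counts differ: {len(ta)} vs {len(tb)}")
    return sum(x != y for x, y in zip(ta, tb))

print(token_hamming("A,B,C,D,E", "A,B,X,D,Y"))  # 2
print(token_hamming("yes;no;maybe", "Yes;No;no", delimiter=";", case_sensitive=False))  # 1
```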
5) What does the similarity index represent?
The similarity index shows the proportion of positions that match. A value near 1 means the sequences are very similar, while a value near 0 indicates strong dissimilarity.
6) How are spaces handled in the calculator?
When whitespace normalization is enabled, spaces are removed in character mode and trimmed around tokens in token mode. This helps avoid accidental mismatches caused by formatting.
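The normalization rules described above might look like this. In this sketch `normalize` is a hypothetical helper, and character mode strips every whitespace character (tabs and newlines as well as spaces), which is an assumption; the calculator may remove only spaces.

```python
def normalize(text, mode="character", delimiter=","):
    """Apply whitespace normalization and return the list of positions to compare."""
    if mode == "character":
        # Assumption: all whitespace (spaces, tabs, newlines) is removed.
        return list("".join(text.split()))
    if mode == "token":
        # Keep spaces inside a token but trim them around each token.
        return [tok.strip() for tok in text.split(delimiter)]
    raise ValueError(f"unknown mode: {mode!r}")

print(normalize("1011 0011"))                # ['1', '0', '1', '1', '0', '0', '1', '1']
print(normalize(" A , B ,C", mode="token"))  # ['A', 'B', 'C']
```

Without normalization, `"1011 0011"` has 9 characters and could not be compared with the 8-character `"10110011"`.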
7) Where is Hamming distance useful in statistics?
It is useful for comparing coded survey responses, error patterns, binary classifications, categorical sequences, clustering inputs, and symbolic records where order and position both matter.
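Clustering and similar analyses usually start from a matrix of pairwise distances. The sketch below builds one for a few hypothetical coded survey records; the data and the function names `hamming` and `distance_matrix` are illustrative only.

```python
from itertools import combinations

def hamming(a, b):
    """Plain Hamming distance; raises on unequal lengths."""
    if len(a) != len(b):
        raise ValueError("equal lengths required")
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(records):
    """Symmetric matrix of pairwise Hamming distances between coded records."""
    k = len(records)
    matrix = [[0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        d = hamming(records[i], records[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix

# Hypothetical coded answers (Y/N) from three respondents to a five-item survey.
responses = ["YNYYN", "YNNYN", "NYNNY"]
for row in distance_matrix(responses):
    print(row)
# [0, 1, 5]
# [1, 0, 4]
# [5, 4, 0]
```

Such a matrix could then be passed to a clustering method that accepts precomputed distances.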
8) Does this calculator work only for binary sequences?
No. It works for binary, text, categorical, nucleotide, and other symbolic sequences, as long as both processed inputs have the same number of positions.