Advanced Pairwise Sequence Identity Calculator

Calculator Input

Molecule Type

Comparison Mode

Identity Denominator

Match Score

Mismatch Score

Gap Penalty

Sequence A

Whitespace is ignored. Unsupported symbols are removed during cleaning.

Sequence B

Use dashes only in pre-aligned mode, where both cleaned sequences must have the same length.
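The cleaning rules above can be sketched as follows. This is an illustration, not the page's own code. The alphabets are assumptions: the IUPAC nucleotide codes for DNA and RNA, and the 20 standard amino acids plus `*` for protein. Upper-casing the input and leaving whitespace out of the discarded count are also assumptions. The names `clean_sequence` and `check_pre_aligned` are hypothetical.

```python
import re

# Assumed alphabets: IUPAC nucleotide codes (including ambiguity symbols)
# and the 20 standard amino-acid letters plus '*'.
ALPHABETS = {
    "DNA": set("ACGTRYSWKMBDHVN"),
    "RNA": set("ACGURYSWKMBDHVN"),
    "Protein": set("ACDEFGHIKLMNPQRSTVWY*"),
}


def clean_sequence(raw: str, molecule: str, keep_gaps: bool) -> tuple[str, int]:
    """Return (cleaned, discarded) for one input sequence.

    Whitespace (spaces, tabs, line breaks) is dropped without being counted;
    any other unsupported symbol is removed and counted as discarded.
    Dashes are kept only when keep_gaps is True (pre-aligned mode).
    """
    allowed = ALPHABETS[molecule] | ({"-"} if keep_gaps else set())
    # Assumption: lowercase input is upper-cased rather than discarded.
    no_ws = re.sub(r"\s+", "", raw).upper()
    cleaned = "".join(ch for ch in no_ws if ch in allowed)
    return cleaned, len(no_ws) - len(cleaned)


def check_pre_aligned(a: str, b: str) -> None:
    """Pre-aligned mode compares column by column, so lengths must agree."""
    if len(a) != len(b):
        raise ValueError(f"cleaned lengths differ: {len(a)} vs {len(b)}")


if __name__ == "__main__":
    print(clean_sequence("atg c-x", "DNA", keep_gaps=True))
    check_pre_aligned("AUGCUAAGCUU", "AUG-UAAGCAU")  # Example 2: both length 11
```

Here `"atg c-x"` cleans to `ATGC-` with one discarded symbol (`X` is not an IUPAC nucleotide code). Outside pre-aligned mode the dash would be discarded too.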

Example Data Table

Example	Molecule	Sequence A	Sequence B	Mode	Denominator
1	DNA	ATGCTAGCTAAGT	ATGCGAGCTTAGT	Global alignment	Aligned length
2	RNA	AUGCUAAGCUU	AUG-UAAGCAU	Pre-aligned comparison	Non-gap positions
3	Protein	MSTNPKPQRITF	MSANPKP-RVTF	Pre-aligned comparison	Shorter cleaned length

Formula Used

Pairwise identity is reported with Identity % = (Matches / Denominator) × 100. Matches count positions where both aligned residues are identical and not gaps.
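As a hand-worked check, here is the formula applied to Example 2 from the table (pre-aligned RNA, AUGCUAAGCUU vs AUG-UAAGCAU). It assumes that "non-gap positions" counts the columns where neither sequence has a gap. The aligned-length value is shown for comparison.

```latex
\begin{aligned}
\text{Columns} &: 11 \text{ in total; column 4 is C vs. gap, column 10 is U vs. A}\\
\text{Matches} &= 11 - 1 - 1 = 9\\
\text{Identity \% (non-gap positions)} &= \frac{9}{10} \times 100 = 90\%\\
\text{Identity \% (aligned length)} &= \frac{9}{11} \times 100 \approx 81.82\%
\end{aligned}
```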

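For global alignment mode, the page uses the Needleman-Wunsch recurrence F(i,j) = max[F(i-1,j-1) + s, F(i-1,j) + g, F(i,j-1) + g]. Here F(i,j) is the best score for aligning the first i residues of Sequence A with the first j residues of Sequence B. The term s is the match score when residues i and j are identical and the mismatch score otherwise, and g is the gap penalty. Because g is added, it must be negative (or zero) to penalize gaps. In the standard formulation the borders are F(0,0) = 0, F(i,0) = i × g, and F(0,j) = j × g, and the alignment is recovered by tracing back from the bottom-right cell.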
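The recurrence above can be sketched in Python as a full-matrix Needleman-Wunsch with a linear gap cost and a traceback. This is a minimal illustration, not the page's implementation. The default scores (match 1, mismatch -1, gap -2) are illustrative, not the page's defaults. The tie-breaking order in the traceback is also an assumption, so when several alignments share the best score, this sketch may return a different one than the page.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Global alignment with a linear gap cost.

    Returns (aligned_a, aligned_b, score). `gap` is added once per gap
    column, so it should be negative (or zero) to act as a penalty.
    """
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def s(i: int, j: int) -> int:
        # Match or mismatch score for residue a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(i, j),  # residue vs residue
                          F[i - 1][j] + gap,          # a[i-1] vs gap
                          F[i][j - 1] + gap)          # gap vs b[j-1]

    # Trace back from F[n][m]; ties prefer the diagonal, then a gap in b.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), F[n][m]


if __name__ == "__main__":
    aligned_a, aligned_b, score = needleman_wunsch("ACGT", "AGT")
    print(aligned_a, aligned_b, score)
```

With these defaults, aligning ACGT with AGT gives ACGT / A-GT and a score of 1 (three matches, one gap). For Example 1 in the table, the defaults leave both sequences ungapped with a score of 9, which gives 11 matches over 13 columns.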

The denominator can be the aligned length (all alignment columns, gaps included), the number of non-gap positions, or the length of the shorter or longer cleaned sequence.
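The four denominator choices can be compared side by side on one aligned pair. The sketch below relies on two assumptions, which the page may define differently. First, "non-gap positions" means columns where neither sequence has a gap. Second, a "cleaned length" counts residues only, excluding dashes. `identity_report` is a hypothetical name.

```python
def identity_report(a: str, b: str) -> dict[str, float]:
    """Percent identity of two aligned, equal-length strings ('-' = gap)
    under four denominators. A match is an identical non-gap pair."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    # Assumption: non-gap positions = columns with no gap in either string.
    non_gap = sum(1 for x, y in zip(a, b) if x != "-" and y != "-")
    # Assumption: cleaned lengths count residues only, not dashes.
    len_a = len(a) - a.count("-")
    len_b = len(b) - b.count("-")
    denominators = {
        "aligned length": len(a),
        "non-gap positions": non_gap,
        "shorter cleaned length": min(len_a, len_b),
        "longer cleaned length": max(len_a, len_b),
    }
    # Guard against an empty alignment instead of dividing by zero.
    return {name: 100 * matches / d if d else 0.0
            for name, d in denominators.items()}


if __name__ == "__main__":
    for name, pct in identity_report("MSTNPKPQRITF", "MSANPKP-RVTF").items():
        print(f"{name}: {pct:.2f}%")
```

Example 3 has 9 matches, 12 aligned columns, and 11 residues in the shorter sequence. This sketch reports 75.00% over aligned length and 81.82% over the shorter cleaned length, the denominator the table lists for that example.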

How to Use This Calculator

Select DNA, RNA, or protein so the correct character set is applied.
Choose global alignment for raw sequences or pre-aligned mode for sequences already containing dashes.
Enter scoring values for matches, mismatches, and gaps when global alignment mode is used.
Pick the identity denominator that matches your reporting standard.
Paste both sequences, submit the form, and review the summary metrics shown above the input section.
Download the results as CSV or PDF if you want a portable report for your analysis records.
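For the CSV export step, a report row could be written with Python's standard `csv` module, as sketched below. The column names are hypothetical and are not the page's actual export format. The sample row uses Example 3 from the table.

```python
import csv
import io

# Hypothetical column layout for one comparison per row.
FIELDNAMES = ["example", "molecule", "mode", "denominator",
              "matches", "denominator_value", "identity_percent"]


def results_to_csv(rows: list[dict[str, object]]) -> str:
    """Serialize comparison results to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    row = {"example": 3, "molecule": "Protein",
           "mode": "Pre-aligned comparison",
           "denominator": "Shorter cleaned length",
           "matches": 9, "denominator_value": 11,
           "identity_percent": round(100 * 9 / 11, 2)}
    print(results_to_csv([row]))
```

Recording the matches and the denominator value alongside the percentage keeps the report interpretable, because the same matches give different percentages under different denominators.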

Frequently Asked Questions

1. What does pairwise sequence identity measure?

It measures the fraction of aligned positions that contain identical residues. The percentage depends on the denominator you choose, so always report which denominator was used.

2. When should I use global alignment mode?

Use global alignment when you have two raw sequences and want the page to place gaps automatically across their full lengths using your scoring settings.

3. When is pre-aligned mode better?

Use it when your sequences already contain dashes from another alignment workflow. Both cleaned strings must remain the same length after unsupported symbols are removed.

4. Why do denominator choices matter?

Aligned length includes gaps, while non-gap positions exclude them. Shorter and longer sequence denominators help match different lab, paper, or software reporting conventions.

5. Are ambiguous biological symbols accepted?

Yes. Common IUPAC ambiguity symbols are accepted for DNA and RNA. Protein mode accepts the standard uppercase one-letter residue codes and the asterisk (*), which conventionally marks a stop.

6. Why might some characters disappear after submission?

Unsupported characters, spaces, and line breaks are removed during cleaning. The result table shows how many symbols were discarded from each sequence.

7. Does a high alignment score always mean high identity?

Not always. Identity counts exact matches only, while the alignment score also reflects mismatch penalties and gap costs. Different scoring schemes can change which alignment scores best, and with it the identity computed from that alignment.
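To see why score and identity can disagree, the sketch below scores two hand-built candidate alignments of ACGT with AGGT under two illustrative scoring schemes. It does not run an alignment search. It only shows that the alignment with the higher score need not have the higher identity. Both function names are hypothetical.

```python
def score_alignment(a: str, b: str, match: int, mismatch: int, gap: int) -> int:
    """Sum the column scores of an existing alignment (gap is negative)."""
    total = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # an all-gap column carries no information
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


def identity_aligned_length(a: str, b: str) -> float:
    """Identity % with aligned length (all columns) as the denominator."""
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100 * matches / len(a)


candidates = {"ungapped": ("ACGT", "AGGT"), "gapped": ("AC-GT", "A-GGT")}
schemes = {"mild gaps": (1, -1, -2), "cheap gaps": (1, -3, -1)}  # (match, mismatch, gap)

if __name__ == "__main__":
    for label, scheme in schemes.items():
        for name, (a, b) in candidates.items():
            print(label, name, score_alignment(a, b, *scheme),
                  identity_aligned_length(a, b))
```

Under the first scheme the ungapped alignment scores higher (2 vs -1). Under the second, the gapped one scores higher (1 vs 0) even though its identity over aligned length is lower (60% vs 75%).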

8. Can I use very long sequences here?

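Moderate lengths work well. Global alignment fills a score matrix with one cell per pair of residues, so time and memory grow with the product of the two sequence lengths, and very long sequences can exceed browser or server limits. For those, pre-aligned mode or shorter regions may be more practical.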
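For two 10,000-residue sequences, a full matrix holds about 10^8 cells. If only the optimal global score is needed, two rows of the matrix are enough, as in the sketch below. This is a general technique, not a description of the page. The running time is still quadratic, and the function returns no alignment, so identity cannot be read from it. Recovering the alignment itself in linear memory requires Hirschberg's divide-and-conquer method, which is not shown. The scores are illustrative.

```python
def global_score_linear_space(a: str, b: str, match: int = 1,
                              mismatch: int = -1, gap: int = -2) -> int:
    """Optimal Needleman-Wunsch score using O(len(b)) memory.

    Keeps only the previous and current DP rows; returns the score only.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row i = 0
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)          # column j = 0
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s,   # residue vs residue
                          prev[j] + gap,     # a[i-1] vs gap
                          curr[j - 1] + gap) # gap vs b[j-1]
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_score_linear_space("ATGCTAGCTAAGT", "ATGCGAGCTTAGT"))
```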