Example Data Table
| Example | Molecule | Sequence A | Sequence B | Mode | Denominator |
|---|---|---|---|---|---|
| 1 | DNA | ATGCTAGCTAAGT | ATGCGAGCTTAGT | Global alignment | Aligned length |
| 2 | RNA | AUGCUAAGCUU | AUG-UAAGCAU | Pre-aligned comparison | Non-gap positions |
| 3 | Protein | MSTNPKPQRITF | MSANPKP-RVTF | Pre-aligned comparison | Shorter cleaned length |
Formula Used
Pairwise identity is reported as Identity % = (Matches / Denominator) × 100.
Matches is the number of aligned positions where both residues are identical and neither is a gap.
For global alignment mode, the page uses the Needleman-Wunsch recurrence:
F(i,j) = max[F(i-1,j-1)+s, F(i-1,j)+g, F(i,j-1)+g]
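where F(i,j) is the best score for aligning the first i residues of Sequence A with the first j residues of Sequence B, s is the match score when the two residues at that position are identical and the mismatch score otherwise, and g is the gap penalty added for each gap position.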
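The recurrence above can be sketched in Python as follows. This is a minimal illustration, not the page's actual code: the function name `needleman_wunsch`, the default scores (match 1, mismatch -1, gap -2), the linear gap penalty, and the tie-breaking order in the traceback are all assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b).

    The scores are illustrative defaults (an assumption), and the gap
    penalty is linear: every gap position adds `gap` to the score.
    """
    n, m = len(a), len(b)

    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap  # b[:j] aligned against gaps only

    # Fill the matrix with the three-way maximum from the recurrence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                F[i - 1][j] + gap,    # gap in b
                F[i][j - 1] + gap,    # gap in a
            )

    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these assumed scores, the two DNA sequences from Example 1 align without gaps for a score of 9 (11 matches and 2 mismatches). Other scoring values can produce a different best alignment.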
Denominator choices let you express identity relative to aligned length, non-gap positions, the shorter cleaned sequence, or the longer cleaned sequence.
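As a rough sketch of how the identity formula and the four denominators fit together, the Python function below takes two already aligned strings of equal length. The function name, the denominator labels, and the reading of "non-gap positions" as columns where neither sequence has a gap are assumptions made for illustration, not details confirmed by the page.

```python
def identity_percent(aligned_a, aligned_b, denominator="aligned_length", gap="-"):
    """Return Identity % = (Matches / Denominator) * 100 for two aligned strings.

    The denominator labels are hypothetical names chosen for this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    columns = list(zip(aligned_a, aligned_b))

    # A match is a column with identical residues where neither is a gap.
    matches = sum(1 for x, y in columns if x == y and x != gap)

    # Cleaned (ungapped) length of each sequence.
    len_a = sum(1 for x in aligned_a if x != gap)
    len_b = sum(1 for y in aligned_b if y != gap)

    denominators = {
        "aligned_length": len(columns),
        # Assumption: a non-gap position has no gap in either sequence.
        "non_gap_positions": sum(1 for x, y in columns if x != gap and y != gap),
        "shorter": min(len_a, len_b),
        "longer": max(len_a, len_b),
    }
    value = denominators[denominator]
    if value == 0:
        raise ValueError("denominator is zero; identity is undefined")
    return 100.0 * matches / value
```

For the RNA pair in Example 2 there are 9 matches. Dividing by the 10 non-gap positions gives 90.0, while dividing by the aligned length of 11 gives about 81.82, which is why the chosen denominator should accompany any reported value.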
How to Use This Calculator
- Select DNA, RNA, or protein so the correct character set is applied.
- Choose global alignment for raw sequences or pre-aligned mode for sequences already containing dashes.
- Enter scoring values for matches, mismatches, and gaps when alignment mode is used.
- Pick the identity denominator that matches your reporting standard.
- Paste both sequences, submit the form, and review summary metrics above the input section.
- Download CSV or PDF if you want a portable report for analysis records.
Frequently Asked Questions
1. What does pairwise sequence identity measure?
It measures the fraction of aligned positions containing identical residues. The percentage depends on the denominator you choose, so state the denominator whenever you report a value.
2. When should I use global alignment mode?
Use global alignment when you have two raw sequences and want the page to place gaps automatically across their full lengths using your scoring settings.
3. When is pre-aligned mode better?
Use it when your sequences already contain dashes from another alignment workflow. Both cleaned strings must remain the same length after unsupported symbols are removed.
4. Why do denominator choices matter?
Aligned length includes gaps, while non-gap positions exclude them. Shorter and longer sequence denominators help match different lab, paper, or software reporting conventions.
5. Are ambiguous biological symbols accepted?
Yes. Common IUPAC ambiguity symbols are accepted for DNA and RNA. Protein mode also accepts standard uppercase residue letters and the asterisk symbol.
6. Why might some characters disappear after submission?
Unsupported characters, spaces, and line breaks are removed during cleaning. The result table shows how many symbols were discarded from each sequence.
7. Does a high alignment score always mean high identity?
Not always. Identity counts exact matches, while alignment score also reflects mismatch penalties and gap costs. Different scoring schemes can change the best alignment.
8. Can I use very long sequences here?
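Moderate lengths work well. Global alignment fills a scoring matrix whose size grows with the product of the two sequence lengths, so extremely long inputs can exceed browser or server limits. For those cases, pre-aligned mode or shorter regions may be more practical.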
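The cleaning step described in questions 5 and 6 could look roughly like the Python sketch below. The exact character sets the page accepts are not listed, so the alphabets here are assumptions: the standard IUPAC nucleotide ambiguity codes for DNA and RNA, and the 20 standard amino acid letters plus the asterisk for protein. The sketch also assumes that input is uppercased before checking and that spaces and line breaks count toward the discarded total.

```python
# Assumed alphabets; the page's actual accepted symbols may differ.
IUPAC_AMBIGUITY = "RYSWKMBDHVN"
ALPHABETS = {
    "DNA": set("ACGT" + IUPAC_AMBIGUITY),
    "RNA": set("ACGU" + IUPAC_AMBIGUITY),
    "PROTEIN": set("ACDEFGHIKLMNPQRSTVWY*"),
}


def clean_sequence(raw, molecule, keep_gaps=False):
    """Return (cleaned_sequence, discarded_count) for one input string.

    keep_gaps=True keeps dashes, as pre-aligned mode would need.
    """
    allowed = set(ALPHABETS[molecule.upper()])
    if keep_gaps:
        allowed.add("-")

    kept = []
    for ch in raw:
        upper = ch.upper()  # assumption: lowercase input is uppercased
        if upper in allowed:
            kept.append(upper)

    # Everything not kept (unsupported symbols, spaces, line breaks)
    # is counted as discarded in this sketch.
    return "".join(kept), len(raw) - len(kept)
```

For example, `clean_sequence("ATG CTA\nGXT1", "DNA")` keeps `ATGCTAGT` and reports 4 discarded characters: the space, the line break, `X`, and `1`.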