[EMBOSS] Needle/water, revcomp

Broger, Clemens clemens.broger at roche.com
Thu Jun 23 13:48:24 UTC 2005


I have two questions.

The first is about identity/similarity in nucleotide alignments made
with needle (probably the same holds true for water):
 
########################################
# Program:  needle
# Rundate:  Thu Jun 23 13:29:58 2005
# Align_format: srspair
# Report_file: seq0.needle
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: SEQ0
# 2: SEQ1
# Matrix: EDNAFULL
# Gap_penalty: 100.0
# Extend_penalty: 10.0
#
# Length: 70
# Length of sequence 1: 70
# Length of sequence 2: 70
# Identity:      46/70 (65.7%)
# Similarity:    47/70 (67.1%)
# Gaps:           0/70 ( 0.0%)
# Score: 162.0
# 
#
#=======================================

                              .         .         .         .         .
SEQ0               1 aaaaaaaaaaaaaaaaaaaaaaaaacccccgggggtttttuuuuunnnnn     50
                     |||||||||||||||||||||......|......||:....:|..     
SEQ1               1 aaaaaaaaaaaaaaaaaaaaacgtunacgtunacgtunacgtunacgtun     50
                              .         .         .         .         .

                              .         .
SEQ0              51 aaaaaaaaaaaaaaaaaaaa     70
                     ||||||||||||||||||||
SEQ1              51 aaaaaaaaaaaaaaaaaaaa     70
                              .         .

Every base in the set a, c, g, t, u, n is aligned against every base in
the same set, itself included. The 20 a's at each end are there only to
force an ungapped alignment, and the maximum gap penalties were used.
 
I agree with the symbols in the alignment (|, : and .), but the 46
identities in the summary imply that the n-n match is also counted as an
identity. The t-u matches are counted as similar, which is fine, but the
n-n match is not counted as similar, although it is counted as identical.
I think the n-n match should be counted in neither identity nor similarity.
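
To make the counting explicit, here is a small Python recount of the
alignment columns above. It is only a sketch, not EMBOSS code: the
variable names are my own, and treating t-u as the "similar" pairs is an
assumption taken from the ':' symbols in the report.

```python
# Recount the columns of the first needle report by hand.
# Illustration only; this is not how EMBOSS computes its summary.
seq0 = "a" * 25 + "c" * 5 + "g" * 5 + "t" * 5 + "u" * 5 + "n" * 5 + "a" * 20
seq1 = "a" * 20 + "acgtun" * 5 + "a" * 20
columns = list(zip(seq0, seq1))

# Columns where both sequences show the same character, n-n included.
same_symbol = sum(x == y for x, y in columns)

# The same count without the n-n column (the columns drawn as '|').
same_unambiguous = sum(x == y and x != "n" for x, y in columns)

# t-u and u-t columns, drawn as ':' (assumed here to be the "similar" pairs).
t_u_pairs = sum({x, y} == {"t", "u"} for x, y in columns)

print(len(columns))                  # compare with "Length"
print(same_symbol)                   # compare with "Identity"
print(same_unambiguous + t_u_pairs)  # compare with "Similarity"
```

The recount gives 46 same-symbol columns and 45 + 2 = 47 similar columns,
so the reported 46/70 and 47/70 only fit together if n-n is counted as an
identity but not as a similarity.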
 
Now for ambiguous bases, where w stands for a or t:
 
########################################
# Program:  needle
# Rundate:  Thu Jun 23 14:53:33 2005
# Align_format: srspair
# Report_file: seq0.needle
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: SEQ0
# 2: SEQ1
# Matrix: EDNAFULL
# Gap_penalty: 100.0
# Extend_penalty: 10.0
#
# Length: 26
# Length of sequence 1: 26
# Length of sequence 2: 26
# Identity:      21/26 (80.8%)
# Similarity:    23/26 (88.5%)
# Gaps:           0/26 ( 0.0%)
# Score: 94.0
# 
#
#=======================================

                              .         .      
SEQ0               1 aaaaaaaaaawwwwwwaaaaaaaaaa     26
                     ||||||||||..   .||||||||||
SEQ1               1 aaaaaaaaaaatwgcuaaaaaaaaaa     26
                              .         .      

In the alignment I would put a dot at the w-w match (although I could
also accept the way it is handled now). But again, the w-w match is
counted in the summary as an identity but not as a similarity.
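
The same kind of recount for the w example, again only as a sketch and
not as EMBOSS code. The BASES table is my own shorthand for which bases
each symbol can stand for, and mapping u to t is an assumption based on
the t-u columns in the first report.

```python
# Recount the columns of the second needle report by hand.
# Illustration only; this is not how EMBOSS computes its summary.
# BASES maps each symbol to the bases it can represent (u treated as t: assumption).
BASES = {
    "a": {"a"}, "c": {"c"}, "g": {"g"}, "t": {"t"},
    "u": {"t"}, "w": {"a", "t"},
}

seq0 = "a" * 10 + "w" * 6 + "a" * 10
seq1 = "a" * 10 + "atwgcu" + "a" * 10
columns = list(zip(seq0, seq1))

# Same character in both sequences, w-w included.
same_symbol = sum(x == y for x, y in columns)

# Same character and that character is an unambiguous base.
exact = sum(x == y and len(BASES[x]) == 1 for x, y in columns)

# Different characters whose base sets overlap (w-a, w-t, w-u).
compatible = sum(x != y and bool(BASES[x] & BASES[y]) for x, y in columns)

print(same_symbol)         # compare with "Identity"
print(exact + compatible)  # compare with "Similarity"
```

This gives 21 same-symbol columns and 20 + 3 = 23 compatible columns,
matching the reported 21/26 and 23/26, so the w-w column appears in the
identity count only.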



The second question is about how EMBOSS handles
reverse-complemented nucleotide segments such as

db:seq[10:20:r]

EMBOSS first reverse-complements the whole sequence and then cuts out
residues 10 to 20.
Biologists usually expect residues 10 to 20 to be cut out first and then
reverse-complemented.
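
The difference between the two orders can be sketched in Python. The
function names and the 30-base test sequence are made up for
illustration; the code only models the two orders described above and
says nothing about how EMBOSS implements them.

```python
# Two ways to read "residues 10 to 20, reverse-complemented".
# Positions are 1-based and inclusive, as in the address above.
COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq):
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def revcomp_then_cut(seq, begin, end):
    """Reverse-complement the whole sequence, then take begin..end."""
    return revcomp(seq)[begin - 1:end]


def cut_then_revcomp(seq, begin, end):
    """Take begin..end of the forward sequence, then reverse-complement it."""
    return revcomp(seq[begin - 1:end])


seq = "acgtacgtaaccggttaaacccgggtttac"  # hypothetical 30-base example
print(revcomp_then_cut(seq, 10, 20))
print(cut_then_revcomp(seq, 10, 20))

# The two readings are related by mirroring the coordinates:
# forward positions begin..end correspond to positions
# n - end + 1 .. n - begin + 1 on the reverse-complemented sequence.
n = len(seq)
mirrored = revcomp_then_cut(seq, n - 20 + 1, n - 10 + 1)
print(mirrored == cut_then_revcomp(seq, 10, 20))
```

So with the current behaviour one has to mirror the coordinates by hand
to get the segment a biologist would expect.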

Can this be changed? That would be very helpful.

Best regards

Clemens


Dr. Clemens Broger
Bioinformatics
F. Hoffmann-La Roche Ltd.
PRBI 65/303
CH-4070 Basel
clemens.broger at roche.com
+41-61-688-4447
