[EMBOSS] problems with allversusall tool

Daniel Barker db60 at st-andrews.ac.uk
Wed Jul 9 11:45:34 UTC 2008


Dear Laura,

Which EMBOSS program are you using? I don't find this effect with EMBOSS 
needle:

$ cat seq_a.fa
 >seq_a
MGQMQIV
$ cat seq_b.fa
 >seq_b
IV
$ needle
Needleman-Wunsch global alignment.
Input sequence: seq_a.fa
Second sequence(s): seq_b.fa
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output alignment [seq_a.needle]:
$ cat seq_a.needle
########################################
# Program: needle
# Rundate: Wed  9 Jul 2008 12:34:46
# Commandline: needle
#    -asequence seq_a.fa
#    -bsequence seq_b.fa
# Align_format: srspair
# Report_file: seq_a.needle
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: seq_a
# 2: seq_b
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 7
# Identity:       2/7 (28.6%)
# Similarity:     2/7 (28.6%)
# Gaps:           5/7 (71.4%)
# Score: 8.0
#
#
#=======================================

seq_a              1 MGQMQIV      7
                           ||
seq_b              1 -----IV      2


#---------------------------------------
#---------------------------------------

I'm not sure it's relevant to your question but note that, in EMBOSS 
needle, the score is unaffected by "hanging ends". I consider this odd, 
in fact not really a global alignment score. E.g. a protein with domain 
architecture -a-b-c-d- would get approx. the same score if aligned 
against a protein of domain architecture -c-d-, as it would when aligned 
against a protein of domain architecture -c-d-e-f-g-h-i-j-k-l-m-. In my 
view this goes against the spirit of global alignment - but this 
approach is briefly justified in the needle documentation, and I believe 
is not unusual for "global" alignment programs. Here's what I mean:

$ cat seq_c.fa
 >seq_c
IVPPLKP
$ needle
Needleman-Wunsch global alignment.
Input sequence: seq_a.fa
Second sequence(s): seq_c.fa
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output alignment [seq_a.needle]:
$ cat seq_a.needle
########################################
# Program: needle
# Rundate: Wed  9 Jul 2008 12:37:01
# Commandline: needle
#    -asequence seq_a.fa
#    -bsequence seq_c.fa
# Align_format: srspair
# Report_file: seq_a.needle
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: seq_a
# 2: seq_c
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 12
# Identity:       2/12 (16.7%)
# Similarity:     2/12 (16.7%)
# Gaps:          10/12 (83.3%)
# Score: 8.0
#
#
#=======================================

seq_a              1 MGQMQIV-----      7
                           ||
seq_c              1 -----IVPPLKP      7


#---------------------------------------
#---------------------------------------

Note that identity, similarity and gaps have all changed but score 
remains the same as when seq_a and seq_b were aligned, since the only 
difference is a "hanging end".
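
The effect can be reproduced with a small dynamic-programming sketch. This
is not needle's implementation: the function name align_score, the flat
match/mismatch scores (+4/-4) and the linear gap penalty (-10 per gapped
position) are assumptions chosen for illustration, standing in for
EBLOSUM62 and needle's affine gap penalties. The only point is to compare
a score in which end gaps are free with one in which they are charged.

```python
def align_score(a, b, match=4, mismatch=-4, gap=-10, free_end_gaps=True):
    """Best alignment score of a and b by dynamic programming.

    Toy scoring (assumed, not needle's): flat match/mismatch scores and a
    linear per-position gap penalty instead of EBLOSUM62 and affine gaps.
    """
    n, m = len(a), len(b)
    # rows[i][j] = best score for aligning a[:i] with b[:j].
    # With free end gaps, leading gaps in either sequence cost nothing.
    rows = [[0 if free_end_gaps else gap * j for j in range(m + 1)]]
    for i in range(1, n + 1):
        prev = rows[-1]
        cur = [0 if free_end_gaps else gap * i]
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        rows.append(cur)
    if free_end_gaps:
        # Trailing gaps are free too: best cell in the last row or column.
        return max(max(rows[n]), max(row[m] for row in rows))
    return rows[n][m]


seq_a, seq_b, seq_c = "MGQMQIV", "IV", "IVPPLKP"
for other in (seq_b, seq_c):
    print(align_score(seq_a, other),
          align_score(seq_a, other, free_end_gaps=False))
```

Under these assumed parameters the free-end-gap score is 8 for both pairs,
because only the matched IV contributes, whereas the scores with end gaps
charged differ between seq_b and seq_c.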

Best regards,

Daniel

laura wrote:
> Dear emboss users, 
> 
> I am using the allversusall tool for global sequence alignment. I am writing
> to you because I am obtaining perfect alignments between sequences that have
> very different lengths. For example, if I have a 100-residue protein
> sequence and a 2-residue protein sequence, I obtain 100% identity when I
> perform the alignment, where I would expect a very poor sequence
> identity. Is there any way to prevent this, or is it a possible bug in the
> program?
> 
> I would thank you to answer me as soon as possible, 
> 
> Regards, 
> 
> Laura. 
> 
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss

-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland :
No SC013532


