needle: Strictly global alignment?

Jan T. Kim kim at inb.uni-luebeck.de
Fri Jun 13 16:59:44 UTC 2003


Hi Peter,

thanks for your quick reply.

On Fri, Jun 13, 2003 at 04:58:52PM +0100, Peter Rice wrote:
> Jan T. Kim wrote:
> > Dear EMBOSSers,
> > 
> > it seems to me that the alignment computed by needle is not a global
> > alignment in the strict sense of the term, because the terminal gaps
> > are not penalized.
> >
> > In the literature, a global alignment in which terminal gaps are not
> > scored is called a semiglobal alignment.
> > 
> > If my analysis is correct, I suggest changing the documentation
> > accordingly. In addition, it would be nice to extend the needle program
> > such that a global alignment sensu stricto can be carried out.
> 
> Sorry, but the literature from computer science and maths does not match 
> reality for biological sequences.
> 
> Remember that you have no information that either sequence is complete.

This may often be the case, but certainly not always. Protein sequences
are normally complete. More generally, biological knowledge often
provides positional information that delineates regions of homology.
Initiation and termination positions of transcription and translation,
intron / exon borders, protein domains and RNA segments characterized by
specific secondary structures are notable examples.

Of course, if the sequence ends are not due to such biological
information but due to cloning or other technical (and hence not
biological) processes, terminal gaps should clearly not be penalized.

> In bioinformatics, a global alignment is one which spans the full 
> extents of the input sequences (e.g. Baxevanis & Ouellette (2001) 
> "Bioinformatics" 2nd edition p189).

Ok, that's what I implicitly called "global alignment sensu lato".
Perhaps unfortunately, some bioinformatics books (e.g. Setubal & Meidanis,
1997: "Introduction to Computational Molecular Biology", page 49ff)
use the term "global alignment" to denote an alignment in which terminal
gaps are scored, and refer to a global alignment with "free ends" as a
"semiglobal alignment". When I suggested modifying the needle program
documentation, I intended to make it easier for those who have read
this definition to correctly understand what needle does. As it stands
now, such people may end up looking for an EMBOSS program that computes
a semiglobal alignment without realizing that needle does this. Adding
a notice, e.g.

    needle does not penalize terminal gaps. This type of alignment is
    (also) called a "semiglobal alignment" in the literature.

would prevent such futile searches.
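
To make the difference between the two definitions concrete, here is a
minimal Python sketch. It is not needle's code: the function name
alignment_score, the free_end_gaps flag and the scoring values (match +1,
mismatch -1, linear gap cost -2) are assumptions chosen purely for
illustration, and a linear gap cost is used only to keep the example short.
With free_end_gaps=False, terminal gaps are charged like any other gap
(global alignment in the sense of Setubal & Meidanis); with
free_end_gaps=True they cost nothing (semiglobal alignment).

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, free_end_gaps=False):
    """Best score of an alignment spanning both sequences (linear gap cost).

    Illustrative sketch only; scoring values and names are assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]

    if not free_end_gaps:
        # Strict global alignment: leading gaps are charged.
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
    # Otherwise the first row and column stay 0: leading gaps are free.

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    if free_end_gaps:
        # Trailing gaps are free: take the best cell in the last row or column.
        return max(max(score[n]), max(row[m] for row in score))
    return score[n][m]


if __name__ == "__main__":
    print(alignment_score("ACGT", "TTACGTTT"))
    print(alignment_score("ACGT", "TTACGTTT", free_end_gaps=True))
```

With these assumed scores, aligning ACGT against TTACGTTT yields -4 when
the four terminal gap positions are charged, and 4 when they are free.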

> There is no biological reason to 
> penalise terminal gaps.
>
> And of course, for many biological purposes, a local alignment is what 
> makes sense.

> ... Though we could add terminal gaps as an option for those rare cases 
> where it could be useful. I note that stretcher also does not score 
> terminal gaps.

Ok, I'll put that on my wish list, and I'll also add it to my list of
programming projects. While the modifications needed to implement this
should be minor, I will have to familiarize myself somewhat more with the
EMBOSS code base and standards so I can do the modifications properly...

Kind regards,
Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: kim at inb.uni-luebeck.de                          |
 |    *NEW*    WWW:   http://www.inb.uni-luebeck.de/staff/kim.html    |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*


