Bioperl: Start of alignment debate...

Ewan Birney birney@sanger.ac.uk
Sat, 23 Jan 1999 10:59:33 +0000 (GMT)



David - thanks for your comments. I am currently updating the alignment
web page, and will be trying to integrate all the comments together.

Specific comments below...

On Fri, 22 Jan 1999, David J. States wrote:

> A couple of thoughts on the alignment discussion (and referring to Ewan's 
> overview http://bio.perl.org/Projects/SeqAlign/overview.html),
> 
> 1) the object should be able to handle both pairwise alignments and
> multiple alignments seamlessly (I think this is part of Ewan's overview,
> but it is not stated explicitly).

yup.
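A minimal sketch of what "seamlessly" could mean, assuming a pairwise alignment is simply a multiple alignment that happens to have two rows. This is illustrative Python, not Bioperl code; the class and method names (SimpleAlignment, add_row, column) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleAlignment:
    """Hypothetical sketch: one class for pairwise and multiple alignments.

    Each row is a (name, gapped_sequence) pair and all rows share one
    aligned length. A pairwise alignment is just an instance with two rows.
    """
    rows: list = field(default_factory=list)

    def add_row(self, name, gapped_seq):
        # Every row must span the same number of alignment columns.
        if self.rows and len(gapped_seq) != self.length():
            raise ValueError("all rows must have the same aligned length")
        self.rows.append((name, gapped_seq))

    def length(self):
        return len(self.rows[0][1]) if self.rows else 0

    def is_pairwise(self):
        return len(self.rows) == 2

    def column(self, i):
        # One residue (or gap) per row at alignment column i.
        return [seq[i] for _, seq in self.rows]
```

Because nothing in the class special-cases two rows, code written against it works the same for pairwise and multiple alignments.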

> 
> 2) many multiple alignments are associated with an evolutionary or other 
> tree structure, and the scoring of the alignment cannot be understood 
> without reference to this tree.  It is therefore important that the 
> alignment object be able to represent a hierarchical view of a multiple
> sequence set.  In some cases this may be rooted, in others not.  Hierarchies
> can be handled as alignment of alignments, but it is important to retain 
> things like edge lengths.

I think this is harder than you might at first think - this is one of the
alignments-of-alignments cases. In editors this sort of functionality
is called 'grouping' and is important (and it would be nice to have a 'proper'
tree, not just groups). But it is worth trying to integrate.
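One way to keep a 'proper' tree rather than bare groups is to attach an edge length to every child of a group node, so branch lengths survive the alignment-of-alignments structure. A minimal sketch in Python, assuming leaves carry one gapped row each; AlignNode, add_child and leaves are hypothetical names, not an existing Bioperl interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AlignNode:
    """Hypothetical sketch of an 'alignment of alignments' node.

    A leaf holds one gapped sequence; an internal node groups children,
    each attached by an edge length so branch lengths are not lost.
    """
    name: str
    row: Optional[str] = None  # gapped sequence, leaves only
    children: List[Tuple["AlignNode", float]] = field(default_factory=list)

    def add_child(self, node, edge_length):
        self.children.append((node, edge_length))

    def leaves(self):
        """Yield (name, row, distance_from_this_node) for every leaf."""
        if self.row is not None:
            yield self.name, self.row, 0.0
        for child, edge in self.children:
            for name, row, dist in child.leaves():
                # Accumulate edge lengths on the way back up the tree.
                yield name, row, dist + edge
```

Whether the root is meaningful (rooted vs unrooted trees) is left to the caller here; an unrooted tree would simply pick an arbitrary node as the entry point.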

> 
> 3) not all alignments are equally confident at all locations.  The 
> alignment object should be able to represent confidence measures for both
> pairwise and multiple alignments.

There are a number of features which are specific to alignments, such as
confidence or match/insert switches.

It is not clear to me whether these sorts of features should 'reuse' the
features on sequences. That would make alignments a type of 'sequence',
and I don't like that.
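For illustration, the alternative to reusing sequence features is to hang alignment-specific annotation, such as confidence, directly on the alignment's columns. A minimal Python sketch under that assumption; ScoredAlignment, column_confidence and low_confidence_columns are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredAlignment:
    """Hypothetical sketch: alignment-level annotation kept on the alignment.

    Confidence is stored per column of the alignment itself, rather than as
    features on the member sequences, so the alignment is never treated as
    a kind of 'sequence'.
    """
    rows: Dict[str, str]            # name -> gapped sequence
    column_confidence: List[float]  # one value per alignment column

    def __post_init__(self):
        # Every row, and the confidence track, must cover the same columns.
        lengths = {len(r) for r in self.rows.values()}
        if lengths != {len(self.column_confidence)}:
            raise ValueError("confidence must have one value per column")

    def low_confidence_columns(self, threshold):
        return [i for i, c in enumerate(self.column_confidence) if c < threshold]
```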

> 
> 4) the alignment object should be able to output a normalized 
> representation of the alignment, even if the alignment itself is composed 
> of one or more alignment objects in addition to sequence data, and 
> independent of whatever internal data representation is used.

Definitely.
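A small sketch of "independent of whatever internal data representation is used": rows may be stored either as gapped strings or, more compactly, as an ungapped sequence plus a map of gap insertions, and both normalize to the same gapped-row output. Python, with hypothetical function names (expand, normalized_rows) and an assumed gap-map convention.

```python
def expand(seq, gaps):
    """Turn an ungapped sequence plus {position: gap_length} into a gapped row.

    Assumed convention: a gap at position p is inserted before residue p;
    p == len(seq) appends trailing gaps.
    """
    out = []
    for i, residue in enumerate(seq):
        out.append("-" * gaps.get(i, 0))
        out.append(residue)
    out.append("-" * gaps.get(len(seq), 0))
    return "".join(out)


def normalized_rows(rows):
    """Return [(name, gapped_row)] regardless of how each row is stored.

    A row may be stored as a gapped string, or as (ungapped_seq, gaps).
    """
    out = []
    for name, stored in rows:
        gapped = stored if isinstance(stored, str) else expand(*stored)
        out.append((name, gapped))
    # A normalized alignment must be rectangular.
    if len({len(g) for _, g in out}) > 1:
        raise ValueError("rows do not form a rectangular alignment")
    return out
```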

> 
> 5) Ewan includes the goal "Alignments must be able to handle more than one 
> residue aligned to (potentially more than one residue) in another 
> sequence".  If this means simply that a region in one sequence is aligned 
> to a region in another sequence, OK.  But if you mean a dotplot, that seems 
> to me to be a different object altogether.  One of the fundamental uses of 
> alignment is to map from one sequence to another, and a dotplot does not 
> allow you to do this without first extracting an alignment.

No - this is to handle DNA-to-protein, or DNA-to-DNA alignments at the
level of protein, not dot-plots. You end up having to be able to match
one residue in one guy to three in another. It is a must (in my view; mind
you, I spend my time with a variety of different DNA alignments).
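To make the one-to-three case concrete, here is a minimal Python sketch of a protein-to-DNA alignment stored as ungapped blocks, where each protein residue consumes one codon (three bases). The Block layout, 0-based half-open coordinates, and the protein_to_dna name are assumptions for illustration; frameshifts are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    """One ungapped block of a protein-to-DNA alignment (0-based coordinates)."""
    protein_start: int
    dna_start: int
    length: int  # length in protein residues; spans 3 * length bases


def protein_to_dna(blocks: List[Block], pos: int) -> Optional[Tuple[int, int]]:
    """Map one protein residue to the half-open DNA range of its codon.

    Returns None when the residue is not covered by any block (a gap).
    """
    for b in blocks:
        if b.protein_start <= pos < b.protein_start + b.length:
            start = b.dna_start + 3 * (pos - b.protein_start)
            return (start, start + 3)
    return None
```

For example, with blocks `[Block(0, 10, 4), Block(4, 25, 3)]` (DNA bases 22 to 24 skipped between blocks, as with an intron), protein residue 3 maps to bases (19, 22) and residue 4 to (25, 28).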

> 
> I guess that I have to come down against putting too much effort into 
> editability for a couple of pragmatic reasons.  The first is that 
> implementing a static alignment object is going to be more than enough 
> work.  Second, there are some issues that arise if you edit an alignment 
> that itself is part of another alignment.  While you might devise rules 
> that would allow you to propagate changes up and down the hierarchy, the
> changes themselves might affect the way that the hierarchy was constructed.
> Thus editing one component of an alignment might invalidate the alignment as
> a whole.  Finally, I don't see that much application for editing as opposed
> to constructing alignments.  In almost all computational applications an edit
> is really regeneration of the alignment.  I guess there are groups that 
> maintain hand alignments used in phylogeny, but they have already developed 
> multiple alignment editing tools so I don't think bioperl needs to support 
> this application.

I think there is some truth to this - editors need internal
representations of the alignments to be able to manage edits - it is
really whether we believe that the bioperl object will be used as this
representation or not. I suspect it won't be, as editing data structures have
to be more bound up with how the editor is written.
> 
> David
> 
> ----
> David J. States, M.D., Ph.D.
> Associate Professor and Director
> Institute for Biomedical Computing
> Washington University in St. Louis
> 700 S. Euclid Ave.
> St. Louis, MO   63110
> 
> tel: 314 362 2134
> fax: 314 362 0234
> email: states@ibc.wustl.edu
> 
> 
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
> 

Ewan Birney
<birney@sanger.ac.uk>
http://www.sanger.ac.uk/Users/birney/