Bioperl: Start of alignment debate...
Sat, 23 Jan 1999 10:59:33 +0000 (GMT)
David - thanks for your comments. I am currently updating the alignment
web page, and will be trying to integrate all the comments together.
Specific comments below...
On Fri, 22 Jan 1999, David J. States wrote:
> A couple of thoughts on the alignment discussion (and referring to Ewan's
> overview http://bio.perl.org/Projects/SeqAlign/overview.html),
> 1) the object should be able to handle both pairwise alignments and
> multiple alignments seamlessly (I think this is part of Ewan's overview,
> but it is not stated explicitly).
> 2) many multiple alignments are associated with an evolutionary or other
> tree structure, and the scoring of the alignment cannot be understood
> without reference to this tree. It is therefore important that the
> alignment object be able to represent a hierarchical view of a multiple
> sequence set. In some cases this may be rooted, in others not. Hierarchies
> can be handled as alignments of alignments, but it is important to retain
> things like edge lengths.
I think this is harder than you might at first think - this is one of
the alignments-of-alignments cases. In editors this sort of functionality
is called 'grouping' and is important (and it would be nice for it to be
a 'proper' tree, not just groups). But it is worth trying to integrate.
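
To make the 'proper tree' idea concrete, here is a minimal sketch in
Python (not Perl, and not the bioperl design). The names Leaf, Group,
rows and root_distances are hypothetical, chosen only for illustration.
Each group holds its children together with an edge length, so the
hierarchy can still be flattened to plain rows without losing the tree.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    """A single gapped sequence row (hypothetical name)."""
    name: str
    gapped: str


@dataclass
class Group:
    """An alignment of alignments: children paired with edge lengths."""
    children: List[Tuple[Union[Leaf, "Group"], float]]

    def rows(self) -> List[Tuple[str, str]]:
        """Flatten the hierarchy into (name, gapped sequence) rows."""
        out: List[Tuple[str, str]] = []
        for child, _edge in self.children:
            if isinstance(child, Leaf):
                out.append((child.name, child.gapped))
            else:
                out.extend(child.rows())
        return out

    def root_distances(self, so_far: float = 0.0) -> Dict[str, float]:
        """Sum the edge lengths from this node down to every leaf."""
        out: Dict[str, float] = {}
        for child, edge in self.children:
            if isinstance(child, Leaf):
                out[child.name] = so_far + edge
            else:
                out.update(child.root_distances(so_far + edge))
        return out


# Two primates grouped first, then aligned against mouse.
primates = Group([(Leaf("human", "ACG-T"), 0.125),
                  (Leaf("chimp", "ACGAT"), 0.25)])
root = Group([(primates, 0.5), (Leaf("mouse", "AC--T"), 1.0)])
```

The nested Group keeps the edge lengths, while rows() gives a flat view
that does not depend on how the hierarchy is stored.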
> 3) not all alignments are equally confident at all locations. The
> alignment object should be able to represent confidence measures for
> both pairwise and multiple alignments.
There are a number of features which are specific to alignments, such as
confidence or match/insert switches.
It is not clear to me whether these sorts of features should 'reuse' the
features-on-sequence machinery. Doing so often makes alignments a type
of 'sequence', and I don't like that.
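
One way to picture the alternative is to hang column-level annotation
directly on the alignment rather than on any one sequence. The sketch
below is in Python and is only an illustration under that assumption.
The Alignment class, its column_confidence field and the
confident_columns method are hypothetical names, not an existing API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alignment:
    """Gapped rows plus one confidence value per alignment column."""
    rows: Dict[str, str]
    column_confidence: List[float]

    def __post_init__(self) -> None:
        # Every row must have the same width, and the annotation must
        # cover exactly that many columns.
        widths = {len(seq) for seq in self.rows.values()}
        if len(widths) != 1:
            raise ValueError("all rows must have the same length")
        if len(self.column_confidence) != widths.pop():
            raise ValueError("need one confidence value per column")

    def confident_columns(self, threshold: float) -> List[int]:
        """Return 0-based indices of columns at or above the threshold."""
        return [i for i, c in enumerate(self.column_confidence)
                if c >= threshold]


aln = Alignment(
    rows={"seq1": "ACG-T", "seq2": "ACGAT"},
    column_confidence=[0.9, 0.9, 0.8, 0.2, 0.7],
)
```

Here the confidence values live in alignment coordinates, so the
alignment never has to pretend to be a sequence to carry them.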
> 4) the alignment object should be able to output a normalized
> representation of the alignment, even if the alignment itself is composed
> of one or more alignment objects in addition to sequence data, and
> independent of whatever internal data representation is used.
> 5) Ewan includes the goal "Alignments must be able to handle more than one
> residue aligned to (potentially more than one residue) in another
> sequence". If this means simply that a region in one sequence is aligned
> to a region in another sequence, OK. But if you mean a dotplot, that seems
> to me to be a different object altogether. One of the fundamental uses of
> alignment is to map from one sequence to another, and a dotplot does not
> allow you to do this without first extracting an alignment.
No - This is to handle DNA to protein or DNA to DNA alignments at the
level of protein, not dot-plots. You end up having to be able to match
one residue in one guy to three in another. It is a must (in my view -
mind you, I spend my time with a variety of different DNA alignments).
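
As a rough illustration of one-to-three matching, the Python sketch
below stores an alignment as blocks in which each column consumes a
fixed number of residues from each sequence (3 DNA bases against 1
amino acid, for instance). Block, its fields and map_a_to_b are
hypothetical names, and the coordinates are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """An ungapped run of aligned columns between sequences A and B."""
    a_start: int  # 0-based start in sequence A (e.g. the DNA)
    a_unit: int   # residues of A consumed per column (3 for a codon)
    b_start: int  # 0-based start in sequence B (e.g. the protein)
    b_unit: int   # residues of B consumed per column
    columns: int  # number of aligned columns in this block


def map_a_to_b(blocks: List[Block], pos: int) -> Optional[Tuple[int, int]]:
    """Map a position in A to the half-open range it aligns to in B.

    Returns None when the position falls outside every block.
    """
    for blk in blocks:
        offset = pos - blk.a_start
        if 0 <= offset < blk.a_unit * blk.columns:
            column = offset // blk.a_unit
            start = blk.b_start + column * blk.b_unit
            return (start, start + blk.b_unit)
    return None


# DNA bases 0-11 code for residues 0-3; bases 12-13 are unaligned;
# bases 14-19 code for residues 4-5.
dna_to_protein = [Block(0, 3, 0, 1, 4), Block(14, 3, 4, 1, 2)]
```

With these blocks, map_a_to_b(dna_to_protein, 7) gives the protein
range for the third codon, and a position in the unaligned stretch
gives None. Mapping from one sequence to the other still works even
though the residues are not matched one to one.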
> I guess that I have to come down against putting too much effort into
> editability for a couple of pragmatic reasons. The first is that
> implementing a static alignment object is going to be more than enough
> work. Second, there are some issues that arise if you edit an alignment
> that itself is part of another alignment. While you might devise rules
> that would allow you to propagate changes up and down the hierarchy, the
> changes themselves might affect the way that the hierarchy was constructed.
> Thus editing one component of an alignment might invalidate the alignment
> as a whole. Finally, I don't see that much application for editing as
> opposed to constructing alignments. In almost all computational
> applications an edit is really a regeneration of the alignment. I guess
> there are groups that
> maintain hand alignments used in phylogeny, but they have already developed
> multiple alignment editing tools so I don't think bioperl needs to support
> this application.
I think there is some truth to this - editors need internal
representations of the alignments to be able to manage edits - it is
really whether we believe that the bioperl object will be used as this
representation or not. I suspect it won't be, as editing data structures
have to be more bound up with how the editor is written.
> David J. States, M.D., Ph.D.
> Associate Professor and Director
> Institute for Biomedical Computing
> Washington University in St. Louis
> 700 S. Euclid Ave.
> St. Louis, MO 63110
> tel: 314 362 2134
> fax: 314 362 0234
> email: firstname.lastname@example.org
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc: