[Bioperl-l] Re: Framshifts in alignments ... ?

Ewan Birney birney@ebi.ac.uk
Thu, 29 Aug 2002 13:40:42 -0400 (EDT)


On Thu, 29 Aug 2002, Aaron J Mackey wrote:

>
> On Thu, 29 Aug 2002, Ewan Birney wrote:
>
> >   (b) there is no bioperl alignment model which sensibly handles
> > frameshifts
>
> What is the consensus on this issue?  We've just made the FASTA programs
> output using -m 9c horribly trivial to parse the alignments (we dump a
> coded string of gaps and frameshifts, after which you can easily grab the
> sequences from the alignment display, strip any gap/frameshift characters
> in it, and use the coded string to reconstruct from scratch; much much
> easier than parsing out the alignment).  We're working on making
> Search::IO::fasta -m 9 "capable", and it'd be great if we could start
> throwing alignment events as well ...

I guess from my perspective, if we do frameshifts, we should do introns as
well (genewise stuff) and ... that leads into the more general view of
alignments as a series of coordinates *and* states for each column. This
is the basic alignment model of Wise2 (both genewise and estwise and much
more besides...)

FYI  - The Wise2 alignment model is quite general:

An Alignment has:

   An ordered list of Columns which have

       An ordered list of Units (order here indicates the sequence, or "row")

       Each unit has
            A start
            An end
            A label (text)

     (a word of warning - I use C coordinates - starting at 0 - in an
in-between-bases convention which sadly varies with the precise alignment
method being used. Basically the coordinate handling down here is a
little nasty)
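
To make the shape of that model concrete, here is a minimal Python sketch.
The class names follow the description above, but the field names, the
row() helper and the example coordinates are assumptions for illustration
only; this is not the actual Wise2 C structure, and the labels given to the
protein row are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    start: int   # 0-based, in-between-bases coordinate (assumed convention)
    end: int
    label: str   # free-text state, e.g. "CODON", "INTRON", "SEQUENCE"


@dataclass
class Column:
    # One Unit per row; the position in this list identifies the sequence.
    units: List[Unit] = field(default_factory=list)


@dataclass
class Alignment:
    columns: List[Column] = field(default_factory=list)

    def row(self, index: int) -> List[Unit]:
        """Return the units of one sequence, in column order."""
        return [col.units[index] for col in self.columns]


# Hypothetical protein-vs-DNA alignment: row 0 is protein, row 1 is DNA.
# The middle column consumes no protein (start == end) while the DNA row
# spans an intron.
aln = Alignment([
    Column([Unit(0, 1, "SEQUENCE"), Unit(0, 3, "CODON")]),
    Column([Unit(1, 1, "SEQUENCE"), Unit(3, 90, "INTRON")]),
    Column([Unit(1, 2, "SEQUENCE"), Unit(90, 93, "CODON")]),
])

dna_labels = [u.label for u in aln.row(1)]
```

Because every column carries a label per row, frameshifts and introns are
just more labels rather than special cases in the data structure.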


In Wise2 there are a series of standard labels being things like

  "CODON", "INTRON", "SEQUENCE", "CODON_INSERT_1", "CODON_INSERT_2"

which probably should be an ontology (in fact inside Wise2 I do a lot of
strstr matching so that INTRON and CENTRAL_INTRON both match the word
INTRON, allowing more flexibility)
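
The strstr-style matching can be sketched in Python roughly as follows. The
helper name label_matches is hypothetical; the only point is that a label
"matches" a keyword when the keyword occurs anywhere inside it.

```python
def label_matches(label: str, keyword: str) -> bool:
    """True if keyword occurs anywhere in label (like strstr() != NULL in C)."""
    return keyword in label


labels = ["CODON", "INTRON", "CENTRAL_INTRON", "CODON_INSERT_1"]

# Both INTRON and CENTRAL_INTRON are picked up by the keyword "INTRON".
introns = [lab for lab in labels if label_matches(lab, "INTRON")]
```

One side effect worth keeping in mind: with plain substring matching the
keyword "CODON" also matches "CODON_INSERT_1", which may or may not be what
the caller wants. A real ontology would make that relationship explicit.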



I deliberately did this column-wise not row-wise as it allows for
manipulations of the alignment data structure/nesting easily (though I
don't do this).



For Wise aficionados, this is -alb format



Wise2 also has a more SimpleAlign-like alignment: a set of strings with
'-' in.
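
As a rough sketch of how a column-wise alignment could be flattened into
that kind of gapped-string view: the function name to_gapped_rows, the
tuple layout and the toy data are all assumptions for illustration, and the
code assumes start == end means the row consumes nothing in that column
(which, as warned above, is not guaranteed for every alignment method).

```python
from typing import List, Tuple

# (start, end, label) for each row; one list of rows per column.
# Coordinates are 0-based and in-between-bases (assumed convention).
Unit = Tuple[int, int, str]


def to_gapped_rows(columns: List[List[Unit]], seqs: List[str]) -> List[str]:
    """Render each sequence as a string padded with '-' where it is gapped."""
    rows = ["" for _ in seqs]
    for col in columns:
        # Slice out what each row consumes in this column.
        pieces = [seqs[i][start:end] for i, (start, end, _) in enumerate(col)]
        width = max(len(p) for p in pieces)
        for i, piece in enumerate(pieces):
            rows[i] += piece.ljust(width, "-")
    return rows


seqs = ["ACGT", "AGT"]
columns = [
    [(0, 1, "SEQUENCE"), (0, 1, "SEQUENCE")],
    [(1, 2, "SEQUENCE"), (1, 1, "SEQUENCE")],  # row 1 is gapped here
    [(2, 4, "SEQUENCE"), (1, 3, "SEQUENCE")],
]
rows = to_gapped_rows(columns, seqs)
```

Note that the labels are thrown away in the process, so the string view is
easy to use but cannot tell a frameshift from an ordinary gap.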


The problem I hit is that what people want to do with these alignments
varies wildly - some people want to see the whole structure laid bare,
others just want the implicit protein alignment, and in other cases people
want the codon-by-codon alignment. What I have discovered is that there
seems to be no ideal mixture of ease of use *and* complete representation,
and I have opted for the complete representation mode inside Wise2.


now... what bioperl does... is another question I guess!


What would you propose Aaron?





>
> -Aaron
>
> --
>  Aaron J Mackey
>  Pearson Laboratory
>  University of Virginia
>  (434) 924-2821
>  amackey@virginia.edu
>
>
>