[Bioperl-l] Re: Frameshifts in alignments ... ?
   
    Aaron J. Mackey <amackey@virginia.edu>
       
    Tue, 3 Sep 2002 10:10:11 -0400 (EDT)
    
    
  
On Tue, 3 Sep 2002, Ewan Birney wrote:
> BTW - we should call them  Bio::Seq::EncodedSequence
Great ... (just out of curiosity, why not Bio::Seq::EncodedSeq ... or
just Bio::Seq::Encoded; is there a reason for the redundancy?)
> Remember that the "encoding" is as well as the bases, ie, one effectively
> has two "tracks", being
>
>    CCCCCCCCCCCIIIIIIIIIIIIIIIIIIIIIIICCCCCGGGCCCC
>    ATGGGTGTATGTATTGTGTAAAAAGAATGTTAAGGTTGT---GTET
> I am happy to get into this. I would propose the following encodings:
> I could adapt genewise to directly output this stuff.
I guess I'm not sure why we need an *internal* encoding like this; I would
argue that the various methods I proposed would be easier via the
SeqFeature annotation representation (since relative to the length of the
sequence, the number of gap/intron/frameshift locations should be small).
Or do you just mean that this encoding should be available for dumping via
$obj->encoding() (and perhaps acceptable to a new() constructor)?
$obj = Bio::Seq::EncodedSequence->new(
    -encoding => "CCCCCCCCCCCIIIIIIIIIIIIIIIIIIIIIIICCCCCGGGCCCC",
    -sequence => "ATGGGTGTATGTATTGTGTAAAAAGAATGTTAAGGTTGTGTET",
    -start    => 100,
    -end      => 128,
    -strand   => 1,
);
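To make the "few locations relative to sequence length" point concrete, here is a rough sketch of collapsing an encoding track into a short list of runs, which is roughly what a feature-based representation would store. The helper name encoding_to_runs is hypothetical (nothing like it exists in BioPerl that I know of), and I am assuming that 'C' marks ordinary coding positions while 'I' and 'G' mark intron and gap positions.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse a per-base encoding track into runs.
# Each run is a hash with the code letter and 1-based, inclusive
# start/end coordinates relative to the track itself.
sub encoding_to_runs {
    my ($encoding) = @_;
    my @runs;
    # Match one character followed by any repeats of that same character.
    while ($encoding =~ /((.)\2*)/g) {
        my $end   = pos($encoding);
        my $start = $end - length($1) + 1;
        push @runs, { code => $2, start => $start, end => $end };
    }
    return @runs;
}

my $track = "CCCCCCCCCCCIIIIIIIIIIIIIIIIIIIIIIICCCCCGGGCCCC";
for my $run (encoding_to_runs($track)) {
    # Assumption: 'C' is plain coding sequence, so only the other
    # runs would need to be stored as features.
    next if $run->{code} eq 'C';
    printf "%s\t%d..%d\n", @{$run}{qw(code start end)};
}
```

For this track only the intron run and the gap run would be kept, however long the sequence gets.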
There was also my "embedded" encoding (which is what we tend to see in
alignment outputs), with frameshift (/, \), intron boundaries ([...]) and
gap characters, that I proposed could be obtained via as_string():
ATGGGT/GTATG[TATTGTGTAAAAAG]AATGT\TAAGGTTGT---GTET
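Going the other way, a minimal sketch of how such an embedded string might be pulled apart into the bare residues plus a list of frameshift and intron events. parse_embedded and the event hash keys are made up for illustration; gap characters are simply left in the residue string, and a frameshift is reported at the position of the residue that precedes it.

```perl
use strict;
use warnings;

# Hypothetical helper: split an "embedded" string into residues and events.
# Positions are 1-based offsets into the returned residue string.
sub parse_embedded {
    my ($embedded) = @_;
    my $residues = '';
    my @events;
    my $intron_start;

    for my $char (split //, $embedded) {
        if ($char eq '/' or $char eq '\\') {
            # Frameshift marker: record which residue it follows.
            push @events, { type   => 'frameshift',
                            symbol => $char,
                            after  => length($residues) };
        }
        elsif ($char eq '[') {
            # Intron opens at the next residue to be appended.
            $intron_start = length($residues) + 1;
        }
        elsif ($char eq ']') {
            die "unbalanced ']' in embedded string\n"
                unless defined $intron_start;
            push @events, { type  => 'intron',
                            start => $intron_start,
                            end   => length($residues) };
            undef $intron_start;
        }
        else {
            $residues .= $char;    # bases and gap characters kept as-is
        }
    }
    die "unclosed '[' in embedded string\n" if defined $intron_start;
    return ($residues, \@events);
}

my ($seq, $events) =
    parse_embedded('ATGGGT/GTATG[TATTGTGTAAAAAG]AATGT\TAAGGTTGT---GTET');
print "$seq\n";
for my $ev (@$events) {
    if ($ev->{type} eq 'intron') {
        print "intron $ev->{start}..$ev->{end}\n";
    }
    else {
        print "frameshift '$ev->{symbol}' after $ev->{after}\n";
    }
}
```

The event list is exactly the sort of thing that could be hung off the sequence as features, with as_string() regenerating the embedded form on demand.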
I guess now I'm inching towards a Bio::SeqIO::encoded::wise,
Bio::SeqIO::encoded::tfastx, ... ?
> Are you keen to code this up Aaron... or hoping I would ?
I'm good to go, given that I understand the desired direction ... and I
do agree TIMTOWTDI and all.
-Aaron
-- 
 Aaron J Mackey
 Pearson Laboratory
 University of Virginia
 (434) 924-2821
 amackey@virginia.edu