[Bioperl-l] Re: Frameshifts in alignments ... ?
Ewan Birney
birney@ebi.ac.uk
Tue, 3 Sep 2002 15:50:16 +0100 (BST)
On Tue, 3 Sep 2002, Aaron J Mackey wrote:
>
> On Tue, 3 Sep 2002, Ewan Birney wrote:
>
> > BTW - we should call them Bio::Seq::EncodedSequence
>
> Great ... (just out of curiosity, why not Bio::Seq::EncodedSeq ... or
> just Bio::Seq::Encoded; is there a reason for the redundancy?)
>
> > Remember that the "encoding" is in addition to the bases, i.e., one effectively
> > has two "tracks", being
> >
> > CCCCCCCCCCCIIIIIIIIIIIIIIIIIIIIIIICCCCCGGGCCCC
> > ATGGGTGTATGTATTGTGTAAAAAGAATGTTAAGGTTGT---GTET
>
> > I am happy to get into this. I would propose the following encodings:
>
> > I could adapt genewise to directly output this stuff.
>
> I guess I'm not sure why we need an *internal* encoding like this; I would
> argue that the various methods I proposed would be easier via the
> SeqFeature annotation representation (since relative to the length of the
> sequence, the number of gap/intron/frameshift locations should be small).
> Or do you just mean that this encoding should be available for dumping via
> $obj->encoding() (and perhaps acceptable to a new() constructor)?
I think the internals are (rightly) up to the implementor, and I was more
thinking about the interface being things like:
$seq->encoding_string()
or something.
>
> $obj = new Bio::Seq::EncodedSequence (-encoding => "CCCCCCCCCCCIIIIIIIIIIIIIIIIIIIIIIICCCCCGGGCCCC",
> -sequence => "ATGGGTGTATGTATTGTGTAAAAAGAATGTTAAGGTTGTGTET",
> -start => 100, -end => 128, -strand => 1
> );
>
> There was also my "embedded" encoding (which is what we tend to see in
> alignment outputs), with frameshift (/, \), intron boundaries ([...]) and
> gap characters, that I proposed could be obtained via as_string():
>
> ATGGGT/GTATG[TATTGTGTAAAAAG]AATGT\TAAGGTTGT---GTET
>
I think this is slightly insane (myself), as your coordinate system now has
to keep track of lots of things. Of course, it has to keep track of gaps
anyway. Hmmm...
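
To make the bookkeeping concern concrete, here is a minimal Perl sketch that splits an "embedded" string into a plain sequence, a parallel encoding track, and a list of frameshift markers. It is only an illustration under assumptions: split_embedded is a hypothetical helper (not a BioPerl function), the labels C, I and G are taken from the two-track example quoted above, and frameshift markers are assumed to occupy no column of their own.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl. Splits an "embedded" string
# into a plain sequence, a parallel encoding track, and frameshift markers.
# Assumed labels: C = coding, I = intron, G = gap.
sub split_embedded {
    my ($embedded) = @_;
    my ($sequence, $encoding, @frameshifts) = ('', '');
    my $in_intron = 0;
    for my $char (split //, $embedded) {
        if    ($char eq '[') { $in_intron = 1 }
        elsif ($char eq ']') { $in_intron = 0 }
        elsif ($char eq '/' || $char eq '\\') {
            # Record the marker and how many columns were emitted before it.
            push @frameshifts, [ $char, length($sequence) ];
        }
        else {
            $sequence .= $char;
            $encoding .= $char eq '-' ? 'G' : $in_intron ? 'I' : 'C';
        }
    }
    return ($sequence, $encoding, \@frameshifts);
}

my ($sequence, $encoding, $frameshifts) =
    split_embedded('ATG/GT[GTAAG]TT---GC');
print "$sequence\n$encoding\n";
printf "%s after column %d\n", @$_ for @$frameshifts;
```

Every marker character shifts the columns that follow it, which is why the frameshift positions are recorded against the plain sequence rather than against the embedded string.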
> I guess now I'm inching towards an Bio::SeqIO::encoded::wise,
> Bio::SeqIO::encoded::tfastx, ... ?
>
Nah. Let's stick to one implementation at the moment, but with a
Bio::Seq::EncodedSeqI interface we can slot in novel implementations
if we like.
I would claim EncodedSeqI should have
$seq->encoding_string();
and
$encoded_label = $seq->encoding_label($position);
methods on it. The constructor should have a well-documented way to
initiate the encoding, which could be either the string, or a set of
features, or both (your choice).
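
As a rough illustration of that interface, here is a minimal Perl sketch. My::EncodedSeq is a hypothetical stand-in, not the eventual BioPerl class; it only supports the string form of the constructor, and treating $position as 1-based is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an EncodedSeqI implementation; not the real
# BioPerl class. It stores the two "tracks" as parallel strings.
package My::EncodedSeq;

sub new {
    my ($class, %args) = @_;
    my $seq      = $args{-sequence};
    my $encoding = $args{-encoding};
    die "-sequence and -encoding are required\n"
        unless defined $seq && defined $encoding;
    die "sequence and encoding must have the same length\n"
        unless length($seq) == length($encoding);
    return bless { seq => $seq, encoding => $encoding }, $class;
}

# Whole encoding track, one label per sequence column.
sub encoding_string { return $_[0]{encoding} }

# Label at a single position (1-based here, which is an assumption).
sub encoding_label {
    my ($self, $position) = @_;
    die "position out of range\n"
        if $position < 1 || $position > length($self->{encoding});
    return substr($self->{encoding}, $position - 1, 1);
}

package main;

my $seq = My::EncodedSeq->new(
    -sequence => "ATGGTAAGTT---GC",
    -encoding => "CCCIIIIICCGGGCC",
);
print $seq->encoding_string(), "\n";   # the full encoding track
print $seq->encoding_label(4), "\n";   # label of the first intron base
```

Callers only see encoding_string() and encoding_label(), so the internal storage (a string here) could be swapped for a feature list later without changing them.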
> > Are you keen to code this up Aaron... or hoping I would ?
>
> I'm good to go, given that I understand the desired direction ... and I
> do agree TIMTOWTDI and all.
>
Let's do it one way first, and then we can do it multiple ways later ;)
> -Aaron
>
> --
> Aaron J Mackey
> Pearson Laboratory
> University of Virginia
> (434) 924-2821
> amackey@virginia.edu
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------