[Bioperl-l] Re: Frameshifts in alignments ... ?

Matthew Pocock matthew_pocock@yahoo.co.uk
Tue, 03 Sep 2002 15:07:32 +0100


Ewan Birney wrote:
> [snip]
> Remember that the "encoding" is as well as the bases, ie, one effectively
> has two "tracks", being
> 
>    CCCCCCCCCCCIIIIIIIIIIIIIIIIIIIIIIICCCCCGGGCCCC
>    ATGGGTGTATGTATTGTGTAAAAAGAATGTTAAGGTTGT---GTET

Hi.

This is very similar to what the DP package in BioJava produces. In our 
model, each state in an HMM is also a Symbol in an Alphabet instance 
(the alphabet of states for that model). When sequences are aligned to an 
HMM, the result is an alignment object with one row for each input 
sequence and one row for the state sequence. Since states extend Symbol, 
the APIs treat them fairly transparently. We can also use the alphabet 
over doubles for another row: the per-column scores can be added as 
just another row of information in the alignment. So, IMHO, treating 
the state sequence as just another track of symbol information (possibly 
with some magical row identifier) is a good thing.
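
To make the "state sequence as just another row" idea concrete, here is a 
minimal sketch in Python. It is not BioJava's API: the Alignment class, 
the row labels ("seq1", "states", "scores") and the toy data are all 
made up for illustration. The only point is that bases, states and 
per-column scores sit side by side and are read column by column.

```python
from dataclasses import dataclass, field


@dataclass
class Alignment:
    """Toy multi-track alignment (hypothetical, not a BioJava class).

    Each row maps a label to a list with one entry per alignment column.
    """
    rows: dict = field(default_factory=dict)

    @property
    def length(self):
        # Number of columns; zero while the alignment has no rows.
        return len(next(iter(self.rows.values()))) if self.rows else 0

    def add_row(self, label, symbols):
        symbols = list(symbols)
        # Every track must cover exactly the same columns.
        if self.rows and len(symbols) != self.length:
            raise ValueError(
                f"row {label!r} has {len(symbols)} columns, expected {self.length}"
            )
        self.rows[label] = symbols

    def column(self, i):
        # One column across all tracks, keyed by row label.
        return {label: row[i] for label, row in self.rows.items()}


aln = Alignment()
aln.add_row("seq1", "ATGGTAAGT")    # bases (toy data)
aln.add_row("states", "CCCIIICCC")  # state track: C = coding, I = intron
aln.add_row("scores", [0.9, 0.8, 0.9, 0.4, 0.5, 0.4, 0.7, 0.9, 0.8])

# The state row is queried exactly like any other row.
intron_cols = [i for i in range(aln.length) if aln.column(i)["states"] == "I"]
print(intron_cols)
print(aln.column(3))
```

The "states" label plays the role of the magical row identifier: code 
that cares about the state path looks it up by name, and code that does 
not can iterate over all rows without special cases.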

Matthew
