[Biojava-dev] gaps and basis symbols

Kalle Näslund kalle.naslund at genpat.uu.se
Wed Oct 26 05:05:15 EDT 2005


mark.schreiber at novartis.com wrote:

>Further to this ...
>
>Investigating a bit further it seems that AlphabetManager.xml denotes an 
><ambiguityMapping> for "-", "." and " ". It denotes a <gapMapping> for 
>"~".
>
>I'm not sure if this is an oversight or if this was intentional. Should 
>they not all be <gapMapping>s?? Are not all gaps created equal? If I edit 
>this in my copy of AlphabetManager.xml then everything seems to work and 
>the JUnit tests still pass. It seems odd though; given that this has not 
>been spotted before, I am thinking it is intentional.
>
>Should I commit these changes to CVS???
>
>- Mark 
>
>  
>
Hi!

I really don't have much of a clue myself, but I have been digging around
in the serialization code too, and have come to a similar conclusion to
yours.

Regarding the "~", I THINK the idea is that there are different kinds of
gaps in biojava. The "-" is for gaps inside a sequence, while the "~" is
for gaps that do not really exist in the sequence: they are there because
there is no sequence at that position. Normally this would be in a
multiple alignment, where any initial and terminal gaps are "~" and any
gaps inside the actual sequence are "-".

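As a rough illustration of that convention (this is not biojava code; the
class and method names below are made up for the sketch, and I am only
assuming that "-" means an internal gap and "~" a terminal one), one row of
a multiple alignment could have its leading and trailing "-" rewritten as
"~" like this:

```java
// Hypothetical sketch, not part of biojava: distinguishes terminal gaps
// ("~", no sequence present) from internal gaps ("-") in one alignment row.
class GapStyleSketch {
    static final char INTERNAL_GAP = '-';
    static final char TERMINAL_GAP = '~';

    // Returns a copy of the row in which every leading and trailing '-'
    // is replaced by '~'. Gaps between residues are left as '-'.
    static String markTerminalGaps(String alignedRow) {
        char[] row = alignedRow.toCharArray();

        // Walk in from the left until the first non-gap character.
        int start = 0;
        while (start < row.length && row[start] == INTERNAL_GAP) {
            row[start++] = TERMINAL_GAP;
        }

        // Walk in from the right, stopping where the left pass stopped.
        int end = row.length - 1;
        while (end >= start && row[end] == INTERNAL_GAP) {
            row[end--] = TERMINAL_GAP;
        }
        return new String(row);
    }

    public static void main(String[] args) {
        System.out.println(markTerminalGaps("--AC-GT---"));
    }
}
```

A row that is nothing but gaps comes back as all "~", since there is no
sequence in it at all.
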
I think this is used somewhere as well, perhaps in the HMM code?

If we are nasty we could always give Matthew a "nice" welcome-back
present =P

Kalle

>
>
>
>Mark Schreiber/GP/Novartis at PH
>Sent by: biojava-dev-bounces at portal.open-bio.org
>10/21/2005 03:58 PM
>
> 
>        To:     biojava-dev at biojava.org
>        cc:     (bcc: Mark Schreiber/GP/Novartis)
>        Subject:        [Biojava-dev] gaps and basis symbols
>
>
>Hello -
>
>There seems to be a slightly strange relationship between gaps and 
>AlphabetManager.getGapSymbol(). If I take (for example) the 
>SymbolTokenization of DNA and ask it for the Symbol associated with "-", it 
>gives me back a BasisSymbol that is composed of a List that contains only 
>the GapSymbol from AlphabetManager.
>
>This leads to the slightly weird problem that the Symbol returned != 
>AlphabetManager.getGapSymbol(), whereas I expected them to be the same 
>object. This also causes some curious problems with serialization that may 
>or may not be related. Regardless, why does the "-" token not map directly 
>to the GapSymbol in a singleton manner, rather than mapping to a 
>BasisSymbol composed of a List of only the GapSymbol?
>
>Can any biojava mystics shed some wisdom on this?
>
>- Mark
>
>Mark Schreiber
>Research Investigator (Bioinformatics)
>
>Novartis Institute for Tropical Diseases (NITD)
>10 Biopolis Road
>#05-01 Chromos
>Singapore 138670
>www.nitd.novartis.com
>
>phone +65 6722 2973
>fax  +65 6722 2910
>_______________________________________________
>biojava-dev mailing list
>biojava-dev at biojava.org
>http://biojava.org/mailman/listinfo/biojava-dev


