[Biojava-l] SymbolParsers in Alphabet

Wed, 11 Dec 2002 08:19:14 -0800

Thank you for the suggestion. I made that work. Here is another related question: how can I make my own custom alphabet handle the ambiguity symbol X the way the predefined protein alphabet does? Thanks.

Zhen

-----Original Message-----
From: Thomas Down [mailto:td2@sanger.ac.uk]
Sent: Monday, December 09, 2002 3:16 PM
To: Ren, Zhen
Cc: biojava-l@biojava.org
Subject: Re: [Biojava-l] SymbolParsers in Alphabet

On Mon, Dec 09, 2002 at 02:07:51PM -0800, Ren, Zhen wrote:
> Hi, there,
> 
> The interface Alphabet has a method
>     public SymbolTokenization getTokenization(java.lang.String name) throws BioException
> 
> Every alphabet should have a SymbolParser under the name 'token' that uses the symbol token characters to translate a string into a SymbolList. Likewise, there should be a SymbolParser under the name 'name' that uses symbol names to identify symbols.
> 
> I'd like to know how I can have these two SymbolParsers under the names 'token' and 'name' incorporated into an alphabet I intend to create by myself.

Hmmm, "SymbolParser" is an old interface name.  Its functionality
has been subsumed into SymbolTokenization, but a few references to
it survived in the javadoc for a while.  I think these are gone
now, if you get the latest CVS version.

Exactly how tokenizations are handled depends on the Alphabet
implementation.  If you write your own Alphabet, you can, of
course, write a getTokenization method yourself.
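
As a rough illustration of that do-it-yourself route, here is a minimal
sketch in plain Java with no BioJava dependency.  The names CharTokens and
SketchAlphabet are made up for this example and are not BioJava classes; a
real Alphabet would hand back a SymbolTokenization and throw BioException
rather than IllegalArgumentException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a tokenization: maps single characters to symbol names.
final class CharTokens {
    private final Map<Character, String> charToSymbol = new HashMap<>();
    private final boolean caseSensitive;

    CharTokens(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // Fold case when the tokenization is case-insensitive.
    private char normalize(char c) {
        return caseSensitive ? c : Character.toLowerCase(c);
    }

    // Associate a symbol (identified here only by its name) with a token character.
    void bind(String symbolName, char token) {
        charToSymbol.put(normalize(token), symbolName);
    }

    // Look up the symbol name bound to a token character.
    String parse(char token) {
        String symbolName = charToSymbol.get(normalize(token));
        if (symbolName == null) {
            throw new IllegalArgumentException("No symbol bound to '" + token + "'");
        }
        return symbolName;
    }
}

// Hypothetical alphabet that keeps its tokenizations in a map keyed by name.
final class SketchAlphabet {
    private final Map<String, CharTokens> tokenizations = new HashMap<>();

    void putTokenization(String name, CharTokens tokenization) {
        tokenizations.put(name, tokenization);
    }

    // Hand-written lookup; an unknown name is reported as an error.
    CharTokens getTokenization(String name) {
        CharTokens tokenization = tokenizations.get(name);
        if (tokenization == null) {
            throw new IllegalArgumentException("No tokenization named '" + name + "'");
        }
        return tokenization;
    }
}
```

The point is only that getTokenization can be a plain lookup by name, so an
alphabet may offer as many tokenizations as it has registered.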

If you're constructing a SimpleAlphabet, getTokenization("name") is handled
automatically for you.  To add extra tokenizations, you can do
something like:

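     // Second argument is the case-sensitivity flag (false = ignore case).
     CharacterTokenization toke = new CharacterTokenization(
         myAlphabet,
         false
     );
     // Bind each symbol of the alphabet to the character that represents it.
     toke.bindSymbol(symbol0, '0');
     toke.bindSymbol(symbol1, '1');
     // Register the tokenization under the name "token".
     myAlphabet.putTokenization("token", toke);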

Hope this helps,

     Thomas.