[Biojava-l] SymbolParsers in Alphabet

Mon, 9 Dec 2002 23:16:06 +0000

On Mon, Dec 09, 2002 at 02:07:51PM -0800, Ren, Zhen wrote:
> Hi, there,
> 
> The interface Alphabet has a method
>     public SymbolTokenization getTokenization(java.lang.String name) throws BioException
> 
> Every alphabet should have a SymbolParser under the name 'token' that uses the symbol token characters to translate a string into a SymbolList. Likewise, there should be a SymbolParser under the name 'name' that uses symbol names to identify symbols.
> 
> I'd like to know how I can have these two SymbolParsers under the names 'token' and 'name' incorporated into an alphabet I intend to create by myself.

Hmmm, "SymbolParser" is an old interface name.  Its functionality
has been subsumed into SymbolTokenization, but a few references to
it survived in the javadoc for a while.  I think these are gone
in the latest CVS version.

Exactly how tokenizations are handled depends on the Alphabet
implementation.  If you write your own Alphabet, you can, of
course, write a getTokenization method yourself.
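If you do implement getTokenization yourself, one simple approach is to keep the tokenizations in a map keyed by name and throw when a name is unknown. The quoted signature declares `throws BioException`. The sketch below is only an illustration under that assumption. `ToyAlphabet`, `ToyTokenization` and `UnknownTokenizationException` are hypothetical, simplified stand-ins, not BioJava classes, and a "symbol" here is just a String.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for an Alphabet; NOT a BioJava class.
public class ToyAlphabet {

    /** Hypothetical tokenization: turns a string into a list of symbol names. */
    public interface ToyTokenization {
        List<String> parse(String seq);
    }

    /** Checked exception standing in for BioException in this sketch. */
    public static class UnknownTokenizationException extends Exception {
        public UnknownTokenizationException(String msg) {
            super(msg);
        }
    }

    // Tokenizations registered under names such as "token" or "name".
    private final Map<String, ToyTokenization> tokenizations = new HashMap<>();

    /** Registers (or replaces) the tokenization stored under the given name. */
    public void putTokenization(String name, ToyTokenization toke) {
        tokenizations.put(name, toke);
    }

    /** Looks up a tokenization by name, failing loudly for unknown names. */
    public ToyTokenization getTokenization(String name) throws UnknownTokenizationException {
        ToyTokenization toke = tokenizations.get(name);
        if (toke == null) {
            throw new UnknownTokenizationException("No tokenization named '" + name + "'");
        }
        return toke;
    }
}
```

Throwing rather than returning null keeps a typo in a tokenization name from surfacing later as a NullPointerException.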

If you're constructing a SimpleAlphabet, getTokenization("name") is handled
automatically for you.  To add extra tokenizations, you can do
something like:

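     import org.biojava.bio.seq.io.CharacterTokenization;

     // false = case-insensitive token matching
     CharacterTokenization toke = new CharacterTokenization(
         myAlphabet,
         false
     );
     // bind each symbol to its single-character token
     toke.bindSymbol(symbol0, '0');
     toke.bindSymbol(symbol1, '1');
     // register it, so myAlphabet.getTokenization("token") returns it
     myAlphabet.putTokenization("token", toke);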
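To make concrete what binding symbols to characters buys you, here is a toy model of a character-based tokenization. It parses a string character by character into symbols and renders symbols back into characters. `ToyCharTokenization` is hypothetical and self-contained. It is not BioJava's CharacterTokenization, and it represents symbols as plain Strings. The `caseSensitive` flag is assumed to mirror the `false` argument in the snippet above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a character tokenization; NOT BioJava's CharacterTokenization.
public class ToyCharTokenization {
    private final boolean caseSensitive;
    private final Map<Character, String> charToSymbol = new HashMap<>();
    private final Map<String, Character> symbolToChar = new HashMap<>();

    public ToyCharTokenization(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // When not case-sensitive, 'A' and 'a' are treated as the same token.
    private char normalize(char c) {
        return caseSensitive ? c : Character.toLowerCase(c);
    }

    /** Binds a symbol (here just its name) to a single token character. */
    public void bindSymbol(String symbol, char token) {
        charToSymbol.put(normalize(token), symbol);
        symbolToChar.put(symbol, token);
    }

    /** Translates each character of seq into its bound symbol. */
    public List<String> parse(String seq) {
        List<String> symbols = new ArrayList<>();
        for (char c : seq.toCharArray()) {
            String sym = charToSymbol.get(normalize(c));
            if (sym == null) {
                throw new IllegalArgumentException("Unbound token character: '" + c + "'");
            }
            symbols.add(sym);
        }
        return symbols;
    }

    /** Reverse direction: renders symbols back into their token characters. */
    public String tokenize(List<String> symbols) {
        StringBuilder sb = new StringBuilder();
        for (String s : symbols) {
            Character c = symbolToChar.get(s);
            if (c == null) {
                throw new IllegalArgumentException("Unbound symbol: " + s);
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

With symbol0 bound to '0' and symbol1 bound to '1', parsing "0110" yields [symbol0, symbol1, symbol1, symbol0], and tokenizing that list gives back "0110". This is the round trip a "token" tokenization provides for your alphabet.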

Hope this helps,

     Thomas.