[Biojava-dev] SymbolList tokenization

Francois Pepin fpepin at cs.mcgill.ca
Wed Aug 20 11:19:39 EDT 2003


Hi everyone,

Would anyone object to changing how tokenizers work? The current approach
is quite complicated for anyone who wants to work with a custom alphabet.
Trying to tokenize a DNAxDNAxDNA SymbolList also fails because no
tokenizer is defined for that alphabet.

For the "token" tokenization, I think it would be more sensible for the
default to ask each Symbol what its character token is. After all, if
Symbols are responsible for knowing their own name, they should also be
responsible for knowing their own 1-letter code.

The methods are there to create Symbols with a character token, but they're
deprecated. I think those methods should still be used. Then we could have
a default "token" tokenization that just asks the symbols what their
preferred token is.

In cases where a specific tokenization is needed, one can just override
the default one.
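
A minimal sketch of the idea, in plain Java. None of these types are the
actual BioJava classes: SketchSymbol, SketchTokenization,
DefaultTokenTokenization and LowerCaseTokenization are hypothetical names
made up for illustration, and the real interfaces would differ.

```java
import java.util.List;

// Hypothetical stand-in for a Symbol that knows its own name and 1-letter code.
final class SketchSymbol {
    private final String name;
    private final char token;

    SketchSymbol(String name, char token) {
        this.name = name;
        this.token = token;
    }

    String getName() { return name; }

    // The symbol itself reports its preferred character token.
    char getToken() { return token; }
}

// Hypothetical tokenization contract: turn a list of symbols into a string.
interface SketchTokenization {
    String tokenize(List<SketchSymbol> symbols);
}

// Default "token" tokenization: no per-alphabet lookup table,
// it simply asks each symbol for its preferred token.
class DefaultTokenTokenization implements SketchTokenization {
    @Override
    public String tokenize(List<SketchSymbol> symbols) {
        StringBuilder sb = new StringBuilder(symbols.size());
        for (SketchSymbol s : symbols) {
            sb.append(tokenFor(s));
        }
        return sb.toString();
    }

    // Hook that a specific tokenization can override.
    protected char tokenFor(SketchSymbol s) {
        return s.getToken();
    }
}

// Example of a specific tokenization that overrides the default:
// it writes every token in lower case.
class LowerCaseTokenization extends DefaultTokenTokenization {
    @Override
    protected char tokenFor(SketchSymbol s) {
        return Character.toLowerCase(s.getToken());
    }
}
```

With symbols for guanine ('G'), adenine ('A') and thymine ('T'), the
default would produce "GAT" and the lower-case variant "gat", without
either one needing a tokenizer registered for the alphabet.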

Francois


