[Biojava-dev] SymbolList tokenization

Thomas Down thomas at derkholm.net
Thu Aug 21 05:30:43 EDT 2003


Once upon a time, Francois Pepin wrote:
> 
> I think it would've been easier to deal with it by letting Symbol take
> care of itself, rather than to have the machinery around it having to
> think about everything. After all, this is how things are being handled
> in the XML file, so I'm probably not the only person thinking in that
> way. Maybe a way would be to use the same machinery to define beefed-up
> Symbols that know about everything and then have the Alphabet created
> around a set of dumb Symbols.

No, the XML format follows the patterns in the object model.  I think
you might have been looking at an old DTD that's floating around from
pre-1.2 versions when symbols did know their tokens.

> I'll go and modify the SeqString code as well. First it will try to use
> the "token" Tokenization and if there is no such Tokenization or if
> it's missing some Symbols, then it'll fall back to a name Tokenization.
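
For concreteness, the fallback described above might look roughly like
the sketch below. Everything in it is a hypothetical stand-in rather
than BioJava code: FallbackSeqString, SimpleTokenization, tokenFor and
render are invented for illustration, symbols are represented by their
names, and the space separator in the name form is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for illustration only; this is not the BioJava API.
final class FallbackSeqString {

    /** Maps a symbol (here just its name) to text; empty if the symbol is not covered. */
    interface SimpleTokenization {
        Optional<String> tokenFor(String symbolName);
    }

    /** Renders every symbol, or returns empty as soon as one symbol is not covered. */
    static Optional<String> render(List<String> symbols, SimpleTokenization tok, String sep) {
        StringBuilder sb = new StringBuilder();
        for (String s : symbols) {
            Optional<String> t = tok.tokenFor(s);
            if (!t.isPresent()) {
                return Optional.empty();
            }
            if (sb.length() > 0) {
                sb.append(sep);
            }
            sb.append(t.get());
        }
        return Optional.of(sb.toString());
    }

    /** Tries "token" first, then falls back to "name" if it is absent or incomplete. */
    static String seqString(List<String> symbols, Map<String, SimpleTokenization> toks) {
        SimpleTokenization token = toks.get("token");
        if (token != null) {
            Optional<String> out = render(symbols, token, "");
            if (out.isPresent()) {
                return out.get();
            }
        }
        SimpleTokenization name = toks.get("name");
        if (name == null) {
            throw new IllegalStateException("no \"name\" tokenization to fall back to");
        }
        return render(symbols, name, " ")
                .orElseThrow(() -> new IllegalStateException("\"name\" tokenization is missing a symbol"));
    }
}
```

Note that the sketch falls back for the whole sequence as soon as one
symbol has no token, so the output never mixes the two forms.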

That will certainly help.  At the same time, it might be worth
adding some extra methods on Alphabet for discovering tokenizations.
At a minimum:

     public Set<String> getTokenizationNames();

At the moment, the only way to discover whether a tokenization
exists is to ask for it by name and catch the exception if it's
not there.  Ugh.
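
To make the difference concrete, here is a rough sketch of the two ways
of checking. SketchAlphabet is a hypothetical, much simplified stand-in
and not the real Alphabet interface; putTokenization and both helper
methods are invented for the example, and the exception type used for
an unknown name is an assumption.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical, simplified stand-in; this is not BioJava's Alphabet interface.
final class SketchAlphabet {
    private final Map<String, Object> tokenizations = new LinkedHashMap<>();

    void putTokenization(String name, Object tokenization) {
        tokenizations.put(name, tokenization);
    }

    /** Lookup by name; an unknown name is reported by exception (type assumed here). */
    Object getTokenization(String name) {
        Object t = tokenizations.get(name);
        if (t == null) {
            throw new NoSuchElementException("No tokenization named " + name);
        }
        return t;
    }

    /** The proposed addition: a read-only view of the registered names. */
    Set<String> getTokenizationNames() {
        return Collections.unmodifiableSet(tokenizations.keySet());
    }

    /** Discovery without the new method: probe by name and catch the failure. */
    static boolean hasTokenizationByProbing(SketchAlphabet alpha, String name) {
        try {
            alpha.getTokenization(name);
            return true;
        } catch (NoSuchElementException e) {
            return false;
        }
    }

    /** Discovery with the new method: a plain membership test. */
    static boolean hasTokenization(SketchAlphabet alpha, String name) {
        return alpha.getTokenizationNames().contains(name);
    }
}
```

Returning an unmodifiable view means callers can list the names without
being able to add or remove tokenizations behind the alphabet's back.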

    Thomas.

