[Biojava-dev] SymbolList tokenization
Thomas Down
thomas at derkholm.net
Thu Aug 21 05:30:43 EDT 2003
Once upon a time, Francois Pepin wrote:
>
> I think it would've been easier to deal with it by letting Symbol take
> care of itself, rather than to have the machinery around it having to
> think about everything. After all, this is how things are being handled
> in the XML file, so I'm probably not the only person thinking in that
> way. Maybe a way would be to use the same machinery to define beefed-up
> Symbols that know about everything and then have the Alphabet created
> around a set of dumb Symbols.
No, the XML format follows the patterns in the object model. I think
you might have been looking at an old DTD that's floating around from
pre-1.2 versions when symbols did know their tokens.
> I'll go and modify the SeqString code as well. First it will try to use
> the "token" Tokenization and if there is no such Tokenization or if
> it's missing some Symbols, then it'll fall back to a name Tokenization.
That will certainly help. At the same time, it might be worth
adding some extra methods on Alphabet for discovering tokenizations.
At a minimum:
    public Set<String> getTokenizationNames();
At the moment, the only way to discover whether a tokenization
exists is to ask for it by name and catch the exception if it's
not there. Ugh.
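
A rough sketch of the difference, for illustration only. SketchAlphabet
and TokenizationSketch are made-up stand-ins, not the real BioJava
interfaces: symbols are plain strings, a tokenization is just a map from
symbol to text, and the exception type is an assumption. It shows the
probe-and-catch check next to a getTokenizationNames() lookup, and how
the "token" then "name" fallback could use the latter.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical stand-in for an alphabet that owns its named tokenizations.
final class SketchAlphabet {
    // tokenization name -> (symbol -> text emitted for that symbol)
    private final Map<String, Map<String, String>> tokenizations = new LinkedHashMap<>();

    void addTokenization(String name, Map<String, String> symbolToText) {
        tokenizations.put(name, symbolToText);
    }

    // Lookup by name; a missing tokenization is reported by exception.
    Map<String, String> getTokenization(String name) {
        Map<String, String> found = tokenizations.get(name);
        if (found == null) {
            throw new NoSuchElementException("No tokenization named " + name);
        }
        return found;
    }

    // The proposed discovery method: a read-only view of the known names.
    Set<String> getTokenizationNames() {
        return Collections.unmodifiableSet(tokenizations.keySet());
    }
}

final class TokenizationSketch {
    // Discovery by probing: ask for it and catch the failure.
    static boolean existsByProbing(SketchAlphabet alpha, String name) {
        try {
            alpha.getTokenization(name);
            return true;
        } catch (NoSuchElementException e) {
            return false;
        }
    }

    // Discovery through the proposed method: no exception handling needed.
    static boolean existsByName(SketchAlphabet alpha, String name) {
        return alpha.getTokenizationNames().contains(name);
    }

    // Use "token" if it exists and covers every symbol, otherwise use "name".
    static String seqString(SketchAlphabet alpha, List<String> symbols) {
        if (existsByName(alpha, "token")) {
            Map<String, String> token = alpha.getTokenization("token");
            if (token.keySet().containsAll(symbols)) {
                StringBuilder sb = new StringBuilder();
                for (String s : symbols) {
                    sb.append(token.get(s));
                }
                return sb.toString();
            }
        }
        // Fallback: space-separated names (the separator is an assumption here).
        Map<String, String> names = alpha.getTokenization("name");
        StringBuilder sb = new StringBuilder();
        for (String s : symbols) {
            String n = names.get(s);
            if (n == null) {
                throw new IllegalArgumentException("No name for symbol " + s);
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(n);
        }
        return sb.toString();
    }
}
```

With the names available up front, the fallback logic needs no try/catch
at all for the "does it exist" question.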
Thomas.