[Biojava-l] TokenParser and CrossProduct

Thomas Down td2@sanger.ac.uk
Tue, 14 Nov 2000 12:35:52 +0000

On Mon, Nov 13, 2000 at 06:47:32PM -0800, Emig, Robin wrote:
> 	I just tried to use the TokenParser on a crossproduct alphabet and
> it didn't work because the tokenParser class constructor sets up a map
> between a single character and symbol. Can a registered cvs person fix this?

Just out of interest, are you actually explicitly constructing
a TokenParser, or using the form:

  Alphabet alpha = ...
  Parser alphaTokens = alpha.getParser("token");

The "token" parser of a given alphabet is only defined if
there exists a well-defined mapping between Symbols in that
alphabet and printable characters in the Unicode set.  This
is true of the simple DNA, RNA, and Protein alphabets, and
I guess also for some other simple alphabets you might want
to work with (dice rolls, coin tosses, whatever).  Cross
Product symbols are harder -- I guess we could define a
standard single-char representation for some cases, like
DNA x DNA, but it might be hard to get this accepted as
a standard outside BioJava.  And things get /really/ complicated
once you get to alphabets like ((DNA x DNA x DNA) x Protein)
(which is an entirely reasonable use of cross-products -- you
might use that to represent an alignment of coding DNA against a
protein sequence).
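
To put rough numbers on that, here is a small standalone sketch.  It
does not use any BioJava classes: TokenTableSketch, DNA_TOKENS and
crossProductSize are hypothetical names made up for illustration, and
the figure of 20 protein symbols is an assumption (the real Protein
alphabet may contain more).  It only shows how a one-char-per-symbol
table looks and how quickly the symbol count of a cross product grows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: not BioJava code, all names here are hypothetical.
class TokenTableSketch {
    // A "token" mapping in miniature: one printable char per symbol name.
    static final Map<Character, String> DNA_TOKENS = new LinkedHashMap<>();
    static {
        DNA_TOKENS.put('a', "adenine");
        DNA_TOKENS.put('c', "cytosine");
        DNA_TOKENS.put('g', "guanine");
        DNA_TOKENS.put('t', "thymine");
    }

    // A cross product has one symbol per combination of component symbols,
    // so its size is the product of the component alphabet sizes.
    static long crossProductSize(int... componentSizes) {
        long size = 1;
        for (int n : componentSizes) {
            size *= n;
        }
        return size;
    }

    public static void main(String[] args) {
        int dna = DNA_TOKENS.size();
        int protein = 20; // assumption: 20 standard amino acids only
        System.out.println("DNA x DNA: " + crossProductSize(dna, dna));
        System.out.println("(DNA x DNA x DNA) x Protein: "
                + crossProductSize(dna, dna, dna, protein));
    }
}
```

Under those assumptions DNA x DNA needs 16 distinct tokens, which is
manageable, while the codon-against-protein case needs over a thousand,
which is well beyond anything readable as single characters.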

On the other hand, CrossProductAlphabets do have a defined
"name" parser.  The symbols have names like (cytosine, adenine).
This is a pretty verbose format for storing large amounts of
alignment data, but it is at least unambiguous.
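
As a rough illustration of why the name form stays unambiguous even
for nested cross products, the sketch below splits a tuple-style name
into its top-level components.  This is not the BioJava "name" parser:
NameSplitSketch and splitTuple are hypothetical, and the assumption
that nested symbols are written with nested parentheses and
comma-separated components is mine, extrapolated from the
(cytosine, adenine) example above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: not BioJava code, all names here are hypothetical.
class NameSplitSketch {
    // Split "(a, b, c)" into its top-level components.
    // Nested tuples such as "(x, y)" are returned intact as one component.
    static List<String> splitTuple(String name) {
        String s = name.trim();
        if (!s.startsWith("(") || !s.endsWith(")")) {
            throw new IllegalArgumentException("not a tuple name: " + name);
        }
        List<String> parts = new ArrayList<>();
        int depth = 0;   // parenthesis nesting inside the outer pair
        int start = 1;   // index where the current component begins
        for (int i = 1; i < s.length() - 1; i++) {
            char ch = s.charAt(i);
            if (ch == '(') {
                depth++;
            } else if (ch == ')') {
                depth--;
            } else if (ch == ',' && depth == 0) {
                parts.add(s.substring(start, i).trim());
                start = i + 1;
            }
        }
        parts.add(s.substring(start, s.length() - 1).trim());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitTuple("(cytosine, adenine)"));
        System.out.println(splitTuple("((cytosine, adenine, guanine), alanine)"));
    }
}
```

Each component can then be looked up by name in the corresponding
component alphabet, at the cost of many characters per column.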

You are of course welcome to define your own token-mapping
and parser implementation for your favourite cross-product
alphabets, but unless you're working with a very common case,
I'm not sure if this really belongs in the BioJava core.
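
For what it's worth, a home-grown token mapping for the DNA x DNA case
could be as small as the sketch below.  It is deliberately independent
of the BioJava interfaces: PairTokenSketch, encode and decode are
hypothetical names, and the choice of the letters 'A' to 'P' for the
16 ordered pairs is arbitrary, not any kind of standard.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: not BioJava code, all names here are hypothetical.
class PairTokenSketch {
    static final String BASES = "acgt";
    static final Map<Character, String> TOKEN_TO_PAIR = new HashMap<>();
    static final Map<String, Character> PAIR_TO_TOKEN = new HashMap<>();
    static {
        // Assign 'A'..'P' to the 16 ordered pairs (arbitrary convention).
        for (int i = 0; i < BASES.length(); i++) {
            for (int j = 0; j < BASES.length(); j++) {
                char token = (char) ('A' + i * BASES.length() + j);
                String pair = "" + BASES.charAt(i) + BASES.charAt(j);
                TOKEN_TO_PAIR.put(token, pair);
                PAIR_TO_TOKEN.put(pair, token);
            }
        }
    }

    // Encode two aligned DNA rows, column by column, as one token string.
    static String encode(String top, String bottom) {
        if (top.length() != bottom.length()) {
            throw new IllegalArgumentException("rows must have equal length");
        }
        StringBuilder out = new StringBuilder();
        for (int k = 0; k < top.length(); k++) {
            String pair = "" + top.charAt(k) + bottom.charAt(k);
            Character token = PAIR_TO_TOKEN.get(pair);
            if (token == null) {
                throw new IllegalArgumentException("no token for " + pair);
            }
            out.append(token.charValue());
        }
        return out.toString();
    }

    // Decode a token string back into its two DNA rows.
    static String[] decode(String tokens) {
        StringBuilder top = new StringBuilder();
        StringBuilder bottom = new StringBuilder();
        for (int k = 0; k < tokens.length(); k++) {
            String pair = TOKEN_TO_PAIR.get(tokens.charAt(k));
            if (pair == null) {
                throw new IllegalArgumentException("unknown token " + tokens.charAt(k));
            }
            top.append(pair.charAt(0));
            bottom.append(pair.charAt(1));
        }
        return new String[] { top.toString(), bottom.toString() };
    }
}
```

The catch is the one already mentioned: nobody outside your own code
will know what these letters mean, and the approach does not extend to
gaps or ambiguity symbols without enlarging the table.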

What definitely does need doing is some more documentation
about the relationship between alphabets and parsers, and
the cases where token-mappings do and don't exist.  We may
also want to change the SymbolParser interface a little bit 
as we switch to the new event-based I/O framework.  I'm still
very open to ideas about how CrossProductSymbols and 
Alignments ought to be handled for I/O.  So we may be able
to get something like the behaviour you want in future.

Happy hacking,

``If I was going to carry a large axe on my back to a diplomatic
function I think I'd want it glittery too.''
           -- Terry Pratchett