[Biojava-l] SymbolTokenization landing

Thomas Down td2@sanger.ac.uk
Fri, 16 Nov 2001 16:30:55 +0000


Hi...

A while back, I posted a patch which replaced the current SymbolParser
objects with SymbolTokenizations, which encapsulate both
Symbol -> string and string -> Symbol mappings in a single object.
I've been maintaining this code as a `branch' of the main development,
including all the changes from the trunk.  It all seems to be nice
and stable.
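
For anyone who hasn't looked at the earlier patch, here is a rough,
self-contained sketch of the idea of keeping both mappings in one
object.  This is NOT the code from the patch: ToySymbol, ToyTokenization,
bind() and parseToken() are names made up for illustration, and only
tokenizeSymbol is a name mentioned in this mail.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a symbol; not BioJava's Symbol interface.
final class ToySymbol {
    private final String name;

    ToySymbol(String name) { this.name = name; }

    String getName() { return name; }
}

// Hypothetical two-way mapping: one object answers both
// "which symbol is this token?" and "which token is this symbol?".
final class ToyTokenization {
    private final Map<String, ToySymbol> byToken = new HashMap<String, ToySymbol>();
    private final Map<ToySymbol, String> bySymbol = new HashMap<ToySymbol, String>();

    // Register a symbol and its token in both directions at once.
    void bind(ToySymbol symbol, String token) {
        byToken.put(token, symbol);
        bySymbol.put(symbol, token);
    }

    // string -> Symbol
    ToySymbol parseToken(String token) {
        ToySymbol symbol = byToken.get(token);
        if (symbol == null) {
            throw new IllegalArgumentException("Unknown token: " + token);
        }
        return symbol;
    }

    // Symbol -> string
    String tokenizeSymbol(ToySymbol symbol) {
        String token = bySymbol.get(symbol);
        if (token == null) {
            throw new IllegalArgumentException("Unknown symbol: " + symbol.getName());
        }
        return token;
    }
}

final class TokenizationSketch {
    // Build a toy DNA tokenization with the four unambiguous bases.
    static ToyTokenization dna() {
        ToyTokenization tok = new ToyTokenization();
        tok.bind(new ToySymbol("adenine"), "a");
        tok.bind(new ToySymbol("guanine"), "g");
        tok.bind(new ToySymbol("cytosine"), "c");
        tok.bind(new ToySymbol("thymine"), "t");
        return tok;
    }

    public static void main(String[] args) {
        ToyTokenization tok = dna();
        ToySymbol g = tok.parseToken("g");
        // Round trip: token -> symbol -> token.
        System.out.println(g.getName() + " -> " + tok.tokenizeSymbol(g));
    }
}
```

The point of the design is that the two directions can't drift apart,
since both tables are filled by the same bind() call.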

Anyway, I'd like to see this code checked in soon.  It would certainly
be worth getting this change out of the way before we start on any
naming and directory work.  Therefore, unless there are any objections,
I'm planning to check the code in on Monday or Tuesday of next week.

This change will require modifications to some (hopefully not too
many) applications.  The changes which might affect existing code
are:

  - The getParser() method on Alphabets has been replaced by
    getTokenization(), which returns a SymbolTokenization object.

  - Everything SymbolParsers used to do is now handled by
    SymbolTokenizations.  However, they don't have an equivalent
    of:

          SymbolList sl = symParser.parse("agttcga");

    Instead, use the constructor:

          new SimpleSymbolList(tokenization, "agttcga");

    (or, of course, use one of the various convenience methods like
    DNATools.createDNA()).

  - Symbols no longer have a getToken() method.  Code which uses
    this will have to do one of the following:

       + use getName() instead

       + get a SymbolTokenization from the appropriate Alphabet, then
         use the tokenizeSymbol method.

       + For the specific case of DNA, there is a convenient method

            DNATools.dnaToken(symbol);

         added by popular request.
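
To make the "use the constructor" pattern above a bit more concrete,
here is a toy, self-contained sketch of a list that is built from a
tokenization plus a string, with a convenience wrapper on top.  None of
this is the real BioJava code: SketchTokenization, SketchSymbolList and
SketchDNATools are hypothetical classes, symbols are modelled as plain
name strings, and the method names other than tokenizeSymbol and
createDNA are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tokenization: symbols are modelled as plain name strings.
final class SketchTokenization {
    private final Map<Character, String> nameByToken = new LinkedHashMap<Character, String>();
    private final Map<String, Character> tokenByName = new LinkedHashMap<String, Character>();

    // Register one token/symbol pair in both directions; returns this for chaining.
    SketchTokenization bind(char token, String name) {
        nameByToken.put(token, name);
        tokenByName.put(name, token);
        return this;
    }

    String parseToken(char token) {
        String name = nameByToken.get(token);
        if (name == null) {
            throw new IllegalArgumentException("Unknown token: " + token);
        }
        return name;
    }

    char tokenizeSymbol(String name) {
        Character token = tokenByName.get(name);
        if (token == null) {
            throw new IllegalArgumentException("Unknown symbol: " + name);
        }
        return token;
    }
}

// Hypothetical list type: parsing happens in the constructor,
// so the tokenization itself needs no parse(String) method.
final class SketchSymbolList {
    private final List<String> symbols = new ArrayList<String>();
    private final SketchTokenization tokenization;

    SketchSymbolList(SketchTokenization tokenization, String text) {
        this.tokenization = tokenization;
        for (int i = 0; i < text.length(); i++) {
            symbols.add(tokenization.parseToken(text.charAt(i)));
        }
    }

    int length() { return symbols.size(); }

    // Zero-based access, purely for this sketch.
    String symbolAt(int index) { return symbols.get(index); }

    // Turn the list back into text using the same tokenization.
    String seqString() {
        StringBuilder sb = new StringBuilder();
        for (String name : symbols) {
            sb.append(tokenization.tokenizeSymbol(name));
        }
        return sb.toString();
    }
}

// Hypothetical convenience wrapper, in the spirit of a tools class.
final class SketchDNATools {
    static final SketchTokenization DNA = new SketchTokenization()
            .bind('a', "adenine")
            .bind('g', "guanine")
            .bind('c', "cytosine")
            .bind('t', "thymine");

    static SketchSymbolList createDNA(String text) {
        return new SketchSymbolList(DNA, text);
    }
}
```

With this, SketchDNATools.createDNA("agttcga") and
new SketchSymbolList(SketchDNATools.DNA, "agttcga") build the same list,
which is the shape of the migration described above.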


The patch is a bit big to send out to the list, but I'll send a copy
by e-mail or whatever to anyone who's interested,

      Thomas.