[Biojava-l] IntegerAlphabet IntegerSymbol

Thomas Down td2@sanger.ac.uk
Sun, 21 Oct 2001 09:28:00 +0100


On Fri, Oct 19, 2001 at 03:56:54PM -0700, David Waring wrote:
> 
> I am working on bio.program.PhredSequence and its friends (for handling the
> qualitative data associated with the output of Phred). PhredSequence uses
> SymbolLists with an IntegerAlphabet. At present the getToken() method of
> IntegerAlphabet.IntegerSymbol returns '#'. I guess this is because the
> Symbol interface specifies that getToken() return a char. Shouldn't this be
> a String?  After all, SymbolParser parseToken() parses a String, and aren't we
> dealing with alphabets that can have multi-character tokens such as the
> three-letter amino acid names? Has this issue come up before? Am I
> misunderstanding 'token'?
> 
> One of the things that must be done with PhredSequence is to write the
> quality data (an IntegerAlphabet based SymbolList) to a fasta-like format.
> I'd like to just create a Sequence with the quality SymbolList and be able
> to write this using a FastaFormat. But since FastaFormat calls seqString()
> and that is coded in AbstractSymbolList to use getToken() it can only deal
> with chars so it can't handle IntegerSymbols. Another issue is that with
> an IntegerSymbolList one would really like the seqString to output something
> like '10 20 22 7' as opposed to '1020227'.
> 
> Three options:
> 1) Create a new SequenceFormat just for this, and if there will be no other
> use of IntegerSymbolList perhaps this is the best way to go.
> 
> 2) Create an IntegerSymbolList that extends SimpleSymbolList overriding
> seqString().
> 
> 3) (most invasive but perhaps cleanest) Change getToken() to return a
> String, or add toString() to Symbol, and add a method paddedSeqString() to
> AbstractSymbolList.

4) get rid of getToken() completely, and change the way that sequences
   get converted to strings -- replacing hardwired code in SymbolList
   implementations with pluggable `stringifiers'.

This was the idea of my SymbolTokenizations patch, which I posted
a few days ago.  Certainly my view is that it provides a much
cleaner framework for handling this kind of situation, and I'd
urge you to take a look.
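
To make the idea concrete, here is a minimal sketch of what a pluggable
stringifier could look like. It is not the SymbolTokenizations patch itself
and not the BioJava API: every name in it (StringifierSketch, Stringifier,
CharStringifier, SpacedIntStringifier, and this seqString() helper) is
hypothetical, and plain Character and Integer values stand in for Symbols.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical names throughout; this is an illustration, not the BioJava API.
final class StringifierSketch {

    /** Turns a list of symbols into text; one implementation per output style. */
    interface Stringifier<S> {
        String stringify(List<S> symbols);
    }

    /** One character per symbol, concatenated with no separator (DNA-style). */
    static final class CharStringifier implements Stringifier<Character> {
        @Override
        public String stringify(List<Character> symbols) {
            StringBuilder sb = new StringBuilder(symbols.size());
            for (char c : symbols) {
                sb.append(c);
            }
            return sb.toString();
        }
    }

    /** Decimal integers separated by single spaces (quality-score style). */
    static final class SpacedIntStringifier implements Stringifier<Integer> {
        @Override
        public String stringify(List<Integer> symbols) {
            StringJoiner joiner = new StringJoiner(" ");
            for (int value : symbols) {
                joiner.add(Integer.toString(value));
            }
            return joiner.toString();
        }
    }

    /** The caller picks the stringifier; the list carries no formatting logic. */
    static <S> String seqString(List<S> symbols, Stringifier<S> stringifier) {
        return stringifier.stringify(symbols);
    }

    public static void main(String[] args) {
        // Quality values come out separated rather than run together.
        System.out.println(
            seqString(Arrays.asList(10, 20, 22, 7), new SpacedIntStringifier()));
        // Single-character tokens still concatenate as before.
        System.out.println(
            seqString(Arrays.asList('a', 'c', 'g', 't'), new CharStringifier()));
    }
}
```

With the formatting decision moved out of the list implementation, a
fasta-like writer for quality data would only need to be handed a different
stringifier, rather than a new SymbolList subclass or SequenceFormat.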

    Thomas