[Biojava-l] IntegerAlphabet IntegerSymbol
Thomas Down
td2@sanger.ac.uk
Sun, 21 Oct 2001 09:28:00 +0100
On Fri, Oct 19, 2001 at 03:56:54PM -0700, David Waring wrote:
>
> I am working on bio.program.PhredSequence and its friends (for handling the
> qualitative data associated with the output of Phred). PhredSequence uses
> SymbolLists with an IntegerAlphabet. At present the getToken() method of
> IntegerAlphabet.IntegerSymbol returns '#'. I guess this is because the
> Symbol interface specifies that getToken() return a char. Shouldn't this be
> a String? After all, SymbolParser parseToken() parses a String, and aren't we
> dealing with alphabets that can have multi-character tokens such as the
> three-letter amino acid names? Has this issue come up before? Am I
> misunderstanding 'token'?
>
> One of the things that must be done with PhredSequence is to write the
> quality data (an IntegerAlphabet based SymbolList) to a fasta-like format.
> I'd like to just create a Sequence with the quality SymbolList and be able
> to write this using a FastaFormat. But since FastaFormat calls seqString()
> and that is coded in AbstractSymbolList to use getToken() it can only deal
> with chars, so it can't handle IntegerSymbols. Another issue is that with
> an IntegerSymbolList one would really like the seqString to output something
> like '10 20 22 7' as opposed to '1020227'.
>
> Three options:
> 1) Create a new SequenceFormat just for this, and if there will be no other
> use of IntegerSymbolList perhaps this is the best way to go.
>
> 2) Create an IntegerSymbolList that extends SimpleSymbolList overriding
> seqString().
>
> 3) (most invasive but perhaps cleanest) Change getToken() to return a
> String, or add toString() to Symbol, and add a method paddedSeqString() to
> AbstractSymbolList.
4) get rid of getToken() completely, and change the way that sequences
get converted to strings -- replacing hardwired code in SymbolList
implementations with pluggable `stringifiers'.
This was the idea of my SymbolTokenizations patch which I posted
a few days ago. Certainly my view is that it provides a much
cleaner framework for handling this kind of situation, and I'd
urge you to take a look.
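
To make the pluggable-stringifier idea concrete, here is a minimal
standalone sketch. It does not use BioJava and is not the API from the
SymbolTokenizations patch: the Stringifier interface and the
concatenated() and separated() helpers are hypothetical names chosen
for illustration, and it assumes Java 8 or later. The point is only
that the symbol list stays the same while the choice of stringifier
decides whether the output is '1020227' or '10 20 22 7'.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; this is an illustration, not the BioJava API.
final class StringifierSketch {

    /** Hypothetical pluggable strategy: renders a list of symbols as text. */
    interface Stringifier<S> {
        String stringify(List<S> symbols);
    }

    /** Concatenates tokens with no separator, as for one-character alphabets. */
    static <S> Stringifier<S> concatenated() {
        return separated("");
    }

    /** Joins tokens with a separator, as one might want for integer symbols. */
    static <S> Stringifier<S> separated(final String separator) {
        return symbols -> {
            StringBuilder sb = new StringBuilder();
            boolean first = true;
            for (S s : symbols) {
                if (!first) {
                    sb.append(separator);
                }
                sb.append(s); // uses the symbol's own toString() as its token
                first = false;
            }
            return sb.toString();
        };
    }

    public static void main(String[] args) {
        // Stand-in for a quality SymbolList over an integer alphabet.
        List<Integer> quality = Arrays.asList(10, 20, 22, 7);

        Stringifier<Integer> spaced = separated(" ");
        Stringifier<Integer> plain = concatenated();

        System.out.println(spaced.stringify(quality)); // 10 20 22 7
        System.out.println(plain.stringify(quality));  // 1020227
    }
}
```

With this shape, a writer such as a FASTA-like format could be handed
whichever stringifier suits the alphabet, rather than relying on a
char-returning getToken() hardwired into the symbol list.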
Thomas