[Biojava-l] IntegerAlphabet IntegerSymbol

Mark Schreiber <mark_s@sanger.otago.ac.nz>
Sun, 21 Oct 2001 20:07:41 +1300 (NZDT)


Hi -

The class PhredTools contains methods to read and write phred quality data
in a fasta-like format. The issue of spacing integers is handled by the
PhredFormat class. There are a number of classes, such as HMM states, that
cannot return sensible results for getToken().

It is unfortunate that this method cannot be guaranteed to return an
informative result.
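
To illustrate the spacing issue, here is a rough sketch in plain Java. It
is not the actual PhredFormat code and does not use any BioJava classes;
the class QualityStrings and its methods are made-up names for this example,
and quality scores are assumed to be available as a plain int array.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of BioJava: shows why integer quality
// scores need a separator when written out as text.
class QualityStrings {

    // Joins the scores with single spaces so multi-digit values stay distinct.
    static String spaced(int[] scores) {
        StringJoiner joiner = new StringJoiner(" ");
        for (int score : scores) {
            joiner.add(Integer.toString(score));
        }
        return joiner.toString();
    }

    // Plain concatenation, shown only to demonstrate the ambiguity.
    static String unspaced(int[] scores) {
        StringBuilder sb = new StringBuilder();
        for (int score : scores) {
            sb.append(score);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] scores = {10, 20, 22, 7};
        System.out.println(spaced(scores));   // 10 20 22 7
        System.out.println(unspaced(scores)); // 1020227
    }
}
```

The unspaced form cannot be parsed back reliably, since "1020227" could
equally be read as 1 0 20 22 7 or 102 02 27.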

Mark

On Fri, 19 Oct 2001, David Waring wrote:

> 
> I am working on bio.program.PhredSequence and its friends (for handling the
> qualitative data associated with the output of Phred). PhredSequence uses
> SymbolLists with an IntegerAlphabet. At present the getToken() method of
> IntegerAlphabet.IntegerSymbol returns '#'. I guess this is because the
> Symbol interface specifies that getToken() return a char. Shouldn't this be
> a String?  After all, SymbolParser parseToken() parses a String, and aren't we
> dealing with alphabets that can have multi-character tokens such as the 3
> letter amino acids names? Has this issue come up before? Am I
> misunderstanding 'token'?
> 
> One of the things that must be done with a PhredSequence is to write the
> quality data (an IntegerAlphabet based SymbolList) to a fasta-like format.
> I'd like to just create a Sequence with the quality SymbolList and be able
> to write this using a FastaFormat. But since FastaFormat calls seqString()
> and that is coded in AbstractSymbolList to use getToken() it can only deal
> with chars so it can't handle IntegerSymbols. Another issue is that with
> an IntegerSymbolList one would really like the seqString to output something
> like '10 20 22 7' as opposed to '1020227'.
> 
> Three options:
> 1) Create a new SequenceFormat just for this, and if there will be no other
> use of IntegerSymbolList perhaps this is the best way to go.
> 
> 2) Create an IntegerSymbolList that extends SimpleSymbolList overriding
> seqString().
> 
> 3) (most invasive but perhaps cleanest) Change getToken() to return a
> String, or add toString() to Symbol and add a method paddedSeqString() to
> AbstractSymbolList.
> 
> Preferences, suggestions?
> 
> David
> 
> |||||||||||||||||||||||||||||||||||||||||||||||||||||||
> |   David Waring
> |   Systems Programmer
> |   University of Washington Genome Center
> |   dwaring@u.washington.edu
> |   (206) 221-6902
> |||||||||||||||||||||||||||||||||||||||||||||||||||||||

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mark Schreiber			Ph: 64 3 4797875
Rm 218				email mark_s@sanger.otago.ac.nz
Department of Biochemistry	email m.schreiber@clear.net.nz
University of Otago		
PO Box 56
Dunedin
New Zealand
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~