[Biojava-l] RE: [Biojava-dev] PhredFormat

Sun Nov 30 17:19:12 EST 2003

Hi Frans -

Thanks for these changes. I have committed them to CVS and added "default" as a valid tokenization name for IntegerAlphabet (as a synonym of "token").

- Mark

-----Original Message-----
From: VERHOEF Frans [mailto:verhoeff2 at gis.a-star.edu.sg] 
Sent: Friday, 28 November 2003 4:34 p.m.
To: biojava-dev at biojava.org; biojava-l at biojava.org
Subject: [Biojava-dev] PhredFormat

Hi,

I have fixed the small bugs in PhredFormat that have been bugging me for the last two days, and I have attached my fixed version. Feel free to use it, change it, or throw it away.
In short, this is what I have changed:

- PhredFormat now implements ParseErrorSource and ParseErrorListener. This was not much work, as I basically copied the code from FastaFormat.
- readSequenceData(BufferedReader br, SymbolTokenization parser, SeqIOListener listener) has changed. This method used to scan the char arrays for the short number strings itself and feed them to the StreamParser, which in turn tried to do the same. Because the whitespace was removed along the way, the StreamParser ended up trying to parse a String representing one humongous number as an integer. The method no longer parses the char arrays; it just feeds whole chunks of the char array to the StreamParser.
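To illustrate the failure mode described in the second point, here is a minimal standalone sketch. It does not use the BioJava classes; the class name, the method names, and the sample quality line are all hypothetical and chosen only for illustration. It shows that once the whitespace between scores is stripped, the digits run together into a value too large for an int, whereas splitting on whitespace keeps each score separate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the PhredFormat code.
class PhredQualityParseSketch {

    // Splits a line of whitespace-separated quality scores and parses each one.
    static List<Integer> parseScores(String line) {
        List<Integer> scores = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                scores.add(Integer.parseInt(token));
            }
        }
        return scores;
    }

    // Mimics the old behaviour: remove all whitespace, then parse the
    // remaining digits as a single integer. Returns false if parsing fails.
    static boolean parsesAfterStrippingWhitespace(String line) {
        try {
            Integer.parseInt(line.replaceAll("\\s+", ""));
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String line = "24 30 35 40 40 38 12 9 51 45"; // made-up sample scores
        System.out.println(parsesAfterStrippingWhitespace(line));
        System.out.println(parseScores(line));
    }
}
```

The actual fix in PhredFormat leaves the splitting to the StreamParser; the sketch only shows why the tokens must stay separated by whitespace until they are parsed.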

One new issue came up though, when I am trying to do the following:

            StreamReader qualityIter = PhredTools.readPhredQuality(
                    new BufferedReader(new FileReader(phredQualityFile)));
            while (qualityIter.hasNext()) {
                Sequence seq = qualityIter.nextSequence();
                String str = seq.seqString();  // this call throws the exception
            }

The call to seq.seqString() gave the following exception:

            java.util.NoSuchElementException: default parser not supported by IntegerAlphabet yet
            at org.biojava.bio.symbol.IntegerAlphabet.getTokenization(IntegerAlphabet.java:216)
            at org.biojava.bio.symbol.AbstractSymbolList.seqString(AbstractSymbolList.java:101)
            at org.biojava.bio.seq.impl.SimpleSequence.seqString(SimpleSequence.java:108)
            at org.gis.server.pipeline.apps.SequenceInfoParser.parseResults(SequenceInfoParser.java:82)

What happens is that SimpleSequence calls the AbstractSymbolList.seqString() method. This method in turn executes getAlphabet().getTokenization("default"), where getAlphabet() returns the IntegerAlphabet. But IntegerAlphabet throws the exception here, because it only accepts the name "token" and not the "default" that AbstractSymbolList passes. I do have a simple workaround, which is to make the method IntegerAlphabet.getTokenization(String name) accept both "default" and "token".
But I am not sure I completely understand the philosophy behind the design here...
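The workaround can be sketched roughly as follows. This is a standalone stand-in and not the actual IntegerAlphabet source: the class name is hypothetical, the return value is a placeholder string instead of a real SymbolTokenization, and the way the exception message is built is an assumption based on the stack trace above.

```java
import java.util.NoSuchElementException;

// Hypothetical stand-in for IntegerAlphabet; not the BioJava implementation.
class TokenizationWorkaroundSketch {

    // Accepts "default" as a synonym of "token" and rejects any other name.
    // The returned String is a placeholder for the real tokenization object.
    static String getTokenization(String name) {
        if ("token".equals(name) || "default".equals(name)) {
            return "integer-token-parser";
        }
        // Assumed message format, modelled on the exception shown above.
        throw new NoSuchElementException(
                name + " parser not supported by IntegerAlphabet yet");
    }

    public static void main(String[] args) {
        System.out.println(getTokenization("token"));
        System.out.println(getTokenization("default"));
    }
}
```

With both names mapped to the same tokenization, a generic caller such as seqString() that always asks for "default" would no longer fail on an integer alphabet.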

Kind regards,

Frans Verhoef
Bioinformatics Specialist
Genome Institute of Singapore
Genome, #02-01, 60 Biopolis Street, Singapore 138672
Tel: +65 6478 8000
DID: +65 6478 8060
HP: +65 9848 4325
Email: verhoeff2 at gis.a-star.edu.sg
