[Biojava-l] BufferedReader and FastaFormat

Schreiber, Mark mark.schreiber@agresearch.co.nz
Tue, 5 Mar 2002 10:07:11 +1300


Hi -

When working with small FASTA libraries (in terms of the number of
entries) the following snippet of code works fine:

        try{
          SequenceIterator i = SeqIOTools.readFastaDNA(
              new BufferedReader(new FileReader(f)));
          while(i.hasNext()){
            Sequence s = i.nextSequence();
            db.addSequence(s);
          }
        }catch(...... More code here.......


However, when reading a FASTA library of approx 15600 sequences the
following exception is thrown:

java.io.IOException: Mark invalid
	at java.io.BufferedReader.reset(BufferedReader.java:467)
	at org.biojava.bio.seq.io.FastaFormat.readSequenceData(FastaFormat.java:164)
	at org.biojava.bio.seq.io.FastaFormat.readSequence(FastaFormat.java:121)
	at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:100)
rethrown as org.biojava.bio.BioException: Could not read sequence
	at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:103)

When I increase the buffer size of the BufferedReader to 6096 (an
arbitrarily chosen figure) the parsing seems to go fine. This looks
like a potential flaw in FastaFormat, somewhere around here:

		if (parseStart < bytesRead && cache[parseStart] == '>') {
		    r.reset();
		    if (r.skip(parseStart) != parseStart)
			throw new IOException("Couldn't reset to start of next sequence");
		    reachedEnd = true;
		}
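
For reference, the "Mark invalid" message itself can be reproduced with
java.io.BufferedReader alone. The sketch below only illustrates how
mark(), reset() and the buffer size interact on OpenJDK-style
implementations. It makes no claim about what FastaFormat passes to
mark(), and the names MarkInvalidDemo and resetFails are made up for
the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class; not part of BioJava.
class MarkInvalidDemo {

    /**
     * Marks the start of a 1000-character stream, reads charsToRead
     * characters one at a time, then calls reset().
     * Returns true if reset() throws an IOException.
     */
    static boolean resetFails(int bufferSize, int readAheadLimit, int charsToRead) {
        char[] chars = new char[1000];
        Arrays.fill(chars, 'A');
        String data = new String(chars);

        try (BufferedReader r = new BufferedReader(new StringReader(data), bufferSize)) {
            r.mark(readAheadLimit);
            for (int i = 0; i < charsToRead; i++) {
                r.read();
            }
            try {
                r.reset();
                return false;   // mark was still valid
            } catch (IOException e) {
                return true;    // e.g. "Mark invalid"
            }
        } catch (IOException e) {
            // Unexpected for an in-memory StringReader.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Small buffer, read well past the limit: the buffer has to be
        // refilled, and the mark is dropped at that point.
        System.out.println(resetFails(8, 4, 20));
        // Same limit and same number of reads, but everything fits in
        // one buffer fill, so no refill happens before reset().
        System.out.println(resetFails(64, 4, 20));
        // Limit larger than the number of characters read.
        System.out.println(resetFails(8, 100, 20));
    }
}
```

In this sketch reset() only fails once the reader has had to refill its
buffer after going past the read-ahead limit, which would be consistent
with a larger buffer merely postponing the failure rather than curing
it.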

Notably, the if (r.skip(parseStart) != parseStart) check can't even be
reached, as r.reset() will throw the IOException first. Although a
convoluted try/catch chain can keep increasing the buffer size until
the file is read, this is a bit of a pain, and knowing the required
buffer size in advance is not always possible.
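
The retry approach mentioned above might look roughly like the sketch
below. It is an illustration under assumptions, not BioJava code:
Parser is a hypothetical stand-in for the SeqIOTools loop, the source
is re-opened from scratch on every attempt, and in the real case the
IOException arrives wrapped in a BioException, so the catch clause
would need adjusting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Supplier;

// Hypothetical helper; names are invented for this example.
class GrowingBufferRetry {

    /** Stand-in for the real parsing step. */
    interface Parser {
        void parse(BufferedReader reader) throws IOException;
    }

    /**
     * Re-opens the source and retries the parse with a doubled buffer
     * size after each IOException. Returns the buffer size that worked,
     * or -1 if every size up to maxSize failed.
     */
    static int parseWithRetry(Supplier<Reader> source, Parser parser,
                              int initialSize, int maxSize) {
        for (int size = initialSize; size > 0 && size <= maxSize; size *= 2) {
            try (BufferedReader r = new BufferedReader(source.get(), size)) {
                parser.parse(r);
                return size;
            } catch (IOException e) {
                // Fall through and try again with a larger buffer.
            }
        }
        return -1;
    }

    /** Demo parser: mark with a limit of 4, read 10 characters, reset. */
    static int demo(int initialSize, int maxSize) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 25; i++) {
            sb.append("ACGT");
        }
        final String data = sb.toString();
        Parser markThenReset = r -> {
            r.mark(4);
            for (int i = 0; i < 10; i++) {
                r.read();
            }
            r.reset();
        };
        return parseWithRetry(() -> new StringReader(data), markThenReset,
                              initialSize, maxSize);
    }

    public static void main(String[] args) {
        System.out.println("Buffer size that worked: " + demo(8, 64));
    }
}
```

Restarting from the beginning also means that anything already added
to the database on a failed attempt would have to be discarded, which
is part of why this is awkward.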

Any ideas on how to fix this problem?

Mark