[Biojava-l] BufferedReader and FastaFormat

Thomas Down td2@sanger.ac.uk
Mon, 4 Mar 2002 23:54:58 +0000


On Tue, Mar 05, 2002 at 10:07:11AM +1300, Schreiber, Mark wrote:
> 
> java.io.IOException: Mark invalid
> 	at java.io.BufferedReader.reset(BufferedReader.java:467)
> 	at org.biojava.bio.seq.io.FastaFormat.readSequenceData(FastaFormat.java:164)
> 	at org.biojava.bio.seq.io.FastaFormat.readSequence(FastaFormat.java:121)
> 	at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:100)
> rethrown as org.biojava.bio.BioException: Could not read sequence
> 	at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:103)

I've heard of a few problems with the mark/reset system
on BufferedReader.  They can usually be fixed by setting
the read-ahead limit (the argument to mark()) a bit higher
than the number of characters you're actually planning to read.

I wonder if your file contains any DOS-type line endings
(\r\n).  There was one case where someone was having trouble
with EMBL parsing, and fixed it by stripping these out.
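
If you want to test the line-ending theory, something like the following
would rewrite \r\n as \n before the data reaches the parser. This is only
a minimal sketch using plain java.io; the class and method names
(LineEndingSketch, crlfToLf) are made up for illustration and are not part
of BioJava.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class LineEndingSketch {
    // Hypothetical helper: copies `in` to `out`, turning each "\r\n" into "\n".
    // A '\r' that is not followed by '\n' is passed through unchanged.
    static void crlfToLf(Reader in, Writer out) throws IOException {
        int prev = -1;
        int c;
        while ((c = in.read()) != -1) {
            if (prev == '\r' && c != '\n') {
                out.write('\r');          // the held-back CR was a lone one: keep it
            }
            if (c != '\r') {
                out.write(c);             // ordinary character (or the LF of a CRLF)
            }
            prev = c;                     // a CR is held back until we see what follows
        }
        if (prev == '\r') {
            out.write('\r');              // input ended on a lone CR
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter cleaned = new StringWriter();
        crlfToLf(new StringReader(">seq1\r\nACGT\r\nTTGA\r\n"), cleaned);
        // Report whether any carriage returns survived the clean-up.
        System.out.println(cleaned.toString().indexOf('\r') == -1);
    }
}
```

In practice you would wrap a FileReader and FileWriter in buffered streams
rather than reading one character at a time from the raw file.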

Alternatively, you could just try changing:

   r.mark(cache.length);

To:

   r.mark(cache.length + 50); // fudge factor.

It's not a particularly satisfying solution until we understand
exactly where the marks are falling out of scope, but I bet
it will get your file parsing...
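
For anyone who hasn't used mark/reset before, here is a small stand-alone
sketch of the padded-mark pattern. It is not the FastaFormat code: the
class, the peekAndRewind helper and the sample data are hypothetical, and
it does not explain why the mark expires in the real parser. It only shows
a mark set with a read-ahead limit somewhat larger than the cache, a read
into the cache, and a reset back to the marked position.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class MarkFudgeSketch {
    // Hypothetical helper: fills `cache` from the reader, then rewinds to the
    // marked position so the same characters can be read again by the caller.
    static String peekAndRewind(BufferedReader r, char[] cache, int fudge)
            throws IOException {
        r.mark(cache.length + fudge);            // read-ahead limit, padded by the fudge factor
        int n = r.read(cache, 0, cache.length);  // reads at most cache.length characters
        r.reset();                               // we stayed under the limit, so the mark is still valid
        return n <= 0 ? "" : new String(cache, 0, n);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader r =
            new BufferedReader(new StringReader(">seq1\r\nACGT\r\n"));
        String peeked = peekAndRewind(r, new char[8], 50);
        System.out.println(peeked.length());
        // After the reset, the header line can be read again from the start.
        System.out.println(r.readLine());
    }
}
```

The BufferedReader documentation only says that a reset "may fail" once
the limit has been reached, so it is safest to treat the limit as a hard
ceiling and leave some headroom.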

     Thomas.