[Biojava-l] RichSequenceIterator.nextSequence does not move to next sequence when an exception is thrown
Martin Jones
martin.jones at ed.ac.uk
Thu Jul 10 10:13:28 UTC 2008
Hi,
I have a file containing GenBank records, and I want to process them thus:
RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(myReader, null);
while (seqs.hasNext()) {
    RichSequence seq = seqs.nextRichSequence();
    // processing code
}
However, some records cannot be parsed by BioJava. This is to be expected, as
I'm processing half a million records; some are bound to be wonky. So I use a
try-catch to skip over troublesome records:
RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(myReader, null);
while (seqs.hasNext()) {
    try {
        RichSequence seq = seqs.nextRichSequence();
        // processing code
    } catch (BioException e) {
        System.out.println("record could not be parsed!");
    }
}
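If the iterator does not advance past a bad record, the try/catch loop above can never terminate. Below is a minimal, JDK-only sketch of that failure mode plus a guard that stops the loop instead of spinning forever. StuckIterator is a hypothetical stand-in that mimics an iterator whose next() throws on a bad record without moving on; it is not BioJava's RichSequenceIterator, and the maxConsecutiveFailures cap is an assumption of this sketch, not a BioJava feature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in (NOT BioJava): next() throws on a record starting
// with "BAD" and, crucially, does not advance its position when it does.
class StuckIterator {
    private final List<String> records;
    private int pos = 0;

    StuckIterator(List<String> records) {
        this.records = records;
    }

    boolean hasNext() {
        return pos < records.size();
    }

    String next() throws Exception {
        String r = records.get(pos);
        if (r.startsWith("BAD")) {
            throw new Exception("could not parse " + r); // pos stays put
        }
        pos++;
        return r;
    }
}

public class GuardedLoop {
    // Collects parsed records, but gives up after maxConsecutiveFailures
    // errors in a row so a non-advancing iterator cannot loop forever.
    static List<String> processAll(StuckIterator it, int maxConsecutiveFailures) {
        List<String> parsed = new ArrayList<>();
        int failures = 0;
        while (it.hasNext()) {
            try {
                parsed.add(it.next());
                failures = 0; // reset after every successful record
            } catch (Exception e) {
                System.out.println("record could not be parsed!");
                if (++failures >= maxConsecutiveFailures) {
                    System.out.println("giving up: iterator is not advancing");
                    break;
                }
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        StuckIterator it = new StuckIterator(Arrays.asList("rec1", "rec2", "BAD3", "rec4"));
        System.out.println(processAll(it, 3)); // prints [rec1, rec2]
    }
}
```

The guard only prevents the hang: any record after the stuck one (rec4 here) is never reached, so it is a safety net rather than a way to skip bad records.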
However, it seems that the position in the input file does not advance if an
exception is thrown during parsing. If I run the above code on a file
containing a single unparseable record, it gets stuck in a non-terminating
loop: each time seqs.nextRichSequence() is called, an exception is thrown, but
seqs.hasNext() still returns true. Is there a correct way to deal with this?
I could split my input file into individual records and do something like:
ArrayList<String> records = splitGenBankFileIntoRecords();
for (String singleRecord : records) {
    BufferedReader singleRecordReader =
        new BufferedReader(new StringReader(singleRecord));
    RichSequenceIterator seqs =
        RichSequence.IOTools.readGenbankDNA(singleRecordReader, null);
    try {
        RichSequence seq = seqs.nextRichSequence();
        // processing code
    } catch (BioException e) {
        System.out.println("record could not be parsed!");
    }
}
but this seems inefficient, as I have to instantiate a new StringReader,
BufferedReader and RichSequenceIterator for every record (half a million
cycles of object creation and destruction!).
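The hypothetical splitGenBankFileIntoRecords() could be written to stream records instead of collecting them all into an ArrayList, so only one record's text is held in memory at a time. A minimal sketch, assuming each record ends with a line consisting of "//" as in GenBank flat files; the class and method names are illustrative, not part of BioJava. The BioJava call appears only in a comment, so the block compiles with the JDK alone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical streaming splitter: hands back one GenBank record at a time.
// Assumes every record is terminated by a line consisting of "//".
public class GenBankRecordSplitter {
    private final BufferedReader in;

    public GenBankRecordSplitter(BufferedReader in) {
        this.in = in;
    }

    // Returns the next record's text (including its "//" line),
    // or null once the input is exhausted.
    public String nextRecord() throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append('\n');
            if (line.trim().equals("//")) {
                return sb.toString();
            }
        }
        // Unterminated trailing text is returned as-is; a blank tail means no record.
        return sb.toString().trim().isEmpty() ? null : sb.toString();
    }

    // Convenience helper for small inputs and testing: splits a whole string.
    static List<String> splitAll(String text) {
        List<String> records = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            GenBankRecordSplitter splitter = new GenBankRecordSplitter(r);
            String rec;
            while ((rec = splitter.nextRecord()) != null) {
                records.add(rec);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String data = "LOCUS       A\nORIGIN\n//\nLOCUS       B\nORIGIN\n//\n";
        GenBankRecordSplitter splitter =
            new GenBankRecordSplitter(new BufferedReader(new StringReader(data)));
        int n = 0;
        String rec;
        while ((rec = splitter.nextRecord()) != null) {
            n++;
            // With BioJava on the classpath, each record could be parsed on its own:
            //   RichSequence.IOTools.readGenbankDNA(
            //       new BufferedReader(new StringReader(rec)), null).nextRichSequence();
            // wrapped in a try/catch, so a bad record is skipped and the loop moves on.
        }
        System.out.println(n + " records"); // prints "2 records"
    }
}
```

The per-record StringReader and BufferedReader are small, short-lived objects, which the JVM's generational garbage collectors generally handle cheaply; whether that overhead actually matters here would need measuring.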
Any ideas?
--
------------------------
Martin Jones
School of Biological Sciences,
Ashworth Laboratories, King's Buildings
Edinburgh, EH9 3JT, UK