[Biojava-l] RichSequenceIterator.nextSequence does not move to next sequence when an exception is thrown

Martin Jones martin.jones at ed.ac.uk
Thu Jul 10 10:13:28 UTC 2008


Hi,

I have a file containing GenBank records, and I want to process them thus:

RichSequenceIterator seqs =
        RichSequence.IOTools.readGenbankDNA(myReader, null);
while (seqs.hasNext()) {
     RichSequence seq = seqs.nextRichSequence();
     // processing code
}

Some records, however, cannot be parsed by BioJava. This is to be expected,
as I'm processing half a million records and some are bound to be wonky.  So
I use a try-catch to skip over troublesome records:


RichSequenceIterator seqs =
        RichSequence.IOTools.readGenbankDNA(myReader, null);
while (seqs.hasNext()) {
     try{
         RichSequence seq = seqs.nextRichSequence();
         // processing code
     } catch (BioException e){
          System.out.println("record could not be parsed!");
     }
}

However, it seems that the position in the input file is not changed if an
exception is thrown during parsing.  If I run the above code on a file
containing a single un-parseable record, it gets stuck in a non-terminating
loop - i.e. each time seqs.nextRichSequence() is called, an exception is
thrown, but seqs.hasNext() still returns true.  Is there a correct way to
deal with this?  I could split up my input file into multiple records and do
something like:

ArrayList<String> records = splitGenBankFileIntoRecords();
for (String singleRecord : records){
     BufferedReader singleRecordReader =
             new BufferedReader(new StringReader(singleRecord));
     RichSequenceIterator seqs =
             RichSequence.IOTools.readGenbankDNA(singleRecordReader, null);
     try{
          RichSequence seq = seqs.nextRichSequence();
          // processing code
     } catch (BioException e){
          System.out.println("record could not be parsed!");
     }

}

but this seems inefficient, as I have to instantiate a new StringReader,
BufferedReader and RichSequenceIterator for every record (half a million
cycles of object creation/destruction!)
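
For what it's worth, the splitting step itself would not need to hold all
the records in memory at once. Below is a rough sketch of a lazy splitter,
assuming every record ends with a line containing only "//" (the GenBank
record terminator). GenBankRecordSplitter is a hypothetical helper name of my
own, not something that exists in BioJava, and I have not tried it on the
full data set:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper (not part of BioJava): lazily yields one GenBank
// record at a time, so the whole file is never held in memory.
final class GenBankRecordSplitter implements Iterator<String> {
    private final BufferedReader in;
    private String nextRecord; // buffered record, or null if none is ready

    GenBankRecordSplitter(BufferedReader in) {
        this.in = in;
    }

    @Override
    public boolean hasNext() {
        if (nextRecord == null) {
            nextRecord = readRecord();
        }
        return nextRecord != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String record = nextRecord;
        nextRecord = null;
        return record;
    }

    // Reads lines up to and including the "//" terminator line.
    // Returns null at end of input; a trailing chunk without a terminator
    // is still returned if it contains any non-blank text.
    private String readRecord() {
        StringBuilder sb = new StringBuilder();
        boolean sawContent = false;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (!sawContent && line.trim().isEmpty()) {
                    continue; // skip blank lines between records
                }
                sawContent = true;
                sb.append(line).append('\n');
                if (line.trim().equals("//")) {
                    return sb.toString();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sawContent ? sb.toString() : null;
    }
}
```

Each String it returns could then be wrapped in a StringReader and handed to
RichSequence.IOTools.readGenbankDNA exactly as in the loop above, so that a
failure on one record cannot stall the others. It still creates the
per-record readers and iterator, though, so it only addresses memory use and
not the object churn.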

Any ideas?



-- 
------------------------

Martin Jones
School of Biological Sciences,
Ashworth Laboratories, King's Buildings
Edinburgh, EH9 3JT, UK


