[Biojava-l] RichSequenceIterator.nextSequence does not move to next sequence when an exception is thrown
Richard Holland
dicknetherlands at gmail.com
Thu Jul 10 10:21:30 UTC 2008
Hello. You appear to have hit a bit of a limitation with the system.
The sequence iterator doesn't know how to skip over bad records (in
fact, the parsers themselves do not - they just give up at the first
sign of a failed line). I'll have to have a think about how to fix
this, as it's not immediately obvious (although it definitely needs to
be done).
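
Until the parsers can recover by themselves, one possible workaround is to
cut the input into records before BioJava sees it, reading lazily rather
than loading the whole file into a list. The sketch below is only that, a
sketch: the class name GenBankRecordSplitter is hypothetical and not part of
BioJava, and it assumes every record ends with the usual GenBank terminator
line "//". It uses only the standard library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper, NOT part of BioJava: lazily yields one GenBank
// record at a time, assuming each record ends with a "//" line.
final class GenBankRecordSplitter implements Iterator<String> {
    private final BufferedReader in;
    private String next; // record to hand out next, or null at end of input

    GenBankRecordSplitter(BufferedReader in) {
        this.in = in;
        this.next = readRecord();
    }

    // Reads lines up to and including the next "//" terminator line.
    private String readRecord() {
        StringBuilder sb = new StringBuilder();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // Skip blank lines that sit between records.
                if (sb.length() == 0 && line.trim().isEmpty()) {
                    continue;
                }
                sb.append(line).append('\n');
                if (line.trim().equals("//")) {
                    return sb.toString();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // End of input: return a trailing unterminated record if there is one.
        return sb.length() == 0 ? null : sb.toString();
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        next = readRecord(); // always advances, whatever the caller does with 'current'
        return current;
    }
}
```

Each string it returns could then be wrapped in a BufferedReader over a
StringReader and passed to RichSequence.IOTools.readGenbankDNA inside a
try/catch, as in your second example. Because the splitter has already moved
past a record before it is parsed, a BioException on one record should not
stall the loop. It still creates a new reader and iterator per record, but
only one record is held in memory at a time.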
cheers,
Richard
2008/7/10 Martin Jones <martin.jones at ed.ac.uk>:
> Hi,
>
> I have a file containing GenBank records, and I want to process them thus:
>
> RichSequenceIterator seqs =
>     RichSequence.IOTools.readGenbankDNA(myReader, null);
> while (seqs.hasNext()) {
>     RichSequence seq = seqs.nextRichSequence();
>     // processing code
> }
>
> however, some records cannot be parsed by biojava... this is to be expected
> as I'm processing half a million records - some are bound to be wonky. So I
> use a try-catch to skip over troublesome records:
>
>
> RichSequenceIterator seqs =
>     RichSequence.IOTools.readGenbankDNA(myReader, null);
> while (seqs.hasNext()) {
>     try {
>         RichSequence seq = seqs.nextRichSequence();
>         // processing code
>     } catch (BioException e) {
>         System.out.println("record could not be parsed!");
>     }
> }
>
> However, it seems that the position in the input file is not changed if an
> exception is thrown during parsing. If I run the above code on a file
> containing a single un-parseable record, it gets stuck in a non-terminating
> loop - i.e. each time seqs.nextRichSequence() is called, an exception is
> thrown, but seqs.hasNext() still returns true. Is there a correct way to
> deal with this? I could split up my input file into multiple records and do
> something like:
>
> ArrayList<String> records = splitGenBankFileIntoRecords();
> for (String singleRecord : records) {
>     BufferedReader singleRecordReader =
>         new BufferedReader(new StringReader(singleRecord));
>     RichSequenceIterator seqs =
>         RichSequence.IOTools.readGenbankDNA(singleRecordReader, null);
>     try {
>         RichSequence seq = seqs.nextRichSequence();
>         // processing code
>     } catch (BioException e) {
>         System.out.println("record could not be parsed!");
>     }
> }
>
> but this seems inefficient, as I have to instantiate a new StringReader,
> BufferedReader and RichSequenceIterator for every record (half a million
> cycles of object creation/destruction!)
>
> Any ideas?
>
>
>
> --
> ------------------------
>
> Martin Jones
> School of Biological Sciences,
> Ashworth Laboratories, King's Buildings
> Edinburgh, EH9 3JT, UK
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>