[Biojava-l] Parser failure

Andrey Zinovyev zinovyev@ihes.fr
Wed, 22 May 2002 19:02:13 +0200


Hi!

What's wrong with this code?

I've got a sequence in GenBank format from
ftp://ncbi.nlm.nih.gov/genbank/genomes/C_elegans/CHR_I/worm_X.gbk

and tried to parse it with this code:
-----------------------
import org.biojava.bio.*;
import org.biojava.bio.symbol.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.io.*;
import java.io.*;
import java.util.*;

public class TestReadingGenBankFiles {
  public TestReadingGenBankFiles() {
  }
  public static void main(String[] args) {
  try{
      File GenBankFile = new File("worm_X.gbk");
      System.out.println("Loading sequence...");
      BufferedReader eReader = new BufferedReader(
        new InputStreamReader(new FileInputStream(GenBankFile)));
      SequenceIterator seqI = SeqIOTools.readGenbank(eReader);
      System.out.println("Loaded...");
      System.out.println("Getting seq...");
      Sequence seq = seqI.nextSequence();
      System.out.println("Got "+seq.getName());
  } catch (Throwable t) { t.printStackTrace(); }
  }
}
---------------------------

Though this code has worked on many sequences, here I get:

Loading sequence...
Loaded...
Getting seq...
java.lang.ArrayIndexOutOfBoundsException
 at org.biojava.bio.seq.io.GenbankContext.hasHeaderTag(GenbankFormat.java:685)
 at org.biojava.bio.seq.io.GenbankContext.processHeaderLine(GenbankFormat.java:544)
 at org.biojava.bio.seq.io.GenbankContext.processFeatureLine(GenbankFormat.java:497)
 at org.biojava.bio.seq.io.GenbankContext.processLine(GenbankFormat.java:364)
 at org.biojava.bio.seq.io.GenbankFormat.readSequence(GenbankFormat.java:137)
 at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:100)
rethrown as org.biojava.bio.BioException: Could not read sequence
 at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:103)
 at caijava.TestReadingGenBankFiles.main(TestReadingGenBankFiles.java:35)
-------------------------------------

What's wrong? Can the parser be applied to such long sequences, or are
there limitations?

Thanks,
Andrey Zinovyev.