[Biojava-l] Parser failure

Chun-Nuan Chen chun@bioweircom.org
Fri, 24 May 2002 08:08:04 -0700


  Andrey,

I compared the latest CVS checkout with one checked out a few days ago. 
Some changes have been made to fix the ArrayIndexOutOfBoundsException 
problem:

---------------------------
700c700,701
<     for (int i = 0; i < TAG_LENGTH; i++)
---
>     int len = Math.min(l.length, TAG_LENGTH); // handles empty lines better
>     for (int i = 0; i < len; i++)

----------------------------
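
To show why bounding the loop matters, here is a minimal standalone sketch.
It is not the actual BioJava source: the diff only shows the loop header, so
the class name, the loop body, and the TAG_LENGTH value of 12 (the width of
the GenBank header tag column) are my assumptions for illustration.

```java
// Hypothetical stand-in for the check in GenbankFormat.java; only the
// Math.min bound on the loop is taken from the diff above.
final class HeaderTagCheck {
    // Assumed value: GenBank header tags occupy the first 12 columns.
    static final int TAG_LENGTH = 12;

    // Returns true if any of the first TAG_LENGTH characters is not a space.
    static boolean hasHeaderTag(char[] l) {
        // Without the bound, an empty or short line would index past the
        // end of the array and throw ArrayIndexOutOfBoundsException.
        int len = Math.min(l.length, TAG_LENGTH);
        for (int i = 0; i < len; i++) {
            if (l[i] != ' ') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Empty line: the loop body never runs, so no exception is thrown.
        System.out.println(hasHeaderTag(new char[0]));
        // A line that starts with a tag.
        System.out.println(hasHeaderTag("LOCUS       CHR_X".toCharArray()));
        // A continuation line: twelve leading spaces, then text.
        System.out.println(hasHeaderTag("            continuation".toCharArray()));
    }
}
```

With the old loop (i < TAG_LENGTH), the first call would fail on the empty
array, which matches the exception seen in hasHeaderTag.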

Also in the new GenbankFormat.java, the ArrayIndexOutOfBoundsException 
is caught.  So you should not see it thrown in the output.

Indeed, my test also seems to be successful:
-----------------------------------
$java -Xmx100M TestReadingGenBankFiles

Loading sequence...
Loaded...
Getting seq...
Got chr_X

** I had to increase the max heap size to around 100M or larger
(setting it to 80M still throws java.lang.OutOfMemoryError).

--------------------------------
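
If you want to confirm that the -Xmx setting actually took effect, a small
check like the following can help. This is only a sketch; the class name
HeapCheck is made up for illustration, and the value reported by the JVM is
usually close to, but not exactly, the figure passed to -Xmx.

```java
// Hypothetical helper: prints the maximum heap the JVM will try to use.
final class HeapCheck {
    // Runtime.maxMemory() returns the limit in bytes; convert to megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it with the same option as the test, e.g. java -Xmx100M HeapCheck.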

So I would suggest you try the latest biojava build or CVS checkout.

Regards,

Chun-Nuan





Andrey Zinovyev wrote:

>Hi!
>
>What's wrong with this code:
>
>I've got a sequence in GenBank format from
>ftp://ncbi.nlm.nih.gov/genbank/genomes/C_elegans/CHR_I/worm_X.gbk
>
>and tried to parse it with this code:
>-----------------------
>import org.biojava.bio.*;
>import org.biojava.bio.symbol.*;
>import org.biojava.bio.seq.*;
>import org.biojava.bio.seq.io.*;
>import java.io.*;
>import java.util.*;
>
>public class TestReadingGenBankFiles {
>  public TestReadingGenBankFiles() {
>  }
>  public static void main(String[] args) {
>  try{
>      File GenBankFile = new File("worm_X.gbk");
>      System.out.println("Loading sequence...");
>      BufferedReader eReader = new BufferedReader(
>        new InputStreamReader(new FileInputStream(GenBankFile)));
>      SequenceIterator seqI = SeqIOTools.readGenbank(eReader);
>      System.out.println("Loaded...");
>      System.out.println("Getting seq...");
>      Sequence seq = seqI.nextSequence();
>      System.out.println("Got "+seq.getName());
>  } catch (Throwable t) { t.printStackTrace(); }
>  }
>}
>---------------------------
>
>Though this code worked on many sequences, here I get
>
>Loading sequence...
>Loaded...
>Getting seq...
>java.lang.ArrayIndexOutOfBoundsException
> at org.biojava.bio.seq.io.GenbankContext.hasHeaderTag(GenbankFormat.java:685)
> at org.biojava.bio.seq.io.GenbankContext.processHeaderLine(GenbankFormat.java:544)
> at org.biojava.bio.seq.io.GenbankContext.processFeatureLine(GenbankFormat.java:497)
> at org.biojava.bio.seq.io.GenbankContext.processLine(GenbankFormat.java:364)
> at org.biojava.bio.seq.io.GenbankFormat.readSequence(GenbankFormat.java:137)
> at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:100)
>rethrown as org.biojava.bio.BioException: Could not read sequence
> at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:103)
> at caijava.TestReadingGenBankFiles.main(TestReadingGenBankFiles.java:35)
>-------------------------------------
>
>What's wrong? Can the parser be applied to such long sequences, or
>are there limitations?
>
>Thanks,
>Andrey Zinovyev.
>