[Biojava-l] Sequence object size: the sequel...

Sylvain Foisy sylvain.foisy@bioneq.qc.ca
18 Oct 2002 10:52:46 -0400


Hi,

I found out why my program was crashing with an IndexOutOfBounds error,
but I now have a different problem. Here is my little program for parsing
genomic contigs as found at ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens.
I am using the .fa file for each chromosome.

import java.io.*;
import org.biojava.bio.symbol.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.io.*;

public class WECBFinder
{
  public static void main(String[] args)throws IOException
  {
    BufferedReader inFile;
    PrintWriter outFile;
    Sequence seq;
    String sequence=" ";
    String lePath=" ";
    int chr;
    int nbSequences=0;

    try
    {
      System.out.println("Programme de recuperation de sequences provenant du genome humain");
      System.out.print("Entrer le chromosome desire: ");

      BufferedReader stdin=new BufferedReader(new InputStreamReader(System.in));
      chr=Integer.parseInt(stdin.readLine());
      System.out.println("Recuperation de la sequence du chromosome "+chr);

      lePath="/databanks/h_sapiens_chr/hs_chr"+chr+".fa";

      inFile=new BufferedReader(new FileReader(lePath));

      SequenceIterator stream=SeqIOTools.readFastaDNA(inFile);

      while(stream.hasNext())
      {
        seq=stream.nextSequence();
        System.out.println(seq.getName());
        System.out.println(seq.subStr(1,10));
        System.out.println(seq.subStr(seq.length()-9,seq.length()));
        nbSequences++;
      }

      System.out.println("Fin de la lecture du fichier contenant le chromosome "+chr);
      System.out.println("Nbre de sequences individuelles dans le fichier: "+nbSequences);
    }

    catch (Exception e)
    {
      System.err.println(e);
      e.printStackTrace();
    }
  }
} 

If I try with chromosome 18, it works very well. With chromosome 22,
I get a java.lang.OutOfMemoryError. My machine has 1 GB of RAM; I tried
it on a 2 GB machine and it works there.
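
A java.lang.OutOfMemoryError is raised when the JVM heap is exhausted,
and the heap ceiling is a JVM setting that need not match the physical
RAM of the machine. As a minimal sketch for checking that ceiling (the
class name HeapCheck is hypothetical and not part of the program above):

```java
public class HeapCheck {
    /** Returns the maximum heap size the JVM will attempt to use, in megabytes. */
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
    }
}
```

If the reported limit is well below the installed RAM, the standard -Xmx
launcher option raises it, for example "java -Xmx1500m WECBFinder" (the
value is only illustrative). Whether that alone is enough for chromosome
22 is not something this sketch can tell.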

Does anybody know a way to make Sequence objects smaller, or to
circumvent this problem? I am writing each Sequence object to its own
file and then flushing it to make space for the next one...
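
The one-file-per-sequence idea can also be sketched without building
Sequence objects at all, by copying the FASTA text line by line. This is
only a rough illustration under stated assumptions: it uses plain java.io
rather than BioJava, it assumes Java 8 or later, and the class name
FastaSplitter and the record_N.fa file names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

public class FastaSplitter {
    /**
     * Copies each FASTA record read from 'in' into its own file under 'outDir'
     * and returns the number of records seen. Only one line is held in memory
     * at a time. Output names (record_1.fa, record_2.fa, ...) are a
     * hypothetical convention chosen for this sketch.
     */
    public static int split(Reader in, File outDir) {
        int count = 0;
        BufferedWriter out = null;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    // A header line starts a new record: close the previous file.
                    if (out != null) {
                        out.close();
                    }
                    count++;
                    File target = new File(outDir, "record_" + count + ".fa");
                    out = new BufferedWriter(new FileWriter(target));
                }
                if (out != null) { // ignore any text before the first header
                    out.write(line);
                    out.newLine();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (out != null) {
                try {
                    out.close();
                } catch (IOException ignored) {
                    // nothing useful to do if the final close fails
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input .fa file, args[1]: existing output directory
        int n = split(new FileReader(args[0]), new File(args[1]));
        System.out.println("Records written: " + n);
    }
}
```

Each small file could then be read on its own, so only one contig needs
to be in memory at a time. A single very large contig would still have to
fit in the heap once it is loaded as a Sequence.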

I plead guilty: I consider myself pretty new to BioJava.

Cordially

Sylvain