[Biojava-dev] Flat File genomic Indexing with OBDA

Sicotte, Hugues (NIH/NCI) sicotteh at mail.nih.gov
Thu Mar 27 13:27:21 EST 2003


I tried to use the indexer org.biojava.app.BioFlatIndex on a really long
genomic sequence and it doesn't work.
(It worked on my small test sequences, but it doesn't like a 230 kb sequence!)

I'm running on a Solaris machine with 4 GB of RAM and an extra 5 GB of
swap space.
I run java with -Xms2000m and -Xmx2000m (to raise the JVM heap to 2 GB):
java -Xms2000m -Xmx2000m org.biojava.app.BioFlatIndex -c -a dna -l
/usr/tmp/ -d humgen -i flat -f fasta /usr/tmp/long.fa

My test sequence is 230 thousand nucleotides long, all on a single line.
The only error message is '46'.
I added code to catch out-of-memory errors, and it's not that.
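
A bare number such as '46' is what some exceptions return from
getMessage() (for example, ArrayIndexOutOfBoundsException on older JVMs
reports only the offending index), so this may be an exception printed
without its class name. That is only a guess. A minimal sketch of a
reporting helper that would show the class and stack trace follows; the
class ErrorReport and its methods are hypothetical and not part of biojava.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper (not part of biojava): reports an error with its
// class name and stack trace instead of only getMessage().
final class ErrorReport {

    // Returns "<exception class>: <message>", so a bare message such as
    // "46" is shown together with the type of exception that carried it.
    static String describe(Throwable t) {
        return t.getClass().getName() + ": " + t.getMessage();
    }

    // Returns the full stack trace as a string, which shows the line
    // where the exception was thrown.
    static String fullTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }
}
```

Wrapping the indexer call in try { ... } catch (Throwable t) and printing
ErrorReport.fullTrace(t) would show where the '46' originates, assuming it
really is an exception message.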

I want to write a servlet to retrieve small chunks of the human genome.
I want to use the index to get the offset of a sequence in the file, and
then use the start/stop coordinates to
compute an additional offset into that sequence. [I already wrote a class
that implements that method
by extending FlatSequenceDB.java, but that is beyond the scope of this
bug.]
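
The offset arithmetic described above can be sketched roughly as follows.
This is not the FlatSequenceDB subclass mentioned above, only an
illustration under stated assumptions: the class FastaChunkReader, the
method readChunk and the parameter sequenceStart are hypothetical names,
the index is assumed to yield the byte offset of the first residue of the
record, coordinates are assumed to be 1-based and inclusive, and the
sequence is assumed to sit on one line with one byte per residue.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: fetch a sub-sequence from a FASTA file whose
// sequence is stored on a single line, using plain offset arithmetic.
final class FastaChunkReader {

    // path          FASTA file with the whole sequence on one line
    // sequenceStart byte offset of the first residue of the record
    //               (assumed to come from the flat-file index)
    // start, stop   1-based inclusive coordinates within the sequence
    static String readChunk(String path, long sequenceStart,
                            long start, long stop) throws IOException {
        if (start < 1 || stop < start) {
            throw new IllegalArgumentException("bad range " + start + ".." + stop);
        }
        int length = Math.toIntExact(stop - start + 1);
        byte[] buf = new byte[length];
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            // With no line breaks inside the sequence, residue i lives
            // at byte sequenceStart + (i - 1).
            raf.seek(sequenceStart + (start - 1));
            // Throws EOFException if the range runs past the end of the file.
            raf.readFully(buf);
        }
        return new String(buf, StandardCharsets.US_ASCII);
    }
}
```

Only the requested bytes are read, so the cost of a lookup does not depend
on the length of the line. The sketch does not check that the range stays
inside the record, so a stop beyond the sequence end would read into the
following newline or the next record.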

So my fasta files have all the sequence on one really LONG line (like
formatdb for blast does).
[I was sad that the specification didn't require storing fasta on a single
line. :( ]


Do you have any idea which part of biojava has a limitation on
line length (and would spit out an error code '46')?
I started debugging, but it's a real nightmare since I am not too familiar
with the biojava data model.

Hugues Sicotte

p.s. I'm using the most recent biojava cvs dump from last week.



