[Biojava-dev] RE: Flat File genomic Indexing with OBDA

Sicotte, Hugues (NIH/NCI) sicotteh at mail.nih.gov
Thu Mar 27 15:42:36 EST 2003


I chased the problem partway using a debugger.
The problem is that the line length is limited to 16K (1<<14),
while the biggest human chromosome could be as large as 40M once completed.
 

 public class CountedBufferedReader extends BufferedReader {
        private final static int DEFAULT_BUFFER_SIZE = 1 << 26; // was 1<<14

.. I fixed that in my version .. 

but it crashed somewhere else as it was writing the final version,
running out of memory.
I can't debug it anymore on my PC (I don't have that much memory) with a
graphical debugger.
.. and I can't use the GUI debugger on the solaris machine.

This was only a temporary fix; increasing the buffer size is not the way to
go.
That function should be rewritten to loop, refilling the buffer until a
complete line has been read.
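
A minimal sketch of that kind of loop, under stated assumptions: the class
name ChunkedLineReader and its methods are hypothetical and are not part of
biojava or of CountedBufferedReader. It refills a small fixed buffer as many
times as needed to finish a line, and counts the characters consumed so the
offset of the next line is known. The count is in chars, which matches byte
offsets only for single-byte (ASCII) files.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper (not a biojava class): reads lines of any length through a small fixed buffer. */
final class ChunkedLineReader {
    private final Reader in;
    private final char[] buf;
    private int pos = 0;       // next unread index in buf
    private int limit = 0;     // number of valid chars currently in buf
    private long consumed = 0; // total chars consumed so far, terminators included

    ChunkedLineReader(Reader in, int bufferSize) {
        if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be positive");
        this.in = in;
        this.buf = new char[bufferSize];
    }

    /** Offset (in chars) at which the next line starts. */
    long getFilePointer() {
        return consumed;
    }

    /** Returns the next line without its terminator, or null at end of stream. */
    String readLine() throws IOException {
        StringBuilder line = null;
        while (true) {
            if (pos == limit) {
                // Buffer exhausted: refill it instead of requiring the line to fit.
                limit = in.read(buf, 0, buf.length);
                pos = 0;
                if (limit <= 0) {
                    limit = 0;
                    return line == null ? null : line.toString();
                }
            }
            if (line == null) line = new StringBuilder();
            int start = pos;
            while (pos < limit && buf[pos] != '\n') pos++;
            line.append(buf, start, pos - start);
            consumed += pos - start;
            if (pos < limit) {
                // Found '\n': consume it and drop a trailing '\r' if present.
                pos++;
                consumed++;
                int n = line.length();
                if (n > 0 && line.charAt(n - 1) == '\r') line.setLength(n - 1);
                return line.toString();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ChunkedLineReader r = new ChunkedLineReader(new StringReader(">seq1\nACGTACGTACGT\n"), 4);
        String header = r.readLine();
        System.out.println(header + " -> sequence starts at offset " + r.getFilePointer());
    }
}
```

Note that this still accumulates the whole line in memory. An indexer that
only needs offsets could skip the append for sequence lines and just count
characters, which may help with the out-of-memory crash, though I have not
verified that against the biojava code.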


Hugues
>  -----Original Message-----
> From: 	Sicotte, Hugues (NIH/NCI)  
> Sent:	Thursday, March 27, 2003 1:27 PM
> To:	'biojava-dev at biojava.org'
> Subject:	Flat File genomic Indexing with OBDA
> 
> 
> I tried to use the indexer org.biojava.app.BioFlatIndex on really long
> genomic sequences and it doesn't work.
> (It worked on my small test sequences, but it doesn't like 230Kb
> sequences!)
> 
> I'm running on a Solaris machine with 4 GB of RAM and an extra 5 GB of
> swap space.
> I run java with -Xms2000m and -Xmx2000m (to increase the memory of the
> JVM to 2 GB):
> java -Xms2000m -Xmx2000m org.biojava.app.BioFlatIndex -c -a dna -l
> /usr/tmp/ -d humgen -i flat -f fasta /usr/tmp/long.fa
> 
> My test sequence is 230 thousand nucleotides long, on a single line.
> The error message is '46'.
> I added code to catch 'out of memory' errors, and it's not that.
> 
> I want to write a servlet to retrieve small chunks of the human genome.
> I want to use the indexing to get the offset into a file, and then use the
> start/stop to figure out an additional offset into the file. [I already
> wrote a class that implements that method by extending
> FlatSequenceDB.java, but that is beyond the scope of this bug.]
> 
> So my fasta files have all the sequence on one really LONG line (like
> formatdb for blast does).
> [I was sad that the specification didn't require restoring fasta on a
> single line. :( ]
> 
> 
> Do you have any idea which section of biojava has a limitation
> on line length (and would spit out an error code '46')?
> I started debugging, but it's a real nightmare since I am not too familiar
> with the biojava data model.
> 
> Hugues Sicotte
> 
> p.s. I'm using the most recent biojava cvs dump from last week.
> 
> 
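
For reference, the chunk retrieval described in the quoted message (index
offset plus start/stop) could look roughly like the sketch below. This is
not my FlatSequenceDB subclass and uses no biojava calls; the class and
method names are hypothetical. It assumes the index yields the byte offset
of the first residue (just after the fasta header line), that the whole
sequence sits on a single line of single-byte characters, and that
start/stop are 1-based and inclusive.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not a biojava class): reads a sub-range of a single-line fasta sequence. */
final class SequenceChunkReader {

    /**
     * @param path      fasta file with the sequence on one line (assumption)
     * @param seqOffset byte offset of the first residue, as looked up in the index (assumption)
     * @param start     1-based start position, inclusive
     * @param stop      1-based stop position, inclusive
     */
    static String readChunk(String path, long seqOffset, long start, long stop) throws IOException {
        if (start < 1 || stop < start) {
            throw new IllegalArgumentException("need 1 <= start <= stop");
        }
        long length = stop - start + 1;
        if (length > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("chunk too large for a single array");
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            // The additional offset is just (start - 1) because there are no line breaks to skip.
            raf.seek(seqOffset + (start - 1));
            byte[] chunk = new byte[(int) length];
            raf.readFully(chunk); // throws EOFException if the range runs past the end of the file
            return new String(chunk, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SequenceChunkReader <file> <seqOffset> <start> <stop>
        System.out.println(readChunk(args[0], Long.parseLong(args[1]),
                Long.parseLong(args[2]), Long.parseLong(args[3])));
    }
}
```

With wrapped fasta lines the additional offset would also have to account
for the line terminators, which is why the single-line layout matters here.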