[Biojava-l] RichSequence.IOTools performance

Richard Holland holland at eaglegenomics.com
Mon Mar 28 17:23:44 UTC 2011


In that case you have little option but to rewrite the GenbankFormat module to use NIO or some other faster method for writing files. However, before you do that, I suggest you look at the recent BioJava3 developments to see whether anything has already been done in this area - Andy Yates is your man there.
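
As a rough illustration of the NIO idea, here is a minimal sketch that uses only the standard Java library. It opens the target file as a FileChannel, wraps it in a large buffered OutputStream, and pushes every record through that single stream. The class NioWriteSketch, the RecordWriter interface and the 64 KB buffer size are all hypothetical names and values chosen for the example; they are not part of BioJava. In a BioJava 1.x application the callback would presumably call RichSequence.IOTools.writeGenbank(out, seq, ns) instead of writing raw bytes. Whether this actually helps depends on how much of the 57% is spent on file output rather than on formatting, so it would need to be profiled.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

public class NioWriteSketch {

    /** Hypothetical stand-in for whatever serialises one record to a stream. */
    public interface RecordWriter<T> {
        void write(OutputStream out, T record) throws IOException;
    }

    /**
     * Writes all records to one file through a single NIO channel wrapped
     * in a buffered stream, so that many small writes reach the disk as a
     * few large ones.
     */
    public static <T> void writeAll(Path target, Iterable<T> records,
                                    RecordWriter<T> writer) throws IOException {
        try (FileChannel channel = FileChannel.open(target,
                 StandardOpenOption.CREATE,
                 StandardOpenOption.WRITE,
                 StandardOpenOption.TRUNCATE_EXISTING);
             // 64 KB buffer: an arbitrary value for illustration, tune by profiling.
             OutputStream out = new BufferedOutputStream(
                 Channels.newOutputStream(channel), 1 << 16)) {
            for (T record : records) {
                // Assumption: with BioJava 1.x this is where
                // RichSequence.IOTools.writeGenbank(out, seq, ns) would go.
                writer.write(out, record);
            }
        } // closing the buffered stream flushes it and closes the channel
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("sequences", ".gb");
        // Placeholder text, not complete Genbank records.
        List<String> records = Arrays.asList(
            "LOCUS       SEQ1\n//\n",
            "LOCUS       SEQ2\n//\n");
        writeAll(target, records,
            (out, rec) -> out.write(rec.getBytes(StandardCharsets.US_ASCII)));
        System.out.println(Files.size(target) + " bytes written to " + target);
    }
}
```

The point of the design is to open the file once and reuse one buffered stream for the whole run, instead of opening or flushing a stream per sequence. If the formatting code itself dominates the profile, this wrapper will not change much and the rewrite of GenbankFormat is still the option to look at.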

On 28 Mar 2011, at 18:11, Khalil El Mazouari wrote:

> Sequence objects are all in memory.
> I agree, 10,000 seqs in ± 20 sec is not bad. However, scientists will process 100,000 seqs in each run, and IO is a real bottleneck. So I am trying, as far as I can, to fine-tune the app.
> 
> Regards,
> 
> khalil
> 
> On 28 Mar 2011, at 18:15, Richard Holland wrote:
> 
>> I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good! However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too.
>> 
>> 
>> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote:
>> 
>>> Hi,
>>> 
>>> I am developing a sequence annotation app. It should handle ± 100,000 sequences per run.
>>> 
>>> When profiling the app (with 10,000 seqs), the total execution time was ± 20 seconds, of which 57% was spent in RichSequence.IOTools.writeGenbank!
>>> 
>>> How could one improve the RichSequence.IOTools performance?
>>> 
>>> Thanks.
>>> 
>>> khalil
>> 
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>> 
> 





