[Biojava-l] RichSequence.IOTools performance

Khalil El Mazouari khalil.elmazouari at gmail.com
Mon Mar 28 17:11:58 UTC 2011


The sequence objects are all in memory.
I agree, 10,000 seqs in ± 20 sec is not bad. However, scientists will process 100,000 seqs in each run, and IO is a real bottleneck. So I am trying, as far as I can, to fine-tune the app.
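
As a rough illustration of the output-stream tuning suggested in the quoted reply below, here is a minimal sketch that funnels all per-sequence writes through one large buffer before they reach the underlying stream. It uses plain java.io buffering, not NIO, and it does not call BioJava: the RecordWriter interface, the writeAll method and the 64 KB buffer size are all hypothetical names and values chosen for the example, with RecordWriter standing in for a per-sequence call such as RichSequence.IOTools.writeGenbank. Whether this helps at all depends on whether the stream passed in was already buffered.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class BufferedBatchWriter {

    // Hypothetical stand-in for a per-sequence writer (assumption, not a BioJava type).
    public interface RecordWriter {
        void write(OutputStream os, String record) throws IOException;
    }

    // Wraps the sink in a single BufferedOutputStream, writes every record
    // through it, flushes once at the end, and returns the number written.
    public static int writeAll(OutputStream sink, List<String> records,
                               RecordWriter writer, int bufferSize) throws IOException {
        BufferedOutputStream out = new BufferedOutputStream(sink, bufferSize);
        int count = 0;
        for (String record : records) {
            writer.write(out, record);
            count++;
        }
        out.flush(); // push any bytes still held in the buffer to the sink
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sink for the demo; in practice this would be a FileOutputStream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        RecordWriter writer =
            (os, rec) -> os.write((rec + "\n").getBytes(StandardCharsets.UTF_8));
        int n = writeAll(sink, Arrays.asList("seq1", "seq2", "seq3"), writer, 64 * 1024);
        System.out.println("wrote " + n + " records, " + sink.size() + " bytes");
    }
}
```

The point of the design is that the stream is wrapped once for the whole batch and flushed once at the end, rather than opened or flushed per sequence. Profiling with and without the wrapper would show whether it matters here.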

Regards,

khalil

On 28 Mar 2011, at 18:15, Richard Holland wrote:

> I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good! However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too.
> 
> 
> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote:
> 
>> Hi,
>> 
>> I am developing a sequence annotation app. It should handle ± 100,000 sequences per run.
>> 
>> When profiling the app (with 10,000 seqs), the total execution time was ± 20 seconds, of which 57% was spent in RichSequence.IOTools.writeGenbank!
>> 
>> How could one improve the RichSequence.IOTools performance?
>> 
>> Thanks.
>> 
>> khalil
> 
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
> 