[Biojava-l] RichSequence.IOTools performance

Andy Yates ayates at ebi.ac.uk
Mon Mar 28 21:39:54 UTC 2011


Dang Rich :). 

At the moment we've not done anything WRT Genbank output, but we would accept any contribution that helps us out with this.

As for the performance difference between BJ3 & BJ, what happens if you use the writer objects directly with an output stream wrapped in a BufferedOutputStream? Have you got any profiling results? It would be very interesting to see where we've lost the performance ...
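
To make the buffering question concrete, here is a minimal sketch using only the JDK. It is an illustration under assumptions, not BioJava code: writeRecords is a hypothetical stand-in for a format writer that emits many small writes, and CountingOutputStream is a helper invented here to count how many write calls reach the underlying stream. In a real test you would pass the buffered stream to the BioJava call instead (e.g. RichSequence.IOTools.writeGenbank, whose exact signature is not reproduced here).

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BufferedWriteSketch {

    /** Hypothetical helper: counts write calls that reach the wrapped stream. */
    static final class CountingOutputStream extends OutputStream {
        private final OutputStream delegate;
        long calls = 0;

        CountingOutputStream(OutputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(int b) throws IOException {
            calls++;
            delegate.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            calls++;
            delegate.write(b, off, len);
        }
    }

    /** Hypothetical stand-in for a format writer: two small writes per record. */
    static void writeRecords(OutputStream os, int n) throws IOException {
        for (int i = 0; i < n; i++) {
            os.write(("LOCUS       SEQ" + i + "\n").getBytes(StandardCharsets.US_ASCII));
            os.write("//\n".getBytes(StandardCharsets.US_ASCII));
        }
    }

    /** Returns how many write calls hit the sink, with or without a 64 KiB buffer. */
    static long countCalls(int n, boolean buffered) {
        try {
            CountingOutputStream sink = new CountingOutputStream(new ByteArrayOutputStream());
            OutputStream os = buffered ? new BufferedOutputStream(sink, 64 * 1024) : sink;
            writeRecords(os, n);
            os.flush(); // push any bytes still held in the buffer down to the sink
            return sink.calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + countCalls(10000, false));
        System.out.println("buffered calls:   " + countCalls(10000, true));
    }
}
```

If the underlying stream is a file, each call that reaches it can translate into a system call, so the gap between the two counts is a rough indicator of how much buffering might help. It does not replace real profiling of the actual writer.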

Andy

On 28 Mar 2011, at 18:23, Richard Holland wrote:

> In which case you've got little option but to rewrite the GenbankFormat module to use NIO or other alternative methods for writing files. However, before you do that, I suggest you investigate the recent BioJava3 developments to see if they've already done anything in this area - Andy Yates is your man there.
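
Before rewriting a format module, it may be worth checking whether an existing stream-based writer can simply be pointed at an NIO channel. The sketch below shows one way to do that with the JDK alone, assuming Java 7 or later. The names NioStreamSketch, RECORD, writeViaChannel and demo are hypothetical and exist only for this illustration, and the placeholder record is not valid Genbank output. Whether this is actually faster than a plain buffered FileOutputStream is untested here and would need profiling.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class NioStreamSketch {

    /** Placeholder bytes standing in for one formatted record. */
    static final byte[] RECORD =
            "LOCUS       PLACEHOLDER\n//\n".getBytes(StandardCharsets.US_ASCII);

    /** Writes n placeholder records through a FileChannel and returns the file size. */
    static long writeViaChannel(Path target, int n) {
        try (FileChannel channel = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING);
             // Expose the channel as an OutputStream so stream-based writers can use it.
             OutputStream os = new BufferedOutputStream(
                     Channels.newOutputStream(channel), 64 * 1024)) {
            for (int i = 0; i < n; i++) {
                // A real stream-based format writer would be handed 'os' here.
                os.write(RECORD);
            }
            os.flush(); // make sure buffered bytes reach the channel before measuring
            return channel.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the sketch against a temporary file and cleans up afterwards. */
    static long demo(int n) {
        try {
            Path tmp = Files.createTempFile("genbank-sketch", ".gb");
            try {
                return writeViaChannel(tmp, n);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes written: " + demo(10000));
    }
}
```

The design choice is to keep the writer untouched and swap only the sink, so the two approaches can be compared in a profiler before committing to a rewrite.
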
> 
> On 28 Mar 2011, at 18:11, Khalil El Mazouari wrote:
> 
>> Sequence objects are all in-memory.
>> I agree, 10,000 seqs in ± 20 sec is not bad. However, scientists will process 100,000 seqs in each run, and IO is a real bottleneck. So I am trying, as far as I can, to fine-tune the app.
>> 
>> Regards,
>> 
>> khalil
>> 
>> On 28 Mar 2011, at 18:15, Richard Holland wrote:
>> 
>>> I would have thought 10,000 seqs written out in full Genbank format in 20 seconds was pretty good! However, the key to speeding it up would be to modify the OutputStream interactions to use faster things such as NIO. Also it would depend on the source of your sequence objects - if they are all in-memory then this isn't an issue, but if they are being read from a database using lazy or dynamic loading then that could be a bottleneck too.
>>> 
>>> 
>>> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am developing a sequence annotation app. It should handle ± 100,000 sequences per run.
>>>> 
>>>> When profiling the app (with 10,000 seqs), the total execution time was ± 20 seconds, of which 57% was spent in RichSequence.IOTools.writeGenbank!
>>>> 
>>>> How could one improve the RichSequence.IOTools performance?
>>>> 
>>>> Thanks.
>>>> 
>>>> khalil
>>>> _______________________________________________
>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>> 
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>> 
>> 
> 
> 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/







