[BioSQL-l] Genbank loading time

Wed Jan 28 16:29:50 UTC 2009

On Jan 28, 2009, at 5:50 AM, Peter wrote:

> On Tue, Jan 27, 2009 at 10:57 PM, Richard Holland wrote:
>>
>> As for BioPerl/BioPython/etc. I expect their respective project authors
>> will respond to this thread accordingly with the figures from their own
>> domains!
>
> I can tell you importing GenBank files into BioSQL with Biopython is
> faster than BioPerl, sometimes several times faster, but this will
> depend on the nature of the files (e.g. genomes versus ESTs).
> http://lists.open-bio.org/pipermail/biosql-l/2008-August/001320.html
> http://lists.open-bio.org/pipermail/biopython-dev/2008-April/003625.html

I don't think sequence loading via load_seqdatabase.pl uses BioPerl.   
If one uses BioPerl and bioperl-db the following can explain at least  
some of the reason why loading is slow:

http://www.bioperl.org/wiki/Why_BioPerl_is_slow

We also go through the extra hand-wringing with Bio::Species objects
(something I don't think the other Bio* projects worry about).

Regardless, it's not an easy problem to work around.  There are such  
things as Moose, and Perl6 is now in alpha...

chris

> I don't have any BioJava comparison figures.  In any case, as Richard
> points out, there will be slight differences between the different Bio*
> tools in exactly how the data is parsed and stored.
>
> I've never tried to import the whole of GenBank, so I don't have any
> numbers for you there.
>
> Peter
> (Biopython)