[BioSQL-l] Genbank loading time
Chris Fields
cjfields at illinois.edu
Wed Jan 28 16:29:50 UTC 2009
On Jan 28, 2009, at 5:50 AM, Peter wrote:
> On Tue, Jan 27, 2009 at 10:57 PM, Richard Holland wrote:
>>
>> As for BioPerl/BioPython/etc. I expect their respective project authors
>> will respond to this thread accordingly with the figures from their own
>> domains!
>
> I can tell you importing GenBank files into BioSQL with Biopython is
> faster than BioPerl, sometimes several times faster, but this will
> depend on the nature of the files (e.g. genomes versus ESTs).
> http://lists.open-bio.org/pipermail/biosql-l/2008-August/001320.html
> http://lists.open-bio.org/pipermail/biopython-dev/2008-April/003625.html
I don't think sequence loading via load_seqdatabase.pl uses BioPerl.
If one uses BioPerl and bioperl-db the following can explain at least
some of the reason why loading is slow:
http://www.bioperl.org/wiki/Why_BioPerl_is_slow
We also go through the extra hand-wringing with Bio::Species objects
(something I don't think the other Bio* projects worry about).
Regardless, it's not an easy problem to work around. There are such
things as Moose, and Perl6 is now in alpha...
chris
> I don't have any BioJava comparison figures. In any case, as Richard
> points out, there will be slight differences in the different Bio*
> tools in exactly how the data is parsed and stored.
>
> I've never tried to import the whole of GenBank, so I don't have any
> numbers for you there.
>
> Peter
> (Biopython)