[BioSQL-l] Genbank loading time

Wed Jan 28 16:53:49 UTC 2009

On Jan 28, 2009, at 10:40 AM, Peter wrote:

> On Wed, Jan 28, 2009 at 4:29 PM, Chris Fields <cjfields at illinois.edu> wrote:
>>
>> I don't think sequence loading via load_seqdatabase.pl uses BioPerl.
>> If one uses BioPerl and bioperl-db, the following can explain at least
>> some of the reason why loading is slow:
>> http://www.bioperl.org/wiki/Why_BioPerl_is_slow
>> We also go through extra hand-wringing with Bio::Species objects
>> (something I don't think the other Bio* projects worry about).
>
> Looking at the source code for the load_seqdatabase.pl script included
> with bioperl-db, my impression is that it uses Bio::DB::BioDB to talk to
> the database, and Bio::SeqIO to parse the input sequence files (in
> this case, Bio::SeqIO::genbank is used).  See:
>
> http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-db/trunk/scripts/biosql/load_seqdatabase.pl

My bad, I'm thinking of the taxonomy loader (need more coffee). I'm
wondering, though, whether it would be feasible to have a direct
loader for the most common database formats (GenBank/EMBL/Swiss-Prot),
something similar to the taxonomy loader that doesn't rely on any
specific Bio* package.

>> Regardless, it's not an easy problem to work around. There are such
>> things as Moose, and Perl6 is now in alpha...
>
> I'll take your word for it - I'm in no position to improve anyone's  
> Perl code ;)
>
> Peter

Well, the problem lies with perl5's welded-on OO, which isn't easy to
work around, particularly its inheritance issues. Moose supposedly
helps speed things up a bit; it doesn't hurt that it is based somewhat
on perl6's object model (Synopsis 12, "Objects"):

http://feather.perl6.nl/syn/S12.html

chris