[BioSQL-l] Genbank loading time
Chris Fields
cjfields at illinois.edu
Wed Jan 28 16:53:49 UTC 2009
On Jan 28, 2009, at 10:40 AM, Peter wrote:
> On Wed, Jan 28, 2009 at 4:29 PM, Chris Fields
> <cjfields at illinois.edu> wrote:
>>
>> I don't think sequence loading via load_seqdatabase.pl uses BioPerl.
>> If one uses BioPerl and bioperl-db the following can explain at least
>> some of the reason why loading is slow:
>> http://www.bioperl.org/wiki/Why_BioPerl_is_slow
>> We also go through the extra hand-wringing with Bio::Species objects
>> (something I don't think the other Bio* worry about).
>
> Looking at the source code for the load_seqdatabase.pl script included
> with bioperl-db, my impression is it uses Bio::DB::BioDB to talk to
> the database, and Bio::SeqIO to parse the input sequence files (in
> this case, Bio::SeqIO::genbank is used). See:
>
> http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-db/trunk/scripts/biosql/load_seqdatabase.pl
My bad, I was thinking of the taxonomy loader (need more coffee). I'm
wondering, though, whether it would be feasible to have a direct
loader for the most common database formats (GenBank/EMBL/Swiss-Prot),
something similar to the taxonomy loader that doesn't rely on any
specific Bio* package.
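
To make the idea concrete, here is a rough sketch of the parsing half of such
a direct loader. It is only an illustration under assumptions: it is written
in Python for brevity, parse_genbank is a hypothetical name, it reads only
the LOCUS, ACCESSION and ORIGIN sections, and it is not how
load_seqdatabase.pl or the taxonomy loader actually work. Writing the
records into the BioSQL tables is left out entirely.

```python
import io


def parse_genbank(handle):
    """Yield one dict per GenBank record: locus name, accession, sequence.

    Hypothetical helper for illustration only; it skips features,
    references, and every other section of the flat file.
    """
    record = None
    in_origin = False
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("LOCUS"):
            # Start of a new record; the locus name is the second field.
            record = {"locus": line.split()[1], "accession": None, "seq": []}
            in_origin = False
        elif record is None:
            # Ignore anything before the first LOCUS line.
            continue
        elif line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                record["accession"] = fields[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: join the collected sequence chunks and emit.
            record["seq"] = "".join(record["seq"])
            yield record
            record = None
            in_origin = False
        elif in_origin:
            # Sequence lines look like "        1 gatcctccat atacaacggt";
            # drop the leading position number and the spaces.
            record["seq"].append("".join(line.split()[1:]))


if __name__ == "__main__":
    # Made-up record, not a real GenBank entry.
    sample = (
        "LOCUS       TEST0001                  20 bp    DNA     linear   SYN 01-JAN-2009\n"
        "ACCESSION   TEST0001\n"
        "ORIGIN\n"
        "        1 gatcctccat atacaacggt\n"
        "//\n"
    )
    for rec in parse_genbank(io.StringIO(sample)):
        print(rec["locus"], rec["accession"], len(rec["seq"]))
```

Skipping object construction per feature is where most of the time would be
saved, at the cost of losing everything the Bio* parsers validate.
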
>> Regardless, it's not an easy problem to work around. There are such
>> things as Moose, and Perl6 is now in alpha...
>
> I'll take your word for it - I'm in no position to improve anyone's
> Perl code ;)
>
> Peter
Well, the problem lies with Perl 5's welded-on OO, which isn't easy to
work around, particularly the inheritance issues. Supposedly Moose helps
speed things up a bit; it doesn't hurt that it is based somewhat on
Perl 6's object model:
http://feather.perl6.nl/syn/S12.html
chris