[Bioperl-l] Re: stale links, EMBL loading
Hilmar Lapp
hlapp at gmx.net
Sun Jun 15 22:18:35 EDT 2003
On Sunday, June 15, 2003, at 12:25 AM, Niels Larsen wrote:
> The links
>
> http://bio.perl.org/SRC/branch-1-2/Bio/Tools/Run/WrapperBase.pm
> http://bio.perl.org/bioperl-bugs
> http://bioperl.org/Related.html
>
> and probably others, return error 404.
Where did you find these?
>
> Then, I am looking into bioperl, hope to be able to use it and if so,
> contribute. While trying SeqIO, I got the error below; this error comes
> only when I use Bio::SeqIO from a script where I also invoke my own
> error-catching module which traps this,
>
> use sigtrap qw ( die normal-signals stack-trace any error-signals );
>
As I wrote in a separate email, this comes from bioperl-db: you can't
intercept die() calls there, because bioperl-db needs to catch them
itself in order to react to them.
> The error below I created by including my error-module in the script
>
> bioperl-db/scripts/biosql/load_seqdatabase.pl
>
> Btw, to load a new EMBL/GenBank/DDBJ release in hours instead of
> days, should I write something that creates temporary files (say one
> per .dat file) and loads those in one go, instead of one entry at a
> time ..
> or does some other solution exist (I couldn't find it)?
>
Look at the CPU-load distribution between the perl process and the RDBMS
process (mysql? pg? Oracle?). What I get with richly annotated formats
like GenBank is a CPU load of about 0.7 for perl and 0.15-0.3 for the
RDBMS process. If this is roughly the balance you see, then dividing the
input into chunks will help. Don't do one entry per file though; rather
chunk in larger units and then load those in parallel processes. E.g.
chunk by GenBank section (primates, rodents, etc, you get the idea).
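To illustrate chunking in larger units, here is a minimal sketch in Python
(not part of bioperl or bioperl-db). It assumes only the flat-file
convention that every EMBL/GenBank/DDBJ entry ends with a line containing
just "//". The function name chunk_entries and the entries_per_chunk
parameter are made up for this example.

```python
from typing import Iterable, Iterator, List


def chunk_entries(lines: Iterable[str], entries_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of lines, each holding up to entries_per_chunk whole entries.

    Assumption: a line consisting only of "//" terminates an entry.
    """
    if entries_per_chunk < 1:
        raise ValueError("entries_per_chunk must be at least 1")
    chunk: List[str] = []
    count = 0
    for line in lines:
        chunk.append(line)
        if line.rstrip() == "//":
            # An entry just ended; close the chunk once it is full.
            count += 1
            if count == entries_per_chunk:
                yield chunk
                chunk, count = [], 0
    if chunk:
        # Remaining entries, or an unterminated final entry.
        yield chunk
```

Each yielded chunk could then be written to its own file and handed to a
separate loader process. If the release files are already split by
section, using those files as the chunks may be all that is needed.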
If a single loader process already saturates the CPUs on the db server,
then firing up a second one will only degrade performance. For the same
reason, don't run too many loaders in parallel, or you will suffer from
contention for disk I/O and transactional locks.
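One way to keep the number of concurrent loaders bounded is a small
worker pool. The sketch below is in Python and is only an illustration:
run_loaders, base_cmd and max_parallel are hypothetical names, and the
loader command (for example perl with load_seqdatabase.pl and whatever
options your installation needs) is passed in rather than assumed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_loaders(base_cmd: Sequence[str], chunk_files: Sequence[str],
                max_parallel: int = 2) -> List[int]:
    """Run base_cmd once per chunk file, at most max_parallel at a time.

    Returns the exit codes in the same order as chunk_files.
    """
    def run_one(path: str) -> int:
        # The chunk file name is appended as the last argument (an assumption
        # about how the loader command takes its input).
        return subprocess.run(list(base_cmd) + [path]).returncode

    # Threads only wait on child processes here, so a thread pool is enough.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, chunk_files))
```

Start with a small max_parallel, watch the CPU load on the db server, and
raise it only while the server still has headroom.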
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------