[Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot

Hilmar Lapp hlapp at gmx.net
Sun Mar 23 01:40:55 UTC 2008


On Mar 22, 2008, at 7:36 PM, Erik wrote:
> The next thing is performance, it's really intolerably
> slow, and I don't think the database is the bottleneck -
> isn't it more likely bioperl object heaviness?  I get
> continuous near 100% load for 1 cpu (this machine has 2
> cpus).


Is the database on the same machine? If yes, and a significant
fraction (~30-50% or even more) of the load is generated by the perl
script, rather than almost everything coming from the postmaster,
then indeed the database is not the bottleneck.

Of course, the BioPerl object creation overhead takes a toll too. I
would be surprised though if BioPerl can't parse more than 3.6
records/s on a modern CPU; you can convince yourself of that by
writing a simple script along the lines of the following and seeing
how fast it goes:

use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file   => '<uniprot_sprot.dat',
                            -format => 'swiss');
my $n = 0;
while (my $seq = $seqio->next_seq) {
	$n++;
	# print progress every 5,000 sequences
	print STDERR "$n\n" if $n % 5000 == 0;
}
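
To turn that loop into an actual records/s figure that can be compared
against the 3.6 records/s seen during loading, one could wrap it in a
small timing helper. This is only a sketch: parse_rate is a
hypothetical helper name, not part of BioPerl, and it assumes
Time::HiRes is available (it ships with core Perl). It takes any code
ref that returns the next record, so the BioPerl part is shown as
commented usage at the bottom.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper (not part of BioPerl): pull records from an
# iterator until it returns undef, and report throughput.
# $next         - code ref returning the next record, or undef at the end
# $report_every - print a progress line every this many records
# Returns (record count, records per second).
sub parse_rate {
    my ($next, $report_every) = @_;
    $report_every ||= 5000;
    my $start = time;
    my $n     = 0;
    while (defined(my $rec = $next->())) {
        $n++;
        if ($n % $report_every == 0) {
            my $elapsed = time - $start;
            printf STDERR "%d records, %.1f records/s\n",
                $n, $elapsed > 0 ? $n / $elapsed : 0;
        }
    }
    my $elapsed = time - $start;
    my $rate    = $elapsed > 0 ? $n / $elapsed : 0;
    return ($n, $rate);
}

# Usage with BioPerl (needs Bio::SeqIO and the Swiss-Prot flat file):
# require Bio::SeqIO;
# my $seqio = Bio::SeqIO->new(-file   => '<uniprot_sprot.dat',
#                             -format => 'swiss');
# my ($n, $rate) = parse_rate(sub { $seqio->next_seq });
# printf "%d records, %.1f records/s\n", $n, $rate;
```

If the rate from parsing alone comes out far above what you see while
loading, the time is going somewhere other than the parser.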

But maybe load_seqdatabase.pl or even BioSQL or BioPerl aren't  
suitable for your use-case?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================





