[Bioperl-l] reducing time
Heikki Lehvaslaiho
heikki at nildram.co.uk
Tue Jan 27 10:55:17 EST 2004
Pierre,
I could not find anything wrong in your code. 95 seconds per file (you mean
per sequence?) is really slow. According to my tests, the problem is your local
system. Maybe you have too many files per directory? Not enough RAM?
Maybe your subroutine is not optimal?
Sporadic filling of the output file is normal behaviour of the unix file
system, so I do not think there is anything wrong in that. It just shows that
IO and/or CPU are very busy.
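If you want each result line to reach the output file as soon as it is printed, one option is to switch off buffering on that filehandle. The following is only a minimal sketch: the temporary file and the sample line are made up for illustration. With the OUT handle from your script, the equivalent call should be OUT->autoflush(1) once IO::Handle is loaded.

```perl
use strict;
use warnings;
use IO::Handle;                 # provides the autoflush() method on filehandles
use File::Temp qw(tempfile);

# Hypothetical output file; in the real script this would be the results file.
my ($out, $path) = tempfile(UNLINK => 1);

# Flush after every print instead of waiting for the buffer to fill.
$out->autoflush(1);

# Made-up result line in the same tab-separated style as the script.
print {$out} "file1.fa\t3\t22\n";

# With autoflush on, the line has been written out before close() is called.
my $size_on_disk = -s $path;
print "bytes on disk before close: $size_on_disk\n";

close $out or die "can't close $path: $!";
```

Flushing on every print costs some speed, so this is mainly useful while debugging or when watching a long run.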
I tested your code by copying it into a file (rd.pl), copying half a
dozen fasta files from BIOPERLROOT/t/data (alnfile.fasta amino.fa dna1.fa
dna2.fa multi_1.fa multi_2.fa) into a subdirectory 'fasta', and running it on my
laptop:
time perl rd.pl fasta
The output from the time command was:
0.43user 0.02system 0:00.57elapsed 80%CPU
(and the time difference reported by the code was 0 sec.)
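The built-in time() only has one-second resolution, so a difference of 0 says little for a short run. To find out whether the parsing or the heavy loop is eating the time, the core module Time::HiRes can be used to time each part separately. This is a rough sketch, not your actual code: timed() and count_residue() are hypothetical helpers, and the sequence is made up as a stand-in for what next_seq() would return.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code reference and return (elapsed_seconds, return_value).
sub timed {
    my ($code) = @_;
    my $t0     = [gettimeofday];
    my $result = $code->();
    return (tv_interval($t0), $result);
}

# Hypothetical stand-in for the heavy per-sequence loop:
# counts how often one residue occurs in the sequence.
sub count_residue {
    my ($seq, $residue) = @_;
    my $count = 0;
    for my $pos (0 .. length($seq) - 1) {
        $count++ if substr($seq, $pos, 1) eq $residue;
    }
    return $count;
}

my $seq = 'MKVLAAGIVK' x 1000;   # made-up protein sequence
my ($elapsed, $n) = timed(sub { count_residue($seq, 'K') });
printf "counted %d residues in %.6f s\n", $n, $elapsed;
```

Wrapping the $stream->next_seq() call and the analysis loop in separate timed() calls would show which of the two grows as more files are processed.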
Yours,
-Heikki
###################################
use Bio::SeqIO;
opendir(DIR, shift ) || die "can't opendir : $!";
while (my $file = readdir DIR) {
    next if ($file =~ /^\./);
    print STDERR $file, "\n";
    #next;
    my $stream = Bio::SeqIO->new(-file   => "<fasta/$file",
                                 -format => 'fasta');
    while (my $one_seq = $stream->next_seq()) {
        $inittime = time();
        $heatcount = 0;
        $nbhotspot = 0;
        $aacount = 0;
        $maxaacount = 0;
        $nbaamotif = 0;
        $aanb = 0;
        my $seq = $one_seq->seq();
        #/a for loop over each sequence goes here (a heavy one)/
        $HSPercernt = 22;#$nbhotprot/$nbprot;
        print "$file\t\t\t"."$nbhotprot\t".substr(($HSPercernt*100),0,4)."\n";
        print OUT "$file\t\t\t"."$nbhotprot\t\t\t".substr(($HSPercernt*100),0,4)."\n";
        $nbprotanalysed++;
        $HSPercernt = 0;
        $nbhotprot = 0;
        $i++;
        if ($i%10==5) {
            my $endtime = time();
            my $totaltime = $endtime-$inittime;
            print "the procedure took ---->", $totaltime,"seconds\n";
        }
    }
} #End of reading directory while loop
###################################
On Tuesday 27 Jan 2004 13:12, KHOUEIRY pierre wrote:
> Hi all,
> In my script, I process a large number of fasta files *(all the same
> size)*. The problem is that, by my estimate, the total run time will be
> about 26 hours. In fact it took 951 seconds for 10 files.
> Is there any way to reduce the wasted time, or is there a problem in my
> code? I should add that the script runs faster for the first 20 or 30 files,
> then slows down but stays steady (tested on a large number of files without
> the small loop over sequences). The output file stays empty at the beginning
> until a number of files have been processed, and then stays unchanged
> until another series of files has been treated. I think it's a buffer problem...!
>
> _Here's a bit of the code I use:_
>
> while(my $file = readdir DH)
> {
> next if ($file =~ /^\./);
>
> my $stream = Bio::SeqIO->new(-file =>
> "</home/Perl/proteome/Output/$file", -format => 'fasta');
>
> while(my $one_seq = $stream->next_seq())
> {
>
> $heatcount = 0;
> $nbhotspot = 0;
> $aacount = 0;
> $maxaacount = 0;
> $nbaamotif = 0;
> $aanb= 0;
>
> my $seq = $one_seq->seq();
>
> /a for loop over each sequence goes here (a heavy one)/
> $HSPercernt = $nbhotprot/$nbprot;
>
> print "$file\t\t\t"."$nbhotprot\t".substr(($HSPercernt*100),0,4)."\n";
> print OUT
> "$file\t\t\t"."$nbhotprot\t\t\t".substr(($HSPercernt*100),0,4)."\n";
>
> $nbprotanalysed ++;
> $HSPercernt = 0;
> $nbhotprot = 0;
> $i++;
> if($i%10==5){
> my $endtime = time();
> my $totaltime = $endtime-$inittime;
> print "the procedure took ---->", $totaltime,"seconds\n";
> }
> }#End of reading directory while loop
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________