[Bioperl-l] reducing time

Heikki Lehvaslaiho heikki at nildram.co.uk
Tue Jan 27 10:55:17 EST 2004


Pierre,

I could not find anything wrong with your code. 95 seconds per file (you mean 
per sequence?) is really slow. According to my tests, the problem is in your 
local system. Maybe you have too many files per directory? Not enough RAM?
Maybe your subroutine is not optimal?

Sporadic filling of the output file is normal behaviour of the Unix file 
system, so I do not think there is anything wrong in that. It just shows that 
IO and/or CPU are very busy.
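
The delay can be made visible with a small experiment. This is only a minimal sketch, assuming the delay comes from Perl's own output buffering and that the core modules IO::Handle and File::Temp are available; the helper name bytes_on_disk_after_print and the sample line are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;
use IO::Handle;                 # gives file handles an autoflush() method
use File::Temp qw(tempfile);

# Hypothetical helper: print one short line to a fresh temporary file and
# report how many bytes have reached the file before the handle is closed.
sub bytes_on_disk_after_print {
    my ($autoflush) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    $fh->autoflush(1) if $autoflush;        # flush after every print
    print {$fh} "protein_001.fa\t3\t22\n";
    my $size = (stat $path)[7];             # file size as seen by the OS
    close $fh or die "cannot close $path: $!";
    return $size;
}

printf "buffered:    %d bytes on disk\n", bytes_on_disk_after_print(0);
printf "autoflushed: %d bytes on disk\n", bytes_on_disk_after_print(1);
```

With default buffering the short line normally stays in memory until the buffer fills or the handle is closed, which would explain an output file that grows in bursts. Turning autoflush on makes each line appear at once, at the cost of more write calls.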


I tested your code by copying it into a file (rd.pl), copying half a 
dozen fasta files from BIOPERLROOT/t/data (alnfile.fasta  amino.fa  dna1.fa  
dna2.fa  multi_1.fa  multi_2.fa) into a subdirectory 'fasta', and running it 
on my laptop:
  time perl rd.pl fasta

The output from the time command was:
	0.43user 0.02system 0:00.57elapsed 80%CPU
(and the time difference reported by the code was 0 sec.)
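
A difference of 0 from the script is expected for a run this short, because Perl's built-in time() counts whole seconds. The following is a minimal sketch of sub-second timing, assuming the core Time::HiRes module is available; the helper name timed and the dummy workload are hypothetical and only stand in for the real per-sequence loop.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # time() now returns fractional seconds

# Hypothetical helper: run a code reference and return the elapsed
# wall-clock time in seconds.
sub timed {
    my ($code) = @_;
    my $start = time();
    $code->();
    return time() - $start;
}

# Dummy workload standing in for the heavy loop over one sequence.
my $elapsed = timed(sub {
    my $n = 0;
    $n += length("ACDEFGHIKL" x 10) for 1 .. 10_000;
});

printf "the procedure took ----> %.4f seconds\n", $elapsed;
```

Timing each file this way should show whether the slowdown comes from reading the files or from the analysis loop.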

Yours,
	-Heikki



###################################
use Bio::SeqIO;

my $dir = shift;    # directory holding the fasta files, e.g. 'fasta'
opendir(DIR, $dir) || die "can't opendir $dir: $!";

while (my $file = readdir DIR) {
    next if ($file =~ /^\./);
    print STDERR $file, "\n";
    #next;
    my $stream = Bio::SeqIO->new(-file =>"<$dir/$file",
                                 -format => 'fasta');

    while (my $one_seq = $stream->next_seq()) {
        $inittime = time();
        $heatcount = 0;
        $nbhotspot = 0;
        $aacount = 0;
        $maxaacount = 0;
        $nbaamotif = 0;
        $aanb= 0;

        my $seq = $one_seq->seq();

        #/a for loop over each sequence goes here (a heavy one)/
        $HSPercernt = 22;#$nbhotprot/$nbprot;

        print "$file\t\t\t"."$nbhotprot\t".substr(($HSPercernt*100),0,4)."\n";
        print OUT 
            "$file\t\t\t"."$nbhotprot\t\t\t".substr(($HSPercernt*100),0,4)."\n";

        $nbprotanalysed ++;
        $HSPercernt = 0;
        $nbhotprot = 0;
        $i++;
        if ($i%10==5) {
            my $endtime = time();
            my $totaltime = $endtime-$inittime;
            print "the procedure took ---->", $totaltime,"seconds\n";
        }
    } 
}                          #End of reading directory while loop

###################################
On Tuesday 27 Jan 2004 13:12, KHOUEIRY pierre wrote:
> Hi all,
> In my script, I process a large number of fasta files *(all the same
> size)*. The problem is that, by my estimate, the total time will be
> about 26 hours. In fact it took 951 seconds for 10 files.
> Is there any way to reduce the time wasted? Or is there a problem in my
> code? I should add that the script runs faster for the first 20 or 30 files,
> then slows down and stays at a constant speed (tested on a large number of
> files without the small loop over sequences). The output file stays empty at
> the beginning until a number of files have been processed, and then stays
> unchanged until another series of files has been processed. I think it's a
> buffer problem...!
>
> _Here's a bit of the code I use:_
>
> while(my $file = readdir DH)
>   {
>    next if ($file =~ /^\./);
>
>    my $stream = Bio::SeqIO->new(-file =>
> "</home/Perl/proteome/Output/$file", -format => 'fasta');
>
>    while(my $one_seq = $stream->next_seq())
>    {
>
>      $heatcount = 0;
>      $nbhotspot = 0;
>      $aacount = 0;
>      $maxaacount = 0;
>      $nbaamotif = 0;
>      $aanb= 0;
>
>      my $seq = $one_seq->seq();
>
>      # a for loop over each sequence goes here (a heavy one)
>     $HSPercernt = $nbhotprot/$nbprot;
>
>    print "$file\t\t\t"."$nbhotprot\t".substr(($HSPercernt*100),0,4)."\n";
>    print OUT
> "$file\t\t\t"."$nbhotprot\t\t\t".substr(($HSPercernt*100),0,4)."\n";
>
>    $nbprotanalysed ++;
>    $HSPercernt = 0;
>    $nbhotprot = 0;
>    $i++;
>    if($i%10==5){
>    my $endtime = time();
>    my $totaltime = $endtime-$inittime;
>    print "the procedure took ---->", $totaltime,"seconds\n";
>    }
>   }#End of reading directory while loop

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

