[Bioperl-l] Performance problems with BioPerl and Perl 5.8 on Windows

David_Waner/San_Diego/Accelrys at scitegic.com
Thu May 18 19:30:46 UTC 2006


BioPerl Users/Developers,

In our testing we have found severe performance problems when using BioPerl
with Perl 5.8 on Windows (but not on Linux). They show up especially in
SeqIO when reading or writing FASTA files containing large (~16 MB)
sequences. Files that can be read in 1 or 2 seconds with Windows Perl 5.6
or Linux Perl 5.8 take minutes with Windows Perl 5.8.

Although the fault is clearly with Perl, not with BioPerl, I have 
identified a couple of places where BioPerl could be modified in order to 
save Windows Perl 5.8 users a lot of time, while not affecting other 
users. 

For example, in my testing the following excerpt from 
Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a 
16 MB sequence):

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015?\012/\n/g;
        $line =~ s/\015/\n/g unless $ONMAC;
    }
 
whereas the following replacement code should be equivalent: 

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015\012/\012/g;          # Change all CR/LF pairs to LF
        $line =~ tr/\015/\n/ unless $ONMAC;  # Change all single CRs to NEWLINE
    }
 
but executes in less than 1 second.
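
For anyone who wants to check this on their own platform, here is a minimal sketch that compares the two normalizations on a small sample and times them on a synthetic input. The helper names normalize_subst and normalize_tr are hypothetical (they are not part of Bio::Root::IO), the $ONMAC condition is assumed to be false, and the ~1 MB test string is an assumption standing in for the 16 MB sequence, so the timings will not match the numbers above.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: the substitution-based normalization quoted above.
sub normalize_subst {
    my ($line) = @_;
    $line =~ s/\015?\012/\n/g;
    $line =~ s/\015/\n/g;        # assumes $ONMAC is false
    return $line;
}

# Hypothetical helper: the proposed s/// plus tr/// replacement.
sub normalize_tr {
    my ($line) = @_;
    $line =~ s/\015\012/\012/g;  # CR/LF pairs to LF
    $line =~ tr/\015/\n/;        # remaining lone CRs to newline
    return $line;
}

# Small sample mixing CR/LF, lone CR, and lone LF line endings.
my $sample = "ACGT\015\012TTGA\015CCAT\012GGTA";
print "outputs match on sample\n"
    if normalize_subst($sample) eq normalize_tr($sample);

# Synthetic input of roughly 1 MB with CR/LF line endings (an assumption).
my $chunk = ("ACGT" x 15) . "\015\012";
my $big   = $chunk x 16_000;

for my $case ( [ 'subst', \&normalize_subst ], [ 'tr', \&normalize_tr ] ) {
    my ($name, $code) = @$case;
    my $start = time();
    my $out   = $code->($big);
    printf "%-5s %.3f s, %d bytes out\n", $name, time() - $start, length $out;
}
```

The equivalence check only holds where "\n" is "\012" inside Perl strings, which is the case on both Windows and Linux.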

In addition, changing:

    defined $sequence && $sequence =~ s/\s//g;        # Remove whitespace
 
to:

    defined $sequence && $sequence =~ tr/ \t\n\r//d;  # Remove whitespace
 
in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.
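
One caveat worth checking before adopting this: \s also matches a form feed (\f), which the tr/// list above does not include, so the two forms agree only if sequences never contain form feeds. A minimal sketch of the comparison follows; strip_ws_subst and strip_ws_tr are hypothetical helper names used only for illustration, not functions in Bio::SeqIO::fasta.

```perl
use strict;
use warnings;

# Hypothetical helper: the current regex-based whitespace removal.
sub strip_ws_subst {
    my ($sequence) = @_;
    defined $sequence && $sequence =~ s/\s//g;
    return $sequence;
}

# Hypothetical helper: the proposed tr///-based whitespace removal.
sub strip_ws_tr {
    my ($sequence) = @_;
    defined $sequence && $sequence =~ tr/ \t\n\r//d;
    return $sequence;
}

# Typical FASTA body: spaces, tabs, and mixed line endings.
my $typical = "ACGT TTGA\tCCAT\r\nGGTA\n";
print "same result on typical input\n"
    if strip_ws_subst($typical) eq strip_ws_tr($typical);

# Input with a form feed: s/\s//g removes it, the tr/// list does not.
my $with_ff = "ACGT\fTTGA";
print "results differ when a form feed is present\n"
    if strip_ws_subst($with_ff) ne strip_ws_tr($with_ff);
```

If form feeds are a concern, adding \f to the tr/// list would close that gap.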

There are also problems reading files with the <> operator when $/ is
set to "\n>": reading the first record of a FASTA file containing large
sequences takes ~50 seconds, but reading subsequent records or files
takes about 1 second. I don't have a workaround for this.
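
For readers unfamiliar with this reading pattern, here is a minimal sketch of record-wise reading with $/ set to "\n>". It uses a tiny in-memory file (an assumption made so the example is self-contained) and a lexical filehandle rather than the bare <> operator, so it shows the technique only and is not expected to reproduce the slowdown.

```perl
use strict;
use warnings;

# Tiny in-memory FASTA data; real input would be a file on disk.
my $fasta = ">seq1 first\nACGT\nTTGA\n>seq2 second\nGGCC\n";
open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";

my @records;
{
    # Each read now returns everything up to and including the next "\n>".
    local $/ = "\n>";
    while (defined(my $record = <$fh>)) {
        chomp $record;         # strips a trailing "\n>" if present
        $record =~ s/^>//;     # only the first record keeps its leading '>'
        push @records, $record;
    }
}
close $fh;

printf "read %d records\n", scalar @records;
```

Each element of @records then holds one header line followed by its sequence lines.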

I would like to ask the mailing list:

1. Has anyone else run into this problem? Any fixes?
2. Do you think BioPerl should incorporate these changes? 

I plan to submit a bug report to perlbug, but don't know when or if the 
problem will be fixed. 

- David



