[Bioperl-l] 65% speedup for Bio::SeqIO::fasta::next_seq test

Tim Bunce Tim.Bunce@pobox.com
Mon, 30 Sep 2002 20:35:56 +0100


On Mon, Sep 30, 2002 at 04:48:49PM +0100, Ewan Birney wrote:
> >
> > Tim.
> >
> > p.s. I probably won't have time to tinker with bioperl for some time now.
> 
> :(
> 
> Nice to have you visit. Come back anytime....

Feel free to play while I'm away...

One thing I've noticed is a tendency to call methods inside sort blocks.
Not good - sort runs the comparison block on the order of N log N times,
and each run pays for two method calls, so the method cost is multiplied
many times over.

I've appended some examples - just those that were easy to find with a grep:

$ find bioperl/core -name \*.pm | xargs grep -r 'sort.*->[_a-z].*}'  (slightly edited)
bioperl/core/Bio/Coordinate/Collection.pm:       sort { $a->in->start <=> $b->in->start } @{$self->{'_mappers'}};
bioperl/core/Bio/DB/GFF.pm:  my @exons = sort {$a->start <=> $b->start} $transcripts[0]->Exon;
bioperl/core/Bio/DB/GFF/Feature.pm:#  return sort {$a->start <=> $b->start} $self->sub_SeqFeature($func_name) if $func_name =~ /^[A-Z]/;
bioperl/core/Bio/DB/GFF/Feature.pm:    $subfeat->{$type} = [sort {$a->start<=>$b->start} @{$subfeat->{$type}}] if $strand > 0;
bioperl/core/Bio/DB/GFF/Feature.pm:    $subfeat->{$type} = [sort {$b->start<=>$a->start} @{$subfeat->{$type}}] if $strand < 0;
bioperl/core/Bio/Graphics/Glyph/cds.pm:  @parts = sort {$b->left <=> $a->left} @parts if $strand < 0;
bioperl/core/Bio/Graphics/Feature.pm:    $self->{segments} = [ sort {$a->start <=> $b->start } @segments ];
bioperl/core/Bio/Graphics/Glyph.pm:    @subglyphs = sort { $a->left  <=> $b->left }  $factory->make_glyph($level+1,@subfeatures);
bioperl/core/Bio/Graphics/Glyph.pm:       $sortfunc = eval 'sub { $a->left <=> $b->left }';
bioperl/core/Bio/Graphics/Glyph.pm:  my @parts = sort { $a->left <=> $b->left } $self->parts;
bioperl/core/Bio/Location/Split.pm:    foreach my $location ( sort { $a->start <=> $b->start }
bioperl/core/Bio/Location/SplitLocationI.pm:    foreach my $location ( sort { $a->start <=> $b->start }
bioperl/core/Bio/Root/Vector.pm:sub _sort_by_rank { my $aRank = $a->rank(); my $bRank = $b->rank(); $aRank <=> $bRank; }
bioperl/core/Bio/Root/Vector.pm:sub _sort_by_name { my $aName = $a->name(); my $bName = $b->name(); $aName cmp $bName; }
bioperl/core/Bio/SeqFeature/Gene/Transcript.pm:        @exons = sort { $a->start() <=> $b->start(); } @exons;
bioperl/core/Bio/SeqFeature/Gene/Transcript.pm:        @exons = sort { $b->start() <=> $a->start(); } @exons;
bioperl/core/Bio/SeqFeature/Gene/Transcript.pm: return sort {$b->start <=> $a->start} @list;
bioperl/core/Bio/SeqFeature/Gene/Transcript.pm: return sort {$a->start <=> $b->start} @list;
bioperl/core/Bio/Tools/HMMER/Results.pm:    @doms = sort { $b->bits <=> $a->bits } @doms;
bioperl/core/Bio/Tree/Node.pm:     return sort { $a->internal_id <=> $b->internal_id }

If any of those, or any others not picked up by the grep, are in
'hot' code where performance counts, and are dealing with non-tiny
lists, then they should be replaced with a "Schwartzian Transform":

	http://www.5sigma.com/perl/schwtr.html
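
For anyone who hasn't met the transform, here is a minimal sketch of the
idea: call the method once per element, sort on the cached key, then
unwrap. My::Feature is a made-up stand-in class with a call counter, not
a real bioperl class, and the start values are arbitrary test data.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature object; NOT a real bioperl class.
package My::Feature;
our $calls = 0;    # counts how often start() is invoked
sub new   { my ($class, $start) = @_; return bless { start => $start }, $class }
sub start { my $self = shift; $calls++; return $self->{start} }

package main;

my @features = map { My::Feature->new($_) } (42, 7, 19, 3, 88, 61, 25, 14);

# Naive version: start() is called twice for every comparison sort makes.
$My::Feature::calls = 0;
my @naive = sort { $a->start <=> $b->start } @features;
my $naive_calls = $My::Feature::calls;

# Schwartzian Transform, read bottom to top:
#   1. wrap each object as [ key, object ], calling start() once per element
#   2. sort on the cached key, with no method calls in the comparison
#   3. unwrap to get the objects back
$My::Feature::calls = 0;
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ $_->start, $_ ] }
    @features;
my $st_calls = $My::Feature::calls;

print "naive: $naive_calls calls to start(), transform: $st_calls calls\n";
```

The transform pays for one small anonymous array per element, so it only
wins when the method is reasonably expensive or the list is not tiny,
which is why it's worth measuring in the hot spots before converting
everything.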

Tim.