[Bioperl-l] Re: issues with _rearrange
Tim Bunce
Tim.Bunce@pobox.com
Fri, 20 Sep 2002 17:39:19 +0100
On Fri, Sep 20, 2002 at 04:19:42PM +0100, Ewan Birney wrote:
> On Fri, 20 Sep 2002, Aaron J Mackey wrote:
>
> <<snip>>
>
> Can I propose an
>
> (F) - Write a specific constructor, called like:
>
> $feature = Bio::SeqFeature::Generic->seqio_direct_new($start,$end,$tag_hash_ref);
>
> which is a "seqio optimised" constructor. The idea here is that SeqIO
> is allowed to "take a different route" from standard constructors and
> seqio_direct_new is allowed to do all manner of speed-magic - including
> filling in the hash directly outside of the get/set methods and - for
> example - directly using the tag_hash_ref built in the FTHelper object as
> the tag hash rather than adding things
>
> Ditto perhaps for PrimarySeq.
>
> Basically - can we write "direct_new" which are speed-optimised at the
> price of strict calling conventions and "insider knowledge"
>
> (caveat - how much stuff does Root::Object need intialised before it is
> happy, and if so... why...)
Seems like a good idea.
But meanwhile here's a version of _rearrange that's over 60% faster:
Benchmark: running new, old, each for at least 5 CPU seconds...
new: 6 wallclock secs ( 5.25 usr + 0.02 sys = 5.27 CPU) @ 17193.72/s (n=90670)
old: 6 wallclock secs ( 5.22 usr + 0.02 sys = 5.23 CPU) @ 11295.71/s (n=59126)
use strict;
use Benchmark;
use Data::Dumper;
my $order = [qw(SEQUENCE ID FOOBAR DESC)];
my (@new, @old);
timethese(-5, {
new => sub { @new=new_rearrange(undef,$order, -sequence=>41, -desc=>42, -id=>43) },
old => sub { @old=old_rearrange(undef,$order, -sequence=>41, -desc=>42, -id=>43) },
});
warn Dumper([\@old,\@new]);
sub new_rearrange {
my (undef, $order) = (shift,shift);
return @_ unless ($#_ % 2 && substr($_[0]||'',0,1) eq '-');
my %param;
while (@_) {
(my $key = shift) =~ tr/a-z\055/A-Z/d; # deletes leading (and all) dashes
$param{ $key } = shift;
}
return @param{@$order};
}
sub old_rearrange {
my($self,$order,@param) = @_;
return @param unless (defined($param[0]) && $param[0]=~/^-/o && ($#param % 2));
for (my $i=0;$i<@param;$i+=2) {
$param[$i]=~s/^\-//;
$param[$i]=~tr/a-z/A-Z/;
}
my(%param) = @param;
return @param{@{$order}};
}
I've not had time to work on tuning up other parts of the code (and
may not for at least several days) but from a quick read through
there's lots of scope for optimizations.
Tim.