[Bioperl-l] Re: Automatic generation of set and get methods

Tim Bunce Tim.Bunce@pobox.com
Sat, 16 Nov 2002 21:45:21 +0000


On Fri, Nov 15, 2002 at 02:36:42PM -0800, Hilmar Lapp wrote:
> 
> > The first time a method is called there is a lookup to
> > find it. After that the result is cached. Unless someone
> > explicitly plays with that portion of the symbol table
> > further use of the method is no worse than a subroutine
> > call. If an AUTOLOAD installs the method and does a goto
> > then the method will be called at full speed from then on.

Remember that all methods that AUTOLOAD will provide must be pre-declared
(using "sub foo;"), otherwise you can run into method lookup problems: if a
superclass has a method that should be overridden in the subclass but hasn't
been declared there, then the AUTOLOAD in the subclass will _not_ be called.
That's because method lookup searches the whole inheritance tree for a
declared method before falling back to trying AUTOLOADs.
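To make the AUTOLOAD points concrete, here is a minimal sketch with hypothetical classes Base and Derived (not from the original post). Derived forward-declares name so that its AUTOLOAD wins over Base::name. The AUTOLOAD then installs a real subroutine and re-dispatches with goto &sub, so only the first call pays for the AUTOLOAD step.

```perl
use strict;
use warnings;

package Base;                      # hypothetical superclass
sub new  { my $class = shift; return bless {}, $class }
sub name { 'Base::name' }          # a method the subclass means to override

package Derived;                   # hypothetical subclass
our @ISA = ('Base');
our $AUTOLOAD;

# Forward declaration: without it, Derived->new->name would find
# Base::name during method lookup and Derived::AUTOLOAD would never run.
sub name;

sub AUTOLOAD {
    my $method = $AUTOLOAD;
    $method =~ s/.*:://;           # strip the package prefix
    return if $method eq 'DESTROY';
    die "Can't autoload method '$method' in Derived\n"
        unless $method eq 'name';  # assumption: only 'name' is autoloadable

    # Install a real sub so later calls go straight to it,
    # then re-dispatch to it with the original @_.
    my $code = sub { 'Derived::name (autoloaded)' };
    no strict 'refs';
    *{"Derived::$method"} = $code;
    goto &$code;
}

package main;
my $obj = Derived->new;
print $obj->name, "\n";            # first call goes through AUTOLOAD
print $obj->name, "\n";            # second call hits the installed sub directly
```

Removing the "sub name;" line makes both calls print Base::name instead, which is exactly the lookup problem described above.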

Another problem is that once the AUTOLOAD for a subclass has been
called there's no way for it to say "sorry, can't load that one,
please keep trying elsewhere". But that's generally not an issue
if all the classes have pre-declared all their methods.
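A second sketch, again with hypothetical classes Parent and Child, shows the "keep trying elsewhere" limitation: once Child::AUTOLOAD is chosen for an unknown method, Parent::AUTOLOAD is never consulted, whatever Child::AUTOLOAD does.

```perl
use strict;
use warnings;

package Parent;                    # hypothetical classes
our $AUTOLOAD;
sub new { my $class = shift; return bless {}, $class }
sub AUTOLOAD {
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return if $method eq 'DESTROY';
    return "Parent::AUTOLOAD handled $method";
}

package Child;
our @ISA = ('Parent');
our $AUTOLOAD;
sub AUTOLOAD {
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return if $method eq 'DESTROY';
    # Returning here does not make Perl resume the search at
    # Parent::AUTOLOAD: whatever this sub does is the result of the
    # call, so for a method it cannot supply it simply dies.
    die "Child::AUTOLOAD can't handle '$method'\n";
}

package main;
print Parent->new->colour, "\n";           # handled by Parent::AUTOLOAD
my $ok = eval { Child->new->colour; 1 };
print $ok ? "handled\n" : "died: $@";      # Child::AUTOLOAD died
```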

> > Part of Perl's flexibility in handling these things is the
> > ability to modify the symbol table on the fly. Caching will
> > normally save most of this expense. Overuse of AUTOLOAD with
> > no subroutine installed will be seriously expensive.
> 
> I have to admit I'm completely ignorant of how the engine runs.
> I made the statement based on Tim Bunce's review of our parser and
> object initialization code; the basic result was that we had far
> too many method invocations (along with some other things). Once we
> reduced those, the parser in question was ~5-6 times faster.

Methods don't cost much more than subs. Much of the overhead comes
from putting the arguments onto the stack and getting them off again:

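use strict;
use warnings;
use Benchmark;                    # exports timethese() by default
my $obj = bless {} => 'fooclass';
sub fooclass::empty { }                                      # does nothing
sub fooclass::std   { my ($self, $arg) = @_; return $arg; }  # unpacks @_, returns an arg
timethese(1000000, {
  s1 => sub { fooclass::empty($obj,1) },   # plain sub call, empty body
  m1 => sub { $obj->empty(1) },            # method call, empty body
  s2 => sub { fooclass::std($obj,1) },     # plain sub call, two statements
  m2 => sub { $obj->std(1) },              # method call, two statements
});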

Benchmark: timing 1000000 iterations of m1, m2, s1, s2...
  s1:  2 wallclock secs (1.60 usr + 0.00 sys = 1.60 CPU) @ 624390.24/s (n=1000000)
  m1:  2 wallclock secs (2.17 usr + 0.00 sys = 2.17 CPU) @ 460431.65/s (n=1000000)
  s2:  4 wallclock secs (3.45 usr + 0.01 sys = 3.46 CPU) @ 288939.05/s (n=1000000)
  m2:  5 wallclock secs (4.58 usr + 0.00 sys = 4.58 CPU) @ 218430.03/s (n=1000000)

You could say s1 is much faster than m1, but as soon as you try to
do anything useful within the sub, the method-call overhead is no
longer the main cost: the two trivial statements in std() add about
1.86 CPU seconds per million calls (s1 vs s2), while method dispatch
adds about 0.57 (s1 vs m1). And even the slowest is able to do over
200,000 calls per second, so these differences, in themselves, are
rarely where the problems are.
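Converting the reported rates to time per call makes the comparison easier to read. This small sketch only does arithmetic on the figures printed by timethese() above; it does not re-run the benchmark.

```perl
use strict;
use warnings;

# Rates (calls/second) copied from the timethese() output above.
my %rate = (
    s1 => 624390.24, m1 => 460431.65,
    s2 => 288939.05, m2 => 218430.03,
);

# Convert each rate to microseconds per call.
my %usec = map { $_ => 1e6 / $rate{$_} } keys %rate;

printf "%s: %.2f us/call\n", $_, $usec{$_} for qw(s1 m1 s2 m2);

# Cost of method dispatch vs. cost of the two statements in std().
printf "method overhead (m1 - s1): %.2f us\n", $usec{m1} - $usec{s1};
printf "two statements  (s2 - s1): %.2f us\n", $usec{s2} - $usec{s1};
```

With these figures, method dispatch adds roughly 0.57 microseconds per call, while the two statements in std() add roughly 1.86.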

I'd guess that you all have better things to do than spend much
time fussing about accessor methods. But if you want my performance
perspective, it's this: given the slowdown demonstrated above from
just two trivial statements in std(), it should be obvious that for
maximum performance separate set_* and get_* methods are needed, so
that each can contain the smallest amount of code, which is probably:

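	# Index @_ directly: Perl evaluates the right side of "=" first, so
	# "shift->{foo} = shift" would shift off the object on the right.
	sub set_foo { $_[0]{foo} = $_[1] }
	sub get_foo { shift->{foo} }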
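Since the thread is about generating such methods automatically, here is a minimal sketch, assuming a hypothetical class Thing with hypothetical fields foo and bar. It installs one closure per accessor into the symbol table when the class is loaded, so no AUTOLOAD is involved at call time. The setter indexes @_ directly because Perl evaluates the right-hand side of a scalar assignment before the left, so "shift->{$field} = shift" would take the object from the right-hand shift.

```perl
use strict;
use warnings;

package Thing;                     # hypothetical class

# Hypothetical field list; one get_/set_ pair is generated per field.
my @fields = qw(foo bar);

for my $field (@fields) {
    no strict 'refs';
    # Each closure captures its own $field.
    *{"Thing::get_$field"} = sub { $_[0]{$field} };
    *{"Thing::set_$field"} = sub { $_[0]{$field} = $_[1] };
}

sub new {
    my $class = shift;
    my $self  = { @_ };            # initial field values as key/value pairs
    return bless $self, $class;
}

package main;
my $t = Thing->new(foo => 1);
$t->set_bar(42);
print $t->get_foo, ' ', $t->get_bar, "\n";   # prints "1 42"
```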

Tim [who was 'just passing' and isn't really paying any attention to this list]