[Dynamite] Object functional Telegraph

Sat, 27 May 2000 19:16:36 -0700 (PDT)

This "object functional" idea has crystallised from something very vague
into something rather more solid in my mind. I will try to explain...

Hand-wavingly, two of the kind of things that you can do to functions in a
functional programming language are:

  (1) bind some of the function input parameters to be equal to constant
       values, or even equal to other functions ("currying")

       -- an example of this is turning the function "F(A,B) = A+B"
          into the function "F(A) = A + 3" by fixing "B=3"

       -- a more fancy example would be to turn "F(A,B) = A+B"
          into "G(A) = 2*A" by fixing "B=A"

  (2) make new functions by combining or adapting existing ones

       -- e.g. combine "F(A,B) = A+B" with "H(F) = 3*F"
           to get "J(A,B) = 3*(A + B)"

       -- very advanced example: combine "F(A,B) = A+B" with
           "K(F(A,...)) = min_{A} F*F" to get "L(B) = -B"

Don't worry if you don't quite get the last example. Guy may have an edge
from having coded some Haskell ;-)

I think we should *imitate* this to a limited extent in our object model,
in a way that essentially makes it very easy for the user to say, for
example:

  use Telegraph::Lib::SmithWaterman;
  use Telegraph::Lib::Matrices::BLOSUM qw($BLOSUM62);

  my $sw = Telegraph::Lib::SmithWaterman->new ('query' => $query_seq,
					       'target' => $target_seq,
					       'gap_open' => -12,
					       'gap_extend' => -2,
					       'sub_mat' => $BLOSUM62);

where the freeform "'param' => value" construct represents a
_binding_ of flexible type (explained in greater detail below).

The $sw object is of class Telegraph::Lib::SmithWaterman which is a
subclass of Telegraph::DPFunction. Once the appropriate parameters of a
DPFunction are bound, the score for Viterbi alignment can be calculated:

  my $score = $sw->viterbi_score->eval;

The method "$sw->viterbi_score" returns a new "function" object of class
Telegraph::DPFunction::ViterbiScore. Calling "eval" on this function
object causes it to be evaluated, which transparently triggers the Viterbi
algorithm somewhere behind the scenes.

Binding of parameters to values is a cool, freeform way of expressing the
functional nature of what we're doing. The above constructor for "$sw"
could equally well have been expressed in the following way:

  my $sw = Telegraph::Lib::SmithWaterman->new;

  $sw->bind ('target' => $target_seq, 'query' => $query_seq);
  $sw->bind ('sub_mat' => $BLOSUM62);
  $sw->bind ('gap_open' => -12, 'gap_extend' => -2);

Essentially, bindable arguments are just the same as attributes in IDL,
but stored in a Perl hash, so the list of valid args can be dynamically
adjusted.

For example, the "bind" method takes a list of "param => binding" pairs,
and matches each param up to an internal checklist -- something like this:

###############################################################################

package Telegraph::FunctionObject;

...

sub bind {
  my ($self, %binding) = @_;
  my @allowed_args = $self->args;
  while (my ($arg, $val) = each %binding) {
    if (grep ($arg eq $_, @allowed_args)) {
      $self->{'_binding'}->{$arg} = $self->preprocess_arg_binding ($arg, $val);
    } else {
      croak "'$arg' is not a legal argument for $self -- try @allowed_args";
    }
  }
}

sub get_binding {
  my ($self, $arg) = @_;
  my @allowed_args = $self->args;
  unless (grep ($arg eq $_, @allowed_args)) {
    croak "'$arg' is not a legal argument for $self -- try @allowed_args";
  }
  return $self->{'_binding'}->{$arg};
}

###############################################################################

The "$self->preprocess_arg_binding" is a protected virtual method
implemented by subclasses of Telegraph::FunctionObject. Often they will
just return $arg, but occasionally they may want to do some parsing, e.g.
if we wrote

  $sw->bind ('gap_open' => 'gap_extend - 10');

then the call to "$self->preprocess_arg_binding" would interpret the
binding as an equation, not a constant, and act accordingly (by parsing
the equation into a Telegraph::Calculus::Function object).

However, this level of detail is transparent to the user. They just see
functions and argument bindings. Incidentally, the score for a given
traceback can be calculated as follows:

  my $traceback_path = $sw->viterbi_traceback->eval;  # get the Viterbi alignment
  $sw->bind ('path' => $traceback_path);              # tell the SW object what path we're interested in
  my $traceback_path_score = $sw->eval;               # calculate the path score
  $sw->bind ('path' => undef);                        # tidy up after ourselves

...or, more neatly:

  my $traceback_path = $sw->viterbi_traceback->eval;
  my $traceback_path_score = $sw->eval ('path' => $traceback_path);

(this second form is like a temporary or "local" binding of the 'path'
parameter -- slightly hacky if you think about it, but convenient)

In other words, the $sw object is in fact a path score calculator. We
construct the Viterbi algorithm from this object (formally, the Viterbi
algorithm is just a calculation method for the function that arises when
you apply the rule "max over paths" to the score function).

We don't treat everything as a function or a binding. For example, sets of
states or transitions are not well represented in this way. However, since
a binding is just a unordered "param=>value" hash, a lot of things can be
represented very nicely as bindings.

For example, a GeneWise-to-HMMER mapping is a binding. And a training 
algorithm is a function that has the following bindable input args:

  * 'function' => a Telegraph::DPFunction object reference
  * 'groups'   => a set of training parameters, probabilistically grouped

...and returns, as output, a binding of all the parameters in 'groups' to
their estimated values. (Don't worry about this just yet if it's hassle.
Just believe me, it's cool, and it means that users can write training
code that looks like this:

  $sw->bind ($optimal_scores = $sw->training->eval);

Told you it looked cool)

Anyway, that's the basics for now -- I think what we should do in June is
write test scripts that look like the above code. I'll write a preliminary
poster abstract in latex tomorrow morning.. Guy has volunteered to submit
it in PDF format (or whatever is necessary).

Ian