Reverse Complement utility, Bio::Alg, return value problem
Steve A. Chervitz
sac@genome.stanford.edu
Thu, 7 Aug 1997 15:43:20 -0700 (PDT)
SteveB wrote:
>
> > SteveC wrote:
> > Regarding the issue of methods that modify an existing object,
> > I would argue that such methods should be flagged with a "set" prefix so
> > it is absolutely clear what is being done:
> >
> > $myseq->set_revcom($beg,$end);
> >
> > would change the sequence object into its reverse complement. It could
> > also return the altered object.
> >
> > The advantages I see would be:
> >
> > 1) One method call replaces three; set_revcom() would call inplace() for
> > you.
> > 2) Objects are less likely to be inadvertently altered (or not altered)
> > due to a misplaced or incorrect inplace() call. Requiring calls to
> > inplace(1) and inplace(0) forces the client to do the accounting and
> > thus can lead to a new class of bugs and maintenance headaches.
> >
> > A disadvantage would be having two methods (set_revcom() and revcom())
> > instead of one, which you would need to have for every accessor.
> > But this is more in line with OO design. The inplace() calls would still
> > be useful when performing complex, multi-step manipulations.
>
> Another disadvantage is extra typing for the potentially more common
> operations.
>
> However, again, I agree with you. I think that this is most likely to do
> the right thing in most cases.
>
> Question: do you propose that set_revcom() also return the object ($self),
> or should the set_* functions return the modified (or previous,
> unmodified) data? I can see arguments for all three options.
Ah, we now open a new can of worms!
A key consideration in deciding what to return is error handling. If
the set fails, it's a good idea to halt further processing of the object.
With this in mind, it might be safest for the set method to return true
or false, depending on the success of the operation. This way, the
following code will always work:
    if ($myseq->set_revcom($beg,$end)) {
        analyze_seq($myseq->get_seq());
    } else {
        warn "Can't set reverse complement.\n";
    }
It's dangerous to return the object since you will get the wrong result
if the set fails:
    $myseq->set_revcom($beg,$end)->get_seq();  ## Returns original sequence
                                               ## if set_revcom() fails.
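
To make the status-return idea concrete, here is a minimal sketch of what such a set method might look like. Everything in it is an assumption for illustration: My::Seq is a made-up class (not the actual Bio:: code), the 'type' attribute is hypothetical, and the $beg/$end range handling is left out to keep it short.

```perl
package My::Seq;    # hypothetical class, for illustration only
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq}, type => $args{type} || 'dna' }, $class;
}

sub get_seq { return $_[0]->{seq} }

# Reverse-complement the object in place.
# Returns 1 on success, 0 on failure (the object is left unchanged).
sub set_revcom {
    my ($self) = @_;
    return 0 unless $self->{type} eq 'dna';   # refuse non-nucleotide data
    my $rc = reverse $self->{seq};            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;             # complement each base
    $self->{seq} = $rc;
    return 1;
}

package main;

my $dna = My::Seq->new(seq => 'AAGC');
print $dna->get_seq, "\n" if $dna->set_revcom;

my $pep = My::Seq->new(seq => 'MKV', type => 'protein');
warn "Can't set reverse complement.\n" unless $pep->set_revcom;
```

Because set_revcom() here returns a plain 1 or 0 rather than $self, a chained call like the one above would die on a failed set instead of quietly handing back the original sequence.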
Consider this strategy as well:
    $myseq->set_revcom($beg,$end);
    if ($myseq->valid()) {
        analyze_seq($myseq->get_seq());
    } else {
        warn "Can't set reverse complement.\n";
    }
In this scheme, the failed set_revcom(), in addition to returning
false, would invalidate the object so it couldn't be used in ANY
subsequent operation. This would be a way, for example, to prevent *and*
signal calls to set_revcom() on protein sequences. (If you're calling
revcom on protein sequences, something is way wrong with your code or
your data!)
This degree of error checking is more important as the operations get
more complex, such as when an object is responsible for creating a new
type of object ($gene->set_protein() or $protein->set_blast()). In my
objects, when a complex "set" fails, I actually generate and attach an
internal exception to the object which contains data about the error. The
process of generating this exception causes the set operation to return
false.
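
A rough sketch of how the invalidation and the attached error record could fit together. This is not my actual code: My::CheckedSeq, _fail(), error(), and the shape of the error record are all hypothetical names chosen for the example, and having get_seq() return undef on an invalid object is just one possible way to block further use.

```perl
package My::CheckedSeq;    # hypothetical class, for illustration only
use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    my $self = { seq => $args{seq}, type => $args{type} || 'dna', err => undef };
    return bless $self, $class;
}

# An object is valid until some set operation has recorded an error on it.
sub valid { return !defined $_[0]->{err} }
sub error { return $_[0]->{err} }

# Attach an error record to the object. Always returns 0, so a set method
# can end with "return $self->_fail(...)" and thereby return false itself.
sub _fail {
    my ($self, $method, $msg) = @_;
    $self->{err} = { method => $method, msg => $msg };
    return 0;
}

# Accessors refuse to hand out data from an invalidated object.
sub get_seq {
    my ($self) = @_;
    return $self->valid ? $self->{seq} : undef;
}

sub set_revcom {
    my ($self) = @_;
    return 0 unless $self->valid;             # already invalid: do nothing
    return $self->_fail('set_revcom', 'not a nucleotide sequence')
        unless $self->{type} eq 'dna';
    my $rc = reverse $self->{seq};            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;             # complement each base
    $self->{seq} = $rc;
    return 1;
}

package main;

my $pep = My::CheckedSeq->new(seq => 'MKV', type => 'protein');
unless ($pep->set_revcom) {
    my $err = $pep->error;
    warn "$err->{method} failed: $err->{msg}\n";
}
```

The client can then test either the return value of the set or valid() later on, and the error record says which operation failed and why.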
So I would favor having "set" functions (any function which can
modify the object's data) return a status indicator and (possibly) be
able to generate an exception that can invalidate the object. How to
deal with exceptions is a separate issue. I don't think I have
the best solution yet.
SteveC