[Bioperl-l] Hilmar and Ewan debate SeqFeatures some more...
Hilmar Lapp
hlapp@gmx.net
Fri, 19 Jan 2001 11:13:57 -0800
Ewan Birney wrote:
>
> Ok. Hilmar and I are now probably into the "code aesthetics" part of this
> debate, which definitely is worth having but someone sometime has to make
> a decision.
>
> I suggest that we keep bashing this out on the list for a couple more days
> (please... other people... if you have a view, do chip in). If Hilmar and
> I are still disagreeing with aesthetics I would like to nominate Jason to
> tie-break on the way to go (is this ok with you Hilmar and Jason...?)
>
Jason, you're going to play the Supreme Court judge here (no
appeals possible) :-)
In fact, I'd like to hear more feedback from actual users of these
features. It seems that most people are happy if only those
special GenBank features no longer get completely lost.
However, there are people who do want to do meaningful stuff with
the coordinates. One of these is our group in Vienna (yes, we draw
features, and yes, that adds to my concern). The other I know of
is David with his GUI, which is why I put him on cc.
David, any strong or weak feelings about this issue from your
perspective?
The BioJava project came up, as far as I can recall, with a
Location class model separate from the Feature class. I put
Matthew and Thomas on the cc to ask about their experience with this
model, and what the feedback from the BioJava community has been so
far.
> We have two points of contention:
>
> (a) Explicit Location objects or not.
>
> Hilmar suggests an explicit location object
>
> SeqFeatureI has-a LocationI
>
> LocationI is sub classed for Split (join statements) and Fuzzies
>
> Benefits - (a) easy to mix and match implementations of locations to
> different feature objects, and (b) if mix and matching locations to
> features is common, more realistic. Hilmar argues that it is clearer as
> well.
>
> Against - more objects, and in fact the majority of seqfeatures are little
> more than the location and two extra strings.
>
> For backwards compatibility, I think SeqFeatureI->start would *have* to be
> delegated to SeqFeatureI->location->start - otherwise too much code will
> break... (of course, this delegation could just be for a while as we move
> code and people over to using "proper" locations)
>
I agree completely here. I even think $feature->start() can stay
there forever.
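To make the delegation concrete, here is a minimal sketch. The package
names My::Location and My::Feature are made up for illustration and are
not existing bioperl modules; the only point is that $feature->start()
keeps working while the coordinates live in the location object.

```perl
use strict;
use warnings;

# Hypothetical location class: knows only "where".
package My::Location;
sub new {
    my ($class, %args) = @_;
    return bless { start => $args{-start}, end => $args{-end} }, $class;
}
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

# Hypothetical feature class: knows "what" and has-a location.
package My::Feature;
sub new {
    my ($class, %args) = @_;
    return bless { primary_tag => $args{-primary_tag},
                   location    => $args{-location} }, $class;
}
sub primary_tag { $_[0]{primary_tag} }
sub location {
    my ($self, $loc) = @_;
    $self->{location} = $loc if defined $loc;   # optional setter
    return $self->{location};
}
# Backward-compatible accessors: simply delegate to the location object.
sub start { $_[0]->location->start }
sub end   { $_[0]->location->end }

package main;

my $feat = My::Feature->new(
    -primary_tag => 'exon',
    -location    => My::Location->new(-start => 10, -end => 50),
);
printf "%d..%d\n", $feat->start, $feat->end;
```

Old client code that only ever calls start() and end() on the feature
would not notice the difference in this arrangement.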
> People might be interested that I originally argued for an explicit
> location object about 1 month ago. I don't now...
>
> I am suggesting that SeqFeatures do not have an explicit location object,
> but we subclass SeqFeatures into Split, Simple and Fuzzy, all inheriting
> from a common SeqFeature interface
>
> Benefits - (a) less objects (b) only one place where the client gets the
> information and (c) more backwardly compatible.
>
I'd like to note here that 'less objects' is not a benefit by
itself, unless loading modules imposes a significant run-time
performance hit, which I think we agree it doesn't. Having fewer
objects does, I think, constitute a benefit if it removes redundant
definitions and makes the API quicker to learn, that is, if the
remaining objects are easier to use. This is the point I doubt here:
I think further inflating SeqFeatureI makes the API harder to learn.
And I think Location (where) and Feature (what) are not redundant.
As for the backward compatibility, I think the only problem here
is the exception yes/no issue, isn't it? So, backward
compatibility does not argue against decoupling Location/Feature,
does it?
> Effectively my main argument is that there will always be a pretty clear
> cut relationship that "this type of SeqFeature" is always "this class of
> location" so the splitting of the location away from the SeqFeature is
> just suggesting a mix-and-match world which doesn't actually exist.
It does exist. It may not be the most frequent case, but it is a
use case for us. And probably for everyone who draws features.
> Simpler and stronger to go for the combined interface in my view.
>
> (b) ->start ->end throwing exceptions or not.
>
> Hilmar says that for at least Fuzzies and possibly Splits the client
> should figure out by rooting around the object how to map these more
> complex locations to a simple start,end. The interface should allow
> exceptions to be thrown on ->start/->end indicating that the client should
> be treating this seqfeature somehow differently...
>
> Basically we pass the buck to the client.
>
Right. And I said that's where it belongs.
> I say that the implementation objects have to provide a default mapping
> of whatever ->start and ->end are. This means that clients can live in
> this happy world of "I have well defined start/ends" if they so wish
> without writing extra code. Smart clients are encouraged to root around in
> the objects for their "real" interpretation of the fuzziness.
>
> There are three reasons why I favour this:
>
> (a) Clients for dumping/drawing/manipulation have to treat large
> numbers of sequence features as a pretty homogeneous mass. If we make
> seqfeatures less homogeneous then every client is going to have to figure
> out how to "homogenize" the seqfeatures - this will be different client to
> client although for the main case they just want a "default way" of
> handling them. We are encouraging a diversity of views when our clients
> really want us to solve the problems for them.
>
This can be solved easily. For FuzzyLocation we implement a
default way of computing valid start/end, which can be activated
(globally) by client code. (I hear you saying if we do it this way
it should be activated by default :-)
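A rough sketch of what such a switch could look like follows. The
package name My::FuzzyLocation, the variable $DEFAULT_POLICY and the
policy names 'strict' and 'widest' are assumptions made up for this
example; only min_start() and max_end() are names already used in this
discussion.

```perl
use strict;
use warnings;

# Hypothetical fuzzy location with uncertain start and end positions.
package My::FuzzyLocation;

# Global policy switch: 'strict' throws, 'widest' returns the outer bounds.
our $DEFAULT_POLICY = 'strict';

sub new {
    my ($class, %args) = @_;
    return bless { min_start => $args{-min_start},
                   max_start => $args{-max_start},
                   min_end   => $args{-min_end},
                   max_end   => $args{-max_end} }, $class;
}
sub min_start { $_[0]{min_start} }
sub max_end   { $_[0]{max_end} }

sub start {
    my $self = shift;
    return $self->min_start if $DEFAULT_POLICY eq 'widest';
    die "fuzzy location has no exact start\n";
}
sub end {
    my $self = shift;
    return $self->max_end if $DEFAULT_POLICY eq 'widest';
    die "fuzzy location has no exact end\n";
}

package main;

my $loc = My::FuzzyLocation->new(-min_start => 10, -max_start => 15,
                                 -min_end   => 55, -max_end   => 60);

# Under the strict policy the client has to handle the exception.
my $start = eval { $loc->start };
print "strict: ", (defined $start ? $start : "exception"), "\n";

# A client opts in once, globally, and then sees plain coordinates.
$My::FuzzyLocation::DEFAULT_POLICY = 'widest';
print "widest: ", $loc->start, "..", $loc->end, "\n";
```

Whether the initial value of the switch is 'strict' or 'widest' is
exactly the matter-of-taste question; the model underneath is the same
either way.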
> (b) as 99% of features are nice, well behaved "hard features" many
> pieces of client code written with the bioperl libraries will just assume
> ->start,->end do not throw exceptions. When this piece of code is used by
> another user with a fuzzy feature, there will be a rather deep exception
> thrown by bioperl through the client code. I think both the user and the
> client with some justification will blame bioperl for this, no matter how
> much we say "you should have read the documentation and written 3
> different subroutines to replace every time you go
>
> if( $one->start == $two->start )
>
> gets replaced by
>
> if( &my_exact_function($one,$two) ) {
>
> }
>
> ...
>
> sub my_exact_function {
>
> my ($one, $two) = @_;
>
> # one of many if statements...
>
> if( $one->isa('Bio::FuzzyFeatureI') &&
> $two->isa('Bio::SimpleFeatureI') ) {
> ...
>
> }
>
> }
>
This can be accomplished much more simply:
if($user_prefs{"fuzzyLocs"} eq "simplifyToWidest") {
    # replace the fuzzy location by a plain range spanning its widest extent
    my $loc1 = $feat_one->location();
    my $range = Bio::Range->new(-start => $loc1->min_start(),
                                -end   => $loc1->max_end());
    $feat_one->location($range);
    # same for $feat_two follows
    ...
}
# carry on as if there were no fuzzy etc features
# and you're safe from exceptions
> (c) long experience with seqfeatures has made me claim that the
> following rules are generally just what people want:
>
> - simple features - easy
>
> - join statements - ignore leading and trailing '<' '>' and take the
> edge start/end points on the sequence you are looking at
>
> - fuzzy features - either skip or - if you have to draw/compare them,
> take start/end as the min hard location mentioned and the maximum hard
> location mentioned, regardless of the internal grammar.
>
> I reckon bioperl would do better to implement the (c) method by default
> without preventing smart clients from making their own decisions.
>
Well, I think you can have a full model and still always provide
simple implementations satisfying most people's use cases (to be
activated by client code, or activated by default, I think that's
a matter of taste).
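As an illustration of how small such a simple implementation can be,
here is a sketch of the "take the minimum and maximum hard position"
rule from (c). The function name default_extent and the representation
of sub-locations as [start, end] pairs are assumptions for this example
only; stripping any '<' or '>' markers is left to the caller.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: collapse any location to one (start, end) pair by
# taking the smallest and largest hard coordinate it mentions.
# Each sub-location is passed as an array ref [start, end].
sub default_extent {
    my @sublocs = @_;
    my @coords  = map { @$_ } @sublocs;   # flatten all coordinates
    return (min(@coords), max(@coords));
}

# A split location with three segments, as in a join statement.
my ($start, $end) = default_extent([100, 200], [300, 400], [500, 650]);
printf "%d..%d\n", $start, $end;
```

A simple location is just the one-element case, so the same default
would serve simple, split and fuzzy locations alike, while the full
model stays available to clients that want the details.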
Hilmar
--
-----------------------------------------------------------------
Hilmar Lapp email: hlapp@gmx.net
GNF, San Diego, Ca. 92122 phone: +1 858 812 1757
-----------------------------------------------------------------