[Biojava-l] Re: introducing a location-less feature into BioJava

Matthew Pocock mrp@sanger.ac.uk
Wed, 10 May 2000 14:59:57 +0100


Hi.

I think the issue is over the distinction between an individual feature and the
type of the feature. In the example of the SET-domain feature, each sequence has
a unique feature (in location, and possibly in the feature's sequence). However,
they both convey the same information: that that particular region is an example
of a SET-domain.

The loose-binding way to represent this is to agree on an annotation key
'domain.type' with the value being a reference to the SET-domain object. This is
the ideal solution if the code is for use by a small group, if you want to
avoid writing much code, or if you are scripting-language people (Perl,
Tcl, Python etc.) and like to do things this way. It would also play nicely with
simple sequence browsers that know how to peek into the annotation bundle.
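
A minimal sketch of the loose-binding approach, assuming a feature carries a
plain map as its annotation bundle. The names DomainRecord and AnnotatedFeature
are hypothetical and are not BioJava classes; only the 'domain.type' key comes
from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for "the SET-domain object" (not a BioJava class).
final class DomainRecord {
    private final String name;

    DomainRecord(String name) { this.name = name; }

    String getName() { return name; }
}

// Hypothetical minimal feature: a location on one sequence plus a free-form
// annotation bundle that callers fill in by agreed convention.
final class AnnotatedFeature {
    static final String DOMAIN_TYPE_KEY = "domain.type"; // the agreed key

    private final String sequenceId;
    private final int start;
    private final int end;
    private final Map<String, Object> annotation = new HashMap<>();

    AnnotatedFeature(String sequenceId, int start, int end) {
        this.sequenceId = sequenceId;
        this.start = start;
        this.end = end;
    }

    String getSequenceId() { return sequenceId; }
    int getStart() { return start; }
    int getEnd() { return end; }
    Map<String, Object> getAnnotation() { return annotation; }

    // Nothing enforces the convention, so check the value before casting.
    // Returns null when the key is absent or holds something unexpected.
    DomainRecord getDomain() {
        Object value = annotation.get(DOMAIN_TYPE_KEY);
        return (value instanceof DomainRecord) ? (DomainRecord) value : null;
    }
}
```

Two features on different sequences would each put the same DomainRecord
instance under 'domain.type', so the features stay distinct while the domain
information is shared. The price is that the compiler cannot check the key or
the value type.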

The tight-binding way would be to add a DomainType field to a new interface
DomainFeature, and have this reference the SET-domain object. This would be the
correct approach for representing a well-known resource like InterPro where the
concepts of domain-type are constant and well understood :-D particularly if it
formed the basis of a library that you distributed.
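
A rough sketch of what the tight-binding variant could look like. DomainType
and DomainFeature are the names suggested above, but the methods and the two
Simple* implementation classes are assumptions made for illustration, not
existing BioJava API.

```java
// Hypothetical description of a kind of domain, shared by many features.
interface DomainType {
    String getName();
}

// Hypothetical feature interface with a typed accessor for its domain type.
interface DomainFeature {
    String getSequenceId();
    int getStart();
    int getEnd();
    DomainType getDomainType();
}

final class SimpleDomainType implements DomainType {
    private final String name;

    SimpleDomainType(String name) { this.name = name; }

    @Override
    public String getName() { return name; }
}

final class SimpleDomainFeature implements DomainFeature {
    private final String sequenceId;
    private final int start;
    private final int end;
    private final DomainType domainType;

    SimpleDomainFeature(String sequenceId, int start, int end, DomainType domainType) {
        this.sequenceId = sequenceId;
        this.start = start;
        this.end = end;
        this.domainType = domainType;
    }

    @Override
    public String getSequenceId() { return sequenceId; }

    @Override
    public int getStart() { return start; }

    @Override
    public int getEnd() { return end; }

    @Override
    public DomainType getDomainType() { return domainType; }
}
```

Here the compiler guarantees that every DomainFeature has a DomainType, and
one DomainType instance can still be shared by features on any number of
sequences.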

A stupidly tight-binding way would be to have a SETDomainFeature (and another
feature class for each type of domain), but this will soon become unworkable, and
I think that the domain-type is best represented as multiple (possibly
polymorphic) instances of some DomainType interface that is associated with a
DomainFeature, or under the 'domain.type' annotation key.

Your database of domains may choose to keep a reference to the sequence/feature
that represents this particular SET-domain (allowing quick look-up of all
SET-domain features in existence), but this is your choice. If this is done
inside the same Java process (not via CORBA or SQL lookups) then you need to be
careful that sequences can still be garbage-collected, but this is no great
shakes using the soft references in Java 1.2.
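
One way such a look-up table might be written is with java.lang.ref.SoftReference,
so that the index alone never keeps a feature alive. SoftFeatureIndex and its
methods are hypothetical names for illustration; the generic parameter F
stands for whatever feature class is in use.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical index from a domain name to the features that carry that domain.
// Features are held through soft references, so the garbage collector may
// reclaim them under memory pressure once nothing else refers to them.
final class SoftFeatureIndex<F> {
    private final Map<String, List<SoftReference<F>>> byDomain = new HashMap<>();

    void register(String domainName, F feature) {
        byDomain.computeIfAbsent(domainName, k -> new ArrayList<>())
                .add(new SoftReference<>(feature));
    }

    // Returns the features that are still in memory and drops cleared references.
    List<F> lookup(String domainName) {
        List<F> live = new ArrayList<>();
        List<SoftReference<F>> refs = byDomain.get(domainName);
        if (refs == null) {
            return live;
        }
        for (Iterator<SoftReference<F>> it = refs.iterator(); it.hasNext(); ) {
            F feature = it.next().get();
            if (feature == null) {
                it.remove(); // referent was collected
            } else {
                live.add(feature);
            }
        }
        return live;
    }
}
```

A caller has to accept that lookup may return fewer features than were
registered, since any feature that is no longer strongly reachable elsewhere
may have been collected.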

So - a feature is specific to a sequence and its location (like an object is
specific to the memory it owns and the member-variable values it has), whereas
the domain in this example is accessory information to the feature that gives the
feature type (like a class).

Would this fit into your world view, or have I missed the point somewhere?

Matthew

hilmar.lapp@pharma.Novartis.com wrote:

> If, on the other hand, Feature would not contain a Location-property, I
> could create a Feature object for the SET-domain (in general). A
> distinct LocatedFeature object must then be constructed to link sequence
> A to the SET-domain Feature, giving the location of the domain in
> sequence A; and the same must be done for sequence B. But the so-created
> distinct LocatedFeature objects would explicitly share (hold a reference
> to) the same Feature object (for the SET-domain).
>
>      Right. While the present design is similar to bioperl, it is contrary to
>      how I guess most people design a sequence database with features (n-n).
>      This makes it then possible to ask a database to return all sequences
>      having a specific feature, and where this feature is. So, similarly
>      you would like to be able to ask a specific Feature object for
>      returning all its assignments to FeatureHolders, instead of having
>      to iterate over all FeatureHolders and query each whether they contain
>      a Feature of a particular type.
>
>      The question is maybe whether or not FeatureHolders should own the
>      Features they hold, in the sense that the location of the Feature
>      refers to its holder. Or you look at Features as abstract types of
>      their own, in the sense that they can be 'attached' to an arbitrary
>      number of FeatureHolders, and each attachment has a location (and
>      maybe even more properties, like a score, and the attachment to
>      different strands does not necessarily make a different feature, so
>      the strand may also be an attachment property).
>
>           Hilmar